Re: [ceph-users] Weird mount issue (Ubuntu 18.04, Ceph 14.2.5 & 14.2.6)

2020-01-17 Thread Jeff Layton
On Fri, 2020-01-17 at 17:10 +0100, Ilya Dryomov wrote:
> On Fri, Jan 17, 2020 at 2:21 AM Aaron  wrote:
> > No worries, can definitely do that.
> > 
> > Cheers
> > Aaron
> > 
> > On Thu, Jan 16, 2020 at 8:08 PM Jeff Layton  wrote:
> > > On Thu, 2020-01-16 at 18:42 -0500, Jeff Layton wrote:
> > > > On Wed, 2020-01-15 at 08:05 -0500, Aaron wrote:
> > > > > Seeing a weird mount issue.  Some info:
> > > > > 
> > > > > No LSB modules are available.
> > > > > Distributor ID: Ubuntu
> > > > > Description: Ubuntu 18.04.3 LTS
> > > > > Release: 18.04
> > > > > Codename: bionic
> > > > > 
> > > > > Ubuntu 18.04.3 with kernel 4.15.0-74-generic
> > > > > Ceph 14.2.5 & 14.2.6
> > > > > 
> > > > > With ceph-common, ceph-base, etc installed:
> > > > > 
> > > > > ceph/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > ceph-base/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > ceph-common/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > ceph-mds/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > ceph-mgr/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > ceph-mgr-dashboard/stable,stable,now 14.2.6-1bionic all [installed]
> > > > > ceph-mon/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > ceph-osd/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > libcephfs2/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > python-ceph-argparse/stable,stable,now 14.2.6-1bionic all 
> > > > > [installed,automatic]
> > > > > python-cephfs/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > 
> > > > > I create a user via the get-or-create cmd, and I have a user/secret now.
> > > > > When I try to mount on these Ubuntu nodes, the mount cmd I run for
> > > > > testing is:
> > > > > sudo mount -t ceph -o
> > > > > name=user-20c5338c-34db-11ea-b27a-de7033e905f6,secret=AQC6dhpeyczkDxAAhRcr7oERUY4BcD2NCUkuNg==
> > > > > 10.10.10.10:6789:/work/20c5332d-34db-11ea-b27a-de7033e905f6 /tmp/test
> > > > > 
> > > > > I get the error:
> > > > > couldn't finalize options: -34
> > > > > 
> > > > > From some tracking down, it's part of the get_secret_option() in
> > > > > common/secrets.c and the Linux System Error:
> > > > > 
> > > > > #define ERANGE  34  /* Math result not representable */
> > > > > 
> > > > > Now the weird part...when I remove all the above libs, the mount
> > > > > command works. I know that there are ceph.ko modules in the Ubuntu
> > > > > filesystems DIR, and that Ubuntu comes with some understanding of how
> > > > > to mount a cephfs system.  So, that explains how it can mount
> > > > > cephfs...but, what I don't understand is why I'm getting that -34
> > > > > error with the 14.2.5 and 14.2.6 libs installed. I didn't have this
> > > > > issue with 14.2.3 or 14.2.4.
> > > > 
> > > > This sounds like a regression in mount.ceph, probably due to something
> > > > that went in for v14.2.5. I can reproduce the problem on Fedora, and I
> > > > think it has something to do with the very long username you're using.
> > > > 
> > > > I'll take a closer look and let you know. Stay tuned.
> > > > 
> > > 
> > > I think I see the issue. The SECRET_OPTION_BUFSIZE is just too small for
> > > your use case. We need to make that a little larger than the largest
> > > name= parameter can be. Prior to v14.2.5, it was ~1000 bytes, but I made
> > > it smaller in that set thinking that was too large. Mea culpa.
> > > 
> > > The problem is determining how big that size can be. AFAICT EntityName
> > > is basically a std::string in the ceph code, which can be an arbitrary
> > > size (up to 4g or so).
> 
> It's just that you made SECRET_OPTION_BUFSIZE account precisely for
> "secret=", but it can also be "key=".
> 
> I don't think there is much of a problem.  Defining it back to ~1000 is
> guaranteed to work.  Or we could remove it and just compute the size of
> secret_option exactly the same way as get_secret_option() does it:
> 
>   strlen(cmi->cmi_secret) + strlen(cmi->cmi_name) + 7 + 1
> 

Yeah, it's not hard to do a simple fix like that, but I opted to rework
the code to just safe_cat the secret option string(s) directly into the 
options buffer.

That eliminates some extra copies of this info and the need for an
arbitrary limit altogether. It also removes a chunk of code that doesn't
really need to be in the common lib.

See:

https://github.com/ceph/ceph/pull/32706
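
To illustrate the idea (a rough sketch only, not the code from the PR above;
append_opt and finalize_secret_options are made-up names): build the
name=/secret= options directly in the destination buffer with a bounds-checked
append, so there is no fixed-size staging buffer and -ERANGE is only returned
when the final options string genuinely doesn't fit:

/* Hypothetical sketch -- not the actual mount.ceph code. */
#include <errno.h>
#include <string.h>

/* Append src to the NUL-terminated dst without overflowing dst_len bytes. */
int append_opt(char *dst, size_t dst_len, const char *src)
{
        size_t used = strlen(dst);

        if (used + strlen(src) + 1 > dst_len)
                return -ERANGE;
        memcpy(dst + used, src, strlen(src) + 1);
        return 0;
}

/* Build "name=<name>,secret=<secret>" in place; the caller supplies the
 * options buffer that will eventually be handed to mount(2). */
int finalize_secret_options(char *opts, size_t opts_len,
                            const char *name, const char *secret)
{
        int ret = append_opt(opts, opts_len, "name=");

        if (!ret)
                ret = append_opt(opts, opts_len, name);
        if (!ret)
                ret = append_opt(opts, opts_len, ",secret=");
        if (!ret)
                ret = append_opt(opts, opts_len, secret);
        return ret;
}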

Aaron, if you have a way to build and test this, it'd be good if you
could confirm that it fixes the problem for you.
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Weird mount issue (Ubuntu 18.04, Ceph 14.2.5 & 14.2.6)

2020-01-17 Thread Jeff Layton
Actually, scratch that. I went ahead and opened this:

https://tracker.ceph.com/issues/43649

Feel free to watch that one for updates.

On Fri, 2020-01-17 at 07:43 -0500, Jeff Layton wrote:
> No problem. Can you let me know the tracker bug number once you've
> opened it?
> 
> Thanks,
> Jeff
> 
> On Thu, 2020-01-16 at 20:24 -0500, Aaron wrote:
> > This debugging started because the ceph-provisioner from k8s was making 
> > those users...but what we found was doing something similar by hand caused 
> > the same issue. Just surprised no one else using k8s and ceph backed 
> > PVC/PVs  ran into this issue. 
> > 
> > Thanks again for all your help!
> > 
> > Cheers
> > Aaron
> > 
> > On Thu, Jan 16, 2020 at 8:21 PM Aaron  wrote:
> > > No worries, can definitely do that. 
> > > 
> > > Cheers
> > > Aaron
> > > 
> > > On Thu, Jan 16, 2020 at 8:08 PM Jeff Layton  wrote:
> > > > On Thu, 2020-01-16 at 18:42 -0500, Jeff Layton wrote:
> > > > > On Wed, 2020-01-15 at 08:05 -0500, Aaron wrote:
> > > > > > Seeing a weird mount issue.  Some info:
> > > > > > 
> > > > > > No LSB modules are available.
> > > > > > Distributor ID: Ubuntu
> > > > > > Description: Ubuntu 18.04.3 LTS
> > > > > > Release: 18.04
> > > > > > Codename: bionic
> > > > > > 
> > > > > > Ubuntu 18.04.3 with kernel 4.15.0-74-generic
> > > > > > Ceph 14.2.5 & 14.2.6
> > > > > > 
> > > > > > With ceph-common, ceph-base, etc installed:
> > > > > > 
> > > > > > ceph/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > > ceph-base/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > > ceph-common/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > > ceph-mds/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > > ceph-mgr/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > > ceph-mgr-dashboard/stable,stable,now 14.2.6-1bionic all [installed]
> > > > > > ceph-mon/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > > ceph-osd/stable,now 14.2.6-1bionic amd64 [installed]
> > > > > > libcephfs2/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > > python-ceph-argparse/stable,stable,now 14.2.6-1bionic all 
> > > > > > [installed,automatic]
> > > > > > python-cephfs/stable,now 14.2.6-1bionic amd64 [installed,automatic]
> > > > > > 
> > > > > > I create a user via the get-or-create cmd, and I have a user/secret now.
> > > > > > When I try to mount on these Ubuntu nodes, the mount cmd I run for
> > > > > > testing is:
> > > > > > sudo mount -t ceph -o
> > > > > > name=user-20c5338c-34db-11ea-b27a-de7033e905f6,secret=AQC6dhpeyczkDxAAhRcr7oERUY4BcD2NCUkuNg==
> > > > > > 10.10.10.10:6789:/work/20c5332d-34db-11ea-b27a-de7033e905f6 
> > > > > > /tmp/test
> > > > > > 
> > > > > > I get the error:
> > > > > > couldn't finalize options: -34
> > > > > > 
> > > > > > From some tracking down, it's part of the get_secret_option() in
> > > > > > common/secrets.c and the Linux System Error:
> > > > > > 
> > > > > > #define ERANGE  34  /* Math result not representable */
> > > > > > 
> > > > > > Now the weird part...when I remove all the above libs, the mount
> > > > > > command works. I know that there are ceph.ko modules in the Ubuntu
> > > > > > filesystems DIR, and that Ubuntu comes with some understanding of 
> > > > > > how
> > > > > > to mount a cephfs system.  So, that explains how it can mount
> > > > > > cephfs...but, what I don't understand is why I'm getting that -34
> > > > > > error with the 14.2.5 and 14.2.6 libs installed. I didn't have this
> > > > > > issue with 14.2.3 or 14.2.4.
> > > > > 
> > > > > This sounds like a regression in mount.ceph, probably due to something
> > > > > that went in for v14.2.5. I can reproduce the problem on Fedora, and I
> > > > > think it has something to do with the very long username you're using.
> > > > > 
> > > > > I'll take a closer look and let you know. Stay tuned.
> > > > > 
> > > > 
> > > > I think I see the issue. The SECRET_OPTION_BUFSIZE is just too small for
> > > > your use case. We need to make that a little larger than the largest
> > > > name= parameter can be. Prior to v14.2.5, it was ~1000 bytes, but I made
> > > > it smaller in that set thinking that was too large. Mea culpa.
> > > > 
> > > > The problem is determining how big that size can be. AFAICT EntityName
> > > > is basically a std::string in the ceph code, which can be an arbitrary
> > > > size (up to 4g or so).
> > > > 
> > > > Aaron, would you mind opening a bug for this at tracker.ceph.com? We
> > > > should be able to get it fixed up, once I do a bit more research to
> > > > figure out how big to make this buffer.

-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS meltdown fallout: mds assert failure, kernel oopses

2019-08-15 Thread Jeff Layton
On Thu, 2019-08-15 at 16:45 +0900, Hector Martin wrote:
> On 15/08/2019 03.40, Jeff Layton wrote:
> > On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote:
> > > Jeff, the oops seems to be a NULL dereference in ceph_lock_message().
> > > Please take a look.
> > > 
> > 
> > (sorry for duplicate mail -- the other one ended up in moderation)
> > 
> > Thanks Ilya,
> > 
> > That function is pretty straightforward. We don't do a whole lot of
> > pointer chasing in there, so I'm a little unclear on where this would
> > have crashed. Right offhand, that kernel is probably missing
> > 1b52931ca9b5b87 (ceph: remove duplicated filelock ref increase), but
> > that seems unlikely to result in an oops.
> > 
> > Hector, if you have the debuginfo for this kernel installed on one of
> > these machines, could you run gdb against the ceph.ko module and then
> > do:
> > 
> >   gdb> list *(ceph_lock_message+0x212)
> > 
> > That may give me a better hint as to what went wrong.
> 
> This is what I get:
> 
> (gdb)  list *(ceph_lock_message+0x212)
> 0xd782 is in ceph_lock_message (/build/linux-hwe-B83fOS/linux-hwe-4.18.0/fs/ceph/locks.c:116).
> 111             req->r_wait_for_completion = ceph_lock_wait_for_completion;
> 112
> 113             err = ceph_mdsc_do_request(mdsc, inode, req);
> 114
> 115             if (operation == CEPH_MDS_OP_GETFILELOCK) {
> 116                     fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
> 117                     if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type)
> 118                             fl->fl_type = F_RDLCK;
> 119                     else if (CEPH_LOCK_EXCL == req->r_reply_info.filelock_reply->type)
> 120                             fl->fl_type = F_WRLCK;
> 
> Disasm:
> 
> 0xd77b <+523>:   mov0x250(%rbx),%rdx
> 0xd782 <+530>:   mov0x20(%rdx),%rdx
> 0xd786 <+534>:   neg%edx
> 0xd788 <+536>:   mov%edx,0x48(%r15)
> 
> That means req->r_reply_info.filelock_reply was NULL.
> 
> 

Many thanks, Hector. Would you mind opening a bug against the kernel
client at https://tracker.ceph.com ? That's better than doing this via
email and we'll want to make sure we keep track of this.  Did you say
that this was reproducible?

Now...

Note that we don't actually check whether ceph_mdsc_do_request returned
success before we start dereferencing there. I suspect that function
returned an error, and the pointer was left zeroed out.

Probably, we just need to turn that if statement into:

if (!err && operation == CEPH_MDS_OP_GETFILELOCK) {

I'll queue up a patch.

Thanks for the report!
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS meltdown fallout: mds assert failure, kernel oopses

2019-08-14 Thread Jeff Layton
On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote:
> On Tue, Aug 13, 2019 at 1:06 PM Hector Martin  wrote:
> > I just had a minor CephFS meltdown caused by underprovisioned RAM on the
> > MDS servers. This is a CephFS with two ranks; I manually failed over the
> > first rank and the new MDS server ran out of RAM in the rejoin phase
> > (ceph-mds didn't get OOM-killed, but I think things slowed down enough
> > due to swapping out that something timed out). This happened 4 times,
> > with the rank bouncing between two MDS servers, until I brought up an
> > MDS on a bigger machine.
> > 
> > The new MDS managed to become active, but then crashed with an assert:
> > 
> > 2019-08-13 16:03:37.346 7fd4578b2700  1 mds.0.1164 clientreplay_done
> > 2019-08-13 16:03:37.502 7fd45e2a7700  1 mds.mon02 Updating MDS map to
> > version 1239 from mon.1
> > 2019-08-13 16:03:37.502 7fd45e2a7700  1 mds.0.1164 handle_mds_map i am
> > now mds.0.1164
> > 2019-08-13 16:03:37.502 7fd45e2a7700  1 mds.0.1164 handle_mds_map state
> > change up:clientreplay --> up:active
> > 2019-08-13 16:03:37.502 7fd45e2a7700  1 mds.0.1164 active_start
> > 2019-08-13 16:03:37.690 7fd45e2a7700  1 mds.0.1164 cluster recovered.
> > 2019-08-13 16:03:45.130 7fd45e2a7700  1 mds.mon02 Updating MDS map to
> > version 1240 from mon.1
> > 2019-08-13 16:03:46.162 7fd45e2a7700  1 mds.mon02 Updating MDS map to
> > version 1241 from mon.1
> > 2019-08-13 16:03:50.286 7fd4578b2700 -1
> > /build/ceph-13.2.6/src/mds/MDCache.cc: In function 'void
> > MDCache::remove_inode(CInode*)' thread 7fd4578b2700 time 2019-08-13
> > 16:03:50.279463
> > /build/ceph-13.2.6/src/mds/MDCache.cc: 361: FAILED
> > assert(o->get_num_ref() == 0)
> > 
> >   ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> > (stable)
> >   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x14e) [0x7fd46650eb5e]
> >   2: (()+0x2c4cb7) [0x7fd46650ecb7]
> >   3: (MDCache::remove_inode(CInode*)+0x59d) [0x55f423d6992d]
> >   4: (StrayManager::_purge_stray_logged(CDentry*, unsigned long,
> > LogSegment*)+0x1f2) [0x55f423dc7192]
> >   5: (MDSIOContextBase::complete(int)+0x11d) [0x55f423ed42bd]
> >   6: (MDSLogContextBase::complete(int)+0x40) [0x55f423ed4430]
> >   7: (Finisher::finisher_thread_entry()+0x135) [0x7fd46650d0a5]
> >   8: (()+0x76db) [0x7fd465dc26db]
> >   9: (clone()+0x3f) [0x7fd464fa888f]
> > 
> > Thankfully this didn't happen on a subsequent attempt, and I got the
> > filesystem happy again.
> > 
> > At this point, of the 4 kernel clients actively using the filesystem, 3
> > had gone into a strange state (can't SSH in, partial service). Here is a
> > kernel log from one of the hosts (the other two were similar):
> > https://mrcn.st/p/ezrhr1qR
> > 
> > After playing some service failover games and hard rebooting the three
> > affected client boxes everything seems to be fine. The remaining FS
> > client box had no kernel errors (other than blocked task warnings and
> > cephfs talking about reconnections and such) and seems to be fine.
> > 
> > I can't find these errors anywhere, so I'm guessing they're not known bugs?
> 
> Jeff, the oops seems to be a NULL dereference in ceph_lock_message().
> Please take a look.
> 

(sorry for duplicate mail -- the other one ended up in moderation)

Thanks Ilya,

That function is pretty straightforward. We don't do a whole lot of
pointer chasing in there, so I'm a little unclear on where this would
have crashed. Right offhand, that kernel is probably missing
1b52931ca9b5b87 (ceph: remove duplicated filelock ref increase), but
that seems unlikely to result in an oops.

Hector, if you have the debuginfo for this kernel installed on one of
these machines, could you run gdb against the ceph.ko module and then
do:

 gdb> list *(ceph_lock_message+0x212)

That may give me a better hint as to what went wrong.

Thanks,
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph nfs ganesha exports

2019-08-01 Thread Jeff Layton
On Sun, 2019-07-28 at 18:20 +, Lee Norvall wrote:
> Update to this: I found that you cannot create a 2nd filesystem as yet and
> it is still experimental.  So I went down this route:
> 
> Added a pool to the existing cephfs and then setfattr -n ceph.dir.layout.pool 
> -v SSD-NFS /mnt/cephfs/ssdnfs/ from a ceph-fuse client.
> 
> I then NFS mounted from another box. I can see the files and dirs etc. from the
> NFS client, but my issue now is that I do not have permission to write, create
> dirs etc.  The same goes for the default setup after running the ansible
> playbook, even when setting the export to no_root_squash.  Am I missing a chain of
> permissions?  ganesha-nfs is using the admin userid; is this the same as
> client.admin or is this a user I need to create?  Any info appreciated.
> 
> Ceph is on CentOS 7 and SELinux is currently off as well.
> 
> Copy of the ganesha conf below.  Is secType correct or is it missing 
> something?
> 
> RADOS_URLS {
>ceph_conf = '/etc/ceph/ceph.conf';
>userid = "admin";
> }
> %url rados://cephfs_data/ganesha-export-index
> 
> NFSv4 {
> RecoveryBackend = 'rados_kv';
> }

In your earlier email, you mentioned that you had more than one NFS
server, but rados_kv is not safe in a multi-server configuration. The
servers will be competing to store recovery information in the same
objects, and won't honor each others' grace periods.

You may want to explore using "RecoveryBackend = rados_cluster" instead,
which should handle that situation better. See this writeup for some
guidelines:


https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/

Much of this is already automated too if you use k8s+rook.
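
For illustration only (pool, namespace and nodeid are placeholders, and this
assumes your ganesha version reads these settings from the RADOS_KV block as
in the config above), the relevant pieces on each ganesha node would look
roughly like:

NFSv4 {
    RecoveryBackend = rados_cluster;
    Minor_Versions = 1,2;
}

RADOS_KV {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    pool = "cephfs_data";        # pool holding the shared grace/recovery objects
    namespace = "ganesha-grace"; # example namespace
    nodeid = "nfs-gw-1";         # must be unique on every ganesha server
}

Each server gets its own nodeid and has to be enrolled in the shared grace
database (the ganesha-rados-grace utility handles that); the writeup above
walks through the whole procedure.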

> RADOS_KV {
> ceph_conf = '/etc/ceph/ceph.conf';
> userid = "admin";
> pool = "cephfs_data";
> }
> 
> EXPORT
> {
> Export_id=20133;
> Path = "/";
> Pseudo = /cephfile;
> Access_Type = RW;
> Protocols = 3,4;
> Transports = TCP;
> SecType = sys,krb5,krb5i,krb5p;
> Squash = Root_Squash;
> Attr_Expiration_Time = 0;
> 
> FSAL {
> Name = CEPH;
> User_Id = "admin";
> }
> 
> 
> }
> 
> On 28/07/2019 12:11, Lee Norvall wrote:
> > Hi
> > 
> > I am using ceph-ansible to deploy and just looking for best way/tips on 
> > how to export multiple pools/fs.
> > 
> > Ceph: nautilus (14.2.2)
> > NFS-Ganesha v 2.8
> > ceph-ansible stable 4.0
> > 
> > I have 3 x osd/NFS gateways running and NFS on the dashboard can see 
> > them in the cluster.  I have managed to export for cephfs / and mounted 
> > it on another box.
> > 
> > 1) can I add a new pool/fs to the export under that same NFS gateway 
> > cluster, or
> > 
> > 2) do I have the to do something like add a new pool to the fs and then 
> > setfattr to make the layout /newfs_dir point to /new_pool?  does this 
> > cause issues and false object count?
> > 
> > 3) any other better ways...
> > 
> > Rgds
> > 
> > Lee
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> -- 
>  
> 
> Lee Norvall | CEO / Founder 
> Mob. +44 (0)7768 201884 
> Tel. +44 (0)20 3026 8930 
> Web. www.blocz.io 
> 
> Enterprise Cloud | Private Cloud | Hybrid/Multi Cloud | Cloud Backup 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-07-17 Thread Jeff Layton
Ahh, I just noticed you were running nautilus on the client side. This
patch went into v14.2.2, so once you update to that you should be good
to go.

-- Jeff

On Wed, 2019-07-17 at 17:10 -0400, Jeff Layton wrote:
> This is almost certainly the same bug that is fixed here:
> 
> https://github.com/ceph/ceph/pull/28324
> 
> It should get backported soon-ish but I'm not sure which luminous
> release it'll show up in.
> 
> Cheers,
> Jeff
> 
> On Wed, 2019-07-17 at 10:36 +0100, David C wrote:
> > Thanks for taking a look at this, Daniel. Below is the only interesting bit 
> > from the Ceph MDS log at the time of the crash but I suspect the slow 
> > requests are a result of the Ganesha crash rather than the cause of it. 
> > Copying the Ceph list in case anyone has any ideas.
> > 
> > 2019-07-15 15:06:54.624007 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > 6 slow requests, 5 included below; oldest blocked for > 34.588509 secs
> > 2019-07-15 15:06:54.624017 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 33.113514 seconds old, received at 2019-07-15 15:06:21.510423: 
> > client_request(client.16140784:5571174 setattr mtime=2019-07-15 
> > 14:59:45.642408 #0x10009079cfb 2019-07
> > -15 14:59:45.642408 caller_uid=1161, caller_gid=1131{}) currently failed to 
> > xlock, waiting
> > 2019-07-15 15:06:54.624020 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 34.588509 seconds old, received at 2019-07-15 15:06:20.035428: 
> > client_request(client.16129440:1067288 create 
> > #0x1000907442e/filePathEditorRegistryPrefs.melDXAtss 201
> > 9-07-15 14:59:53.694087 caller_uid=1161, 
> > caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > 22,3520,3523,}) currently failed to wrlock, waiting
> > 2019-07-15 15:06:54.624025 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 34.583918 seconds old, received at 2019-07-15 15:06:20.040019: 
> > client_request(client.16140784:5570551 getattr pAsLsXsFs #0x1000907443b 
> > 2019-07-15 14:59:44.171408 cal
> > ler_uid=1161, caller_gid=1131{}) currently failed to rdlock, waiting
> > 2019-07-15 15:06:54.624028 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 34.580632 seconds old, received at 2019-07-15 15:06:20.043305: 
> > client_request(client.16129440:1067293 unlink 
> > #0x1000907442e/filePathEditorRegistryPrefs.melcdzxxc 201
> > 9-07-15 14:59:53.701964 caller_uid=1161, 
> > caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > 22,3520,3523,}) currently failed to wrlock, waiting
> > 2019-07-15 15:06:54.624032 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 34.538332 seconds old, received at 2019-07-15 15:06:20.085605: 
> > client_request(client.16129440:1067308 create 
> > #0x1000907442e/filePathEditorRegistryPrefs.melHHljMk 201
> > 9-07-15 14:59:53.744266 caller_uid=1161, 
> > caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> >  currently failed to wrlock, waiting
> > 2019-07-15 15:06:55.014073 7f5fdcdc0700  1 mds.mds01 Updating MDS map to 
> > version 68166 from mon.2
> > 2019-07-15 15:06:59.624041 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > 7 slow requests, 2 included below; oldest blocked for > 39.588571 secs
> > 2019-07-15 15:06:59.624048 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 30.495843 seconds old, received at 2019-07-15 15:06:29.128156: 
> > client_request(client.16129440:1072227 create 
> > #0x1000907442e/filePathEditorRegistryPrefs.mel58AQSv 2019-07-15 
> > 15:00:02.786754 caller_uid=1161, 
> > caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> >  currently failed to wrlock, waiting
> > 2019-07-15 15:06:59.624053 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> > slow request 39.432848 seconds old, received at 2019-07-15 15:06:20.191151: 
> > client_request(client.16140784:5570649 mknod 
> > #0x1000907442e/filePathEditorRegistryPrefs.mel3HZLNE 2019-07-15 
> > 14:59:44.322408 caller_

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-07-17 Thread Jeff Layton
This is almost certainly the same bug that is fixed here:

https://github.com/ceph/ceph/pull/28324

It should get backported soon-ish but I'm not sure which luminous
release it'll show up in.

Cheers,
Jeff

On Wed, 2019-07-17 at 10:36 +0100, David C wrote:
> Thanks for taking a look at this, Daniel. Below is the only interesting bit 
> from the Ceph MDS log at the time of the crash but I suspect the slow 
> requests are a result of the Ganesha crash rather than the cause of it. 
> Copying the Ceph list in case anyone has any ideas.
> 
> 2019-07-15 15:06:54.624007 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 6 
> slow requests, 5 included below; oldest blocked for > 34.588509 secs
> 2019-07-15 15:06:54.624017 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 33.113514 seconds old, received at 2019-07-15 15:06:21.510423: 
> client_request(client.16140784:5571174 setattr mtime=2019-07-15 
> 14:59:45.642408 #0x10009079cfb 2019-07
> -15 14:59:45.642408 caller_uid=1161, caller_gid=1131{}) currently failed to 
> xlock, waiting
> 2019-07-15 15:06:54.624020 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 34.588509 seconds old, received at 2019-07-15 15:06:20.035428: 
> client_request(client.16129440:1067288 create 
> #0x1000907442e/filePathEditorRegistryPrefs.melDXAtss 201
> 9-07-15 14:59:53.694087 caller_uid=1161, 
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> 22,3520,3523,}) currently failed to wrlock, waiting
> 2019-07-15 15:06:54.624025 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 34.583918 seconds old, received at 2019-07-15 15:06:20.040019: 
> client_request(client.16140784:5570551 getattr pAsLsXsFs #0x1000907443b 
> 2019-07-15 14:59:44.171408 cal
> ler_uid=1161, caller_gid=1131{}) currently failed to rdlock, waiting
> 2019-07-15 15:06:54.624028 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 34.580632 seconds old, received at 2019-07-15 15:06:20.043305: 
> client_request(client.16129440:1067293 unlink 
> #0x1000907442e/filePathEditorRegistryPrefs.melcdzxxc 201
> 9-07-15 14:59:53.701964 caller_uid=1161, 
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> 22,3520,3523,}) currently failed to wrlock, waiting
> 2019-07-15 15:06:54.624032 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 34.538332 seconds old, received at 2019-07-15 15:06:20.085605: 
> client_request(client.16129440:1067308 create 
> #0x1000907442e/filePathEditorRegistryPrefs.melHHljMk 201
> 9-07-15 14:59:53.744266 caller_uid=1161, 
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
>  currently failed to wrlock, waiting
> 2019-07-15 15:06:55.014073 7f5fdcdc0700  1 mds.mds01 Updating MDS map to 
> version 68166 from mon.2
> 2019-07-15 15:06:59.624041 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 7 
> slow requests, 2 included below; oldest blocked for > 39.588571 secs
> 2019-07-15 15:06:59.624048 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 30.495843 seconds old, received at 2019-07-15 15:06:29.128156: 
> client_request(client.16129440:1072227 create 
> #0x1000907442e/filePathEditorRegistryPrefs.mel58AQSv 2019-07-15 
> 15:00:02.786754 caller_uid=1161, 
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
>  currently failed to wrlock, waiting
> 2019-07-15 15:06:59.624053 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow request 39.432848 seconds old, received at 2019-07-15 15:06:20.191151: 
> client_request(client.16140784:5570649 mknod 
> #0x1000907442e/filePathEditorRegistryPrefs.mel3HZLNE 2019-07-15 
> 14:59:44.322408 caller_uid=1161, caller_gid=1131{}) currently failed to 
> wrlock, waiting
> 2019-07-15 15:07:03.014108 7f5fdcdc0700  1 mds.mds01 Updating MDS map to 
> version 68167 from mon.2
> 2019-07-15 15:07:04.624096 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 8 
> slow requests, 1 included below; oldest blocked for > 44.588632 secs
> 2019-07-15 15:07:04.624103 7f5fda5bb700  0 log_channel(cluster) log [WRN] : 
> slow reques

Re: [ceph-users] CephFS : Kernel/Fuse technical differences

2019-06-24 Thread Jeff Layton
On Mon, 2019-06-24 at 15:51 +0200, Hervé Ballans wrote:
> Hi everyone,
> 
> We successfully use Ceph here for several years now, and since recently, 
> CephFS.
> 
>  From the same CephFS server, I notice a big difference between a fuse 
> mount and a kernel mount (10 times faster for kernel mount). It makes 
> sense to me (an additional fuse library versus a direct access to a 
> device...), but recently, one of our users asked me to explain him in 
> more detail the reason for this big difference...Hum...
> 
> I then realized that I didn't really know how to explain the reasons to 
> him !!
> 
> As well, does anyone have a more detailed explanation in a few words or 
> know a good web resource on this subject (I guess it's not specific to 
> Ceph but it's generic to all filesystems ?..)
> 
> Thanks in advance,
> Hervé
> 

A lot of it is the context switching.

Every time you make a system call (or other activity) that accesses a
FUSE mount, the kernel has to dispatch that request to the fuse device.
The userland ceph-fuse daemon then has to wake up and do its thing (at
least once) and send the result back down to the kernel, which then
wakes up the original task so it can get the result.

FUSE is a wonderful thing, but it's not really built for speed.

-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nfs-ganesha with rados_kv backend

2019-05-29 Thread Jeff Layton
On Wed, 2019-05-29 at 13:49 +, Stolte, Felix wrote:
> Hi,
> 
> is anyone running an active-passive nfs-ganesha cluster with cephfs backend 
> and using the rados_kv recovery backend? My setup runs fine, but takeover is 
> giving me a headache. On takeover I see the following messages in ganeshas 
> log file:
> 

Note that there are significant problems with the rados_kv recovery
backend. In particular, it does not properly handle the case where the
server crashes during the grace period. The rados_ng and rados_cluster
backends do handle those situations properly.

> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : 
> ganesha.nfsd-9793[dbus_heartbeat] nfs_start_grace :STATE :EVENT :NFS Server 
> Now IN GRACE, duration 5
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : 
> ganesha.nfsd-9793[dbus_heartbeat] nfs_start_grace :STATE :EVENT :NFS Server 
> recovery event 5 nodeid -1 ip 10.0.0.5
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : 
> ganesha.nfsd-9793[dbus_heartbeat] rados_kv_traverse :CLIENT ID :EVENT :Failed 
> to lst kv ret=-2
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : 
> ganesha.nfsd-9793[dbus_heartbeat] rados_kv_read_recov_clids_takeover :CLIENT 
> ID :EVENT :Failed to takeover
> 29/05/2019 15:38:26 : epoch 5cee88c4 : cephgw-e2-1 : 
> ganesha.nfsd-9793[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now 
> NOT IN GRACE
> 
> The result is clients hanging for up to 2 Minutes. Has anyone ran into the 
> same problem?
> 
> Ceph Version: 12.2.11
> nfs-ganesha: 2.7.3
> 

If I had to guess, the hanging is probably due to state that is being
held by the other node's MDS session that hasn't expired yet. Ceph v12
doesn't have the client reclaim interfaces that make more instantaneous
failover possible. That's new in v14 (Nautilus). See pages 12 and 13
here:

https://static.sched.com/hosted_files/cephalocon2019/86/Rook-Deployed%20NFS%20Clusters%20over%20CephFS.pdf

> ganesha.conf (identical on both nodes besides nodeid in rados_kv:
> 
> NFS_CORE_PARAM {
> Enable_RQUOTA = false;
> Protocols = 3,4;
> }
> 
> CACHEINODE {
> Dir_Chunk = 0;
> NParts = 1;
> Cache_Size = 1;
> }
> 
> NFS_krb5 {
> Active_krb5 = false;
> }
> 
> NFSv4 {
> Only_Numeric_Owners = true;
> RecoveryBackend = rados_kv;
> Grace_Period = 5;
> Lease_Lifetime = 5;

Yikes! That's _way_ too short a grace period and lease lifetime. Ganesha
will probably exit the grace period before the clients ever realize the
server has restarted, and they will fail to reclaim their state.

> Minor_Versions = 1,2;
> }
> 
> RADOS_KV {
> ceph_conf = '/etc/ceph/ceph.conf';
> userid = "ganesha";
> pool = "cephfs_metadata";
> namespace = "ganesha";
> nodeid = "cephgw-k2-1";
> }
> 
> Any hint would be appreciated.

I consider ganesha's dbus-based takeover mechanism to be broken by
design, as it requires the recovery backend to do things that can't be
done atomically. If a crash occurs at the wrong time, the recovery
database can end up trashed and no one can reclaim anything.

If you really want an active/passive setup then I'd move away from that
and just have whatever clustering software you're using start up the
daemon on the active node after ensuring that it's shut down on the
passive one. With that, you can also use the rados_ng recovery backend,
which is more resilient in the face of multiple crashes.

In that configuration you would want to have the same config file on
both nodes, including the same nodeid so that you can potentially take
advantage of the RECLAIM_RESET interface to kill off the old session
quickly after the server restarts.

You also need a much longer grace period.
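
As a rough sketch, the relevant parts of such an active/passive config (kept
identical on both nodes) might look like the following; the 90/60 values are
only illustrative and are roughly ganesha's defaults:

NFSv4 {
    Only_Numeric_Owners = true;
    RecoveryBackend = rados_ng;
    Minor_Versions = 1,2;
    Grace_Period = 90;      # long enough for clients to notice and reclaim
    Lease_Lifetime = 60;
}

RADOS_KV {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "ganesha";
    pool = "cephfs_metadata";
    namespace = "ganesha";
    nodeid = "cephgw";      # same nodeid on both nodes
}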

Cheers,
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS-Ganesha CEPH_FSAL | potential locking issue

2019-04-16 Thread Jeff Layton
On Tue, Apr 16, 2019 at 10:36 AM David C  wrote:
>
> Hi All
>
> I have a single export of my cephfs using the ceph_fsal [1]. A CentOS 7 
> machine mounts a sub-directory of the export [2] and is using it for the home 
> directory of a user (e.g everything under ~ is on the server).
>
> This works fine until I start a long sequential write into the home directory 
> such as:
>
> dd if=/dev/zero of=~/deleteme bs=1M count=8096
>
> This saturates the 1GbE link on the client which is great but during the 
> transfer, apps that are accessing files in home start to lock up. Google 
> Chrome for example, which puts its config in ~/.config/google-chrome/,
> locks up during the transfer, e.g. I can't move between tabs; as soon as the
> transfer finishes, Chrome goes back to normal. Essentially the desktop 
> environment reacts as I'd expect if the server was to go away. I'm using the 
> MATE DE.
>
> However, if I mount a separate directory from the same export on the machine 
> [3] and do the same write into that directory, my desktop experience isn't 
> affected.
>
> I hope that makes some sense, it's a bit of a weird one to describe. This 
> feels like a locking issue to me, although I can't explain why a single write 
> into the root of a mount would affect access to other files under that same 
> mount.
>

It's not a single write. You're doing 8G worth of 1M I/Os. The server
then has to do all of those to the OSD backing store.

> [1] CephFS export:
>
> EXPORT
> {
> Export_ID=100;
> Protocols = 4;
> Transports = TCP;
> Path = /;
> Pseudo = /ceph/;
> Access_Type = RW;
> Attr_Expiration_Time = 0;
> Disable_ACL = FALSE;
> Manage_Gids = TRUE;
> Filesystem_Id = 100.1;
> FSAL {
> Name = CEPH;
> }
> }
>
> [2] Home directory mount:
>
> 10.10.10.226:/ceph/homes/username on /homes/username type nfs4 
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
>
> [3] Test directory mount:
>
> 10.10.10.226:/ceph/testing on /tmp/testing type nfs4 
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
>
> Versions:
>
> Luminous 12.2.10
> nfs-ganesha-2.7.1-0.1.el7.x86_64
> nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
>
> Ceph.conf on nfs-ganesha server:
>
> [client]
> mon host = 10.10.10.210:6789, 10.10.10.211:6789, 10.10.10.212:6789
> client_oc_size = 8388608000
> client_acl_type=posix_acl
> client_quota = true
> client_quota_df = true
>

No magic bullets here, I'm afraid.

Sounds like ganesha is probably just too swamped with write requests
to do much else, but you'll probably want to do the legwork starting
with the hanging application, and figure out what it's doing that
takes so long. Is it some syscall? Which one?

From there you can start looking at statistics in the NFS client to
see what's going on there. Are certain RPCs taking longer than they
should? Which ones?

Once you know what's going on with the client, you can better tell
what's going on with the server.
-- 
Jeff Layton 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Deploying a Ceph+NFS Server Cluster with Rook

2019-03-06 Thread Jeff Layton
I had several people ask me to put together some instructions on how to
deploy a Ceph+NFS cluster from scratch, and the new functionality in
Ceph and rook.io make this quite easy.

I wrote a Ceph community blog post that walks the reader through the
process:

https://ceph.com/community/deploying-a-cephnfs-server-cluster-with-rook/

I don't think that site has a way to post comments, but I'm happy to
answer questions about it via email.
-- 
Jeff Layton 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel P4600 3.2TB U.2 form factor NVMe firmware problems causing dead disks

2019-02-26 Thread Jeff Smith
We had several postgresql servers running these disks from Dell.  Numerous
failures, including one server that had 3 die at once.  Dell claims it is a
firmware issue and instructed us to upgrade to QDV1DP15 from QDV1DP12 (I am
not sure how these line up to the Intel firmwares).  We lost several more
during the upgrade process.  We are using ZFS with these drives.  I can
confirm it is not a Ceph Bluestore only issue.

On Mon, Feb 18, 2019 at 8:44 AM David Turner  wrote:

> We have 2 clusters of [1] these disks that have 2 Bluestore OSDs per disk
> (partitioned), 3 disks per node, 5 nodes per cluster.  The clusters are
> 12.2.4 running CephFS and RBDs.  So in total we have 15 NVMe's per cluster
> and 30 NVMe's in total.  They were all built at the same time and were
> running firmware version QDV10130.  On this firmware version we early on
> had 2 disks failures, a few months later we had 1 more, and then a month
> after that (just a few weeks ago) we had 7 disk failures in 1 week.
>
> The failures are such that the disk is no longer visible to the OS.  This
> holds true beyond server reboots as well as placing the failed disks into a
> new server.  With a firmware upgrade tool we got an error that pretty much
> said there's no way to get data back and to RMA the disk.  We upgraded all
> of our remaining disks' firmware to QDV101D1 and haven't had any problems
> since then.  Most of our failures happened while rebalancing the cluster
> after replacing dead disks and we tested rigorously around that use case
> after upgrading the firmware.  This firmware version seems to have resolved
> whatever the problem was.
>
> We have about 100 more of these scattered among database servers and other
> servers that have never had this problem while running the
> QDV10130 firmware as well as firmwares between this one and the one we
> upgraded to.  Bluestore on Ceph is the only use case we've had so far with
> this sort of failure.
>
> Has anyone else come across this issue before?  Our current theory is that
> Bluestore is accessing the disk in a way that is triggering a bug in the
> older firmware version that isn't triggered by more traditional
> filesystems.  We have a scheduled call with Intel to discuss this, but
> their preliminary searches into the bugfixes and known problems between
> firmware versions didn't indicate the bug that we triggered.  It would be
> good to have some more information about what those differences for disk
> accessing might be to hopefully get a better answer from them as to what
> the problem is.
>
>
> [1]
> https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/dc-p4600-series/dc-p4600-3-2tb-2-5inch-3d1.html
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-18 Thread Jeff Layton
On Mon, 2019-02-18 at 17:02 +0100, Paul Emmerich wrote:
> > > I've benchmarked a ~15% performance difference in IOPS between cache
> > > expiration time of 0 and 10 when running fio on a single file from a
> > > single client.
> > > 
> > > 
> > 
> > NFS iops? I'd guess more READ ops in particular? Is that with a
> > FSAL_CEPH backend?
> 
> Yes. But take that with a grain of salt, that was just a quick
> and dirty test of a very specific scenario that may or may not be
> relevant.
> 
> 

Sure.

If the NFS iops go up when you remove a layer of caching, then that
suggests that you had a situation where the cache likely should have
been invalidated, but wasn't. Basically, you may be sacrificing cache
coherency for performance.

The bigger question I have is whether the ganesha mdcache provides any
performance gain when the attributes are already cached in the libcephfs
layer.

If we did want to start using the mdcache, then we'd almost certainly
want to invalidate that cache when libcephfs gives up caps. I just don't
see how the extra layer of caching provides much value in that
situation.


> > 
> > > > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  
> > > > > wrote:
> > > > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > > > > Will Client query 'change' attribute every time before reading to know
> > > > > > > if the data has been changed?
> > > > > > > 
> > > > > > >   +-+++-+---+
> > > > > > >   | Name| ID | Data Type  | Acc | Defined in|
> > > > > > >   +-+++-+---+
> > > > > > >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> > > > > > >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > > > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > > > >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > > > >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > > > >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> > > > > > >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> > > > > > >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> > > > > > >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> > > > > > >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> > > > > > >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > > > >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > > > >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> > > > > > >   +-+++-+---+
> > > > > > > 
> > > > > > 
> > > > > > Not every time -- only when the cache needs revalidation.
> > > > > > 
> > > > > > In the absence of a delegation, that happens on a timeout (see the
> > > > > > acregmin/acregmax settings in nfs(5)), though things like opens and 
> > > > > > file
> > > > > > locking events also affect when the client revalidates.
> > > > > > 
> > > > > > When the v4 client does revalidate the cache, it relies heavily on 
> > > > > > NFSv4
> > > > > > change attribute. Cephfs's change attribute is cluster-coherent 
> > > > > > too, so
> > > > > &

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-18 Thread Jeff Layton
On Mon, 2019-02-18 at 16:40 +0100, Paul Emmerich wrote:
> > A call into libcephfs from ganesha to retrieve cached attributes is
> > mostly just in-memory copies within the same process, so any performance
> > overhead there is pretty minimal. If we need to go to the network to get
> > the attributes, then that was a case where the cache should have been
> > invalidated anyway, and we avoid having to check the validity of the
> > cache.
> 
> I've benchmarked a ~15% performance difference in IOPS between cache
> expiration time of 0 and 10 when running fio on a single file from a
> single client.
> 
> 

NFS iops? I'd guess more READ ops in particular? Is that with a
FSAL_CEPH backend?


> 
> > 
> > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  
> > > wrote:
> > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > > Will Client query 'change' attribute every time before reading to know
> > > > > if the data has been changed?
> > > > > 
> > > > >   +-+++-+---+
> > > > >   | Name| ID | Data Type  | Acc | Defined in|
> > > > >   +-+++-+---+
> > > > >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> > > > >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> > > > >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> > > > >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> > > > >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> > > > >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> > > > >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> > > > >   +-+++-+---+
> > > > > 
> > > > 
> > > > Not every time -- only when the cache needs revalidation.
> > > > 
> > > > In the absence of a delegation, that happens on a timeout (see the
> > > > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > > > locking events also affect when the client revalidates.
> > > > 
> > > > When the v4 client does revalidate the cache, it relies heavily on NFSv4
> > > > change attribute. Cephfs's change attribute is cluster-coherent too, so
> > > > if the client does revalidate it should see changes made on other
> > > > servers.
> > > > 
> > > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  
> > > > > wrote:
> > > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > > Hi Jeff,
> > > > > > > Another question is about Client Caching when disabling 
> > > > > > > delegation.
> > > > > > > I set breakpoint on nfs4_op_read, which is OP_READ process 
> > > > > > > function in
> > > > > > > nfs-ganesha. Then I read a file, I found that it will hit only 
> > > > > > > once on
> > > > > > > the first time, which means latter reading operation on this file 
> > > > > > > will
> > > > > > > not trigger OP_READ. It will read the data from client side 
> > > > > > > cache. Is
> > > > > > > it right?
> > > > > > 
> > > > > > Yes. In the absence of a delegation, the client will periodically 
> > > > > > query
> > > > > > for the inode attributes, and will serve reads from the cache if it
> > > > > > looks like the file hasn't changed.
> > > > > > 
> > > > > > > I also checked the nfs client code in

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-15 Thread Jeff Layton
On Fri, 2019-02-15 at 15:34 +0800, Marvin Zhang wrote:
> Thanks Jeff.
> If I set Attr_Expiration_Time to zero in the conf, does it mean the timeout
> is zero? If so, every client will see the change immediately. Will it
> hurt performance significantly?
> It seems that the GlusterFS FSAL uses UPCALL to invalidate the cache. How
> about the CephFS FSAL?
> 

We mostly suggest ganesha's attribute cache be disabled when exporting
FSAL_CEPH. libcephfs caches attributes too, and it knows the status of
those attributes better than ganesha can.

A call into libcephfs from ganesha to retrieve cached attributes is
mostly just in-memory copies within the same process, so any performance
overhead there is pretty minimal. If we need to go to the network to get
the attributes, then that was a case where the cache should have been
invalidated anyway, and we avoid having to check the validity of the
cache.
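
Concretely, that usually means settings along these lines in ganesha.conf (a
sketch built from the knobs that appear elsewhere in these threads; Export_Id,
Path and Pseudo are placeholders for your own export):

EXPORT
{
    Export_Id = 100;
    Path = /;
    Pseudo = /cephfs;
    Access_Type = RW;
    Attr_Expiration_Time = 0;   # don't let ganesha cache attributes
    FSAL {
        Name = CEPH;
    }
}

CACHEINODE {
    Dir_Chunk = 0;              # effectively disable ganesha's dirent/md cache
    NParts = 1;
    Cache_Size = 1;
}

With that, attribute caching is left to libcephfs, which knows from its caps
whether what it has cached is still valid.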


> On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  wrote:
> > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > Will Client query 'change' attribute every time before reading to know
> > > if the data has been changed?
> > > 
> > >   +-+++-+---+
> > >   | Name| ID | Data Type  | Acc | Defined in|
> > >   +-+++-+---+
> > >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> > >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> > >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> > >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> > >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> > >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> > >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> > >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> > >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> > >   +-+++-+---+
> > > 
> > 
> > Not every time -- only when the cache needs revalidation.
> > 
> > In the absence of a delegation, that happens on a timeout (see the
> > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > locking events also affect when the client revalidates.
> > 
> > When the v4 client does revalidate the cache, it relies heavily on NFSv4
> > change attribute. Cephfs's change attribute is cluster-coherent too, so
> > if the client does revalidate it should see changes made on other
> > servers.
> > 
> > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  
> > > wrote:
> > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > Hi Jeff,
> > > > > Another question is about Client Caching when disabling delegation.
> > > > > I set breakpoint on nfs4_op_read, which is OP_READ process function in
> > > > > nfs-ganesha. Then I read a file, I found that it will hit only once on
> > > > > the first time, which means latter reading operation on this file will
> > > > > not trigger OP_READ. It will read the data from client side cache. Is
> > > > > it right?
> > > > 
> > > > Yes. In the absence of a delegation, the client will periodically query
> > > > for the inode attributes, and will serve reads from the cache if it
> > > > looks like the file hasn't changed.
> > > > 
> > > > > I also checked the nfs client code in linux kernel. Only
> > > > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > > > > like this:
> > > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > ret = nfs_invalidate_mapping(inode, mapping);
> > > > > }
> > > > > This about this senario, client1 connect ganesha1 and client2 connect
> > > > > ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > > > > Then I modify this file on client2. At that 

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Jeff Layton
On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> Will Client query 'change' attribute every time before reading to know
> if the data has been changed?
> 
>   +-+++-+---+
>   | Name| ID | Data Type  | Acc | Defined in|
>   +-+++-+---+
>   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
>   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
>   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
>   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
>   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
>   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
>   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
>   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
>   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
>   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
>   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
>   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
>   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
>   +-+++-+---+
> 

Not every time -- only when the cache needs revalidation.

In the absence of a delegation, that happens on a timeout (see the
acregmin/acregmax settings in nfs(5)), though things like opens and file
locking events also affect when the client revalidates.

When the v4 client does revalidate the cache, it relies heavily on NFSv4
change attribute. Cephfs's change attribute is cluster-coherent too, so
if the client does revalidate it should see changes made on other
servers.

> On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  wrote:
> > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > Hi Jeff,
> > > Another question is about Client Caching when disabling delegation.
> > > I set breakpoint on nfs4_op_read, which is OP_READ process function in
> > > nfs-ganesha. Then I read a file, I found that it will hit only once on
> > > the first time, which means latter reading operation on this file will
> > > not trigger OP_READ. It will read the data from client side cache. Is
> > > it right?
> > 
> > Yes. In the absence of a delegation, the client will periodically query
> > for the inode attributes, and will serve reads from the cache if it
> > looks like the file hasn't changed.
> > 
> > > I also checked the nfs client code in linux kernel. Only
> > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > > like this:
> > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > ret = nfs_invalidate_mapping(inode, mapping);
> > > }
> > > Think about this scenario: client1 connects to ganesha1 and client2 connects
> > > to ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > > Then I modify this file on client2. At that time, how does client1 know the
> > > file is modified and how will it add NFS_INO_INVALID_DATA into
> > > cache_validity?
> > 
> > Once you modify the file on client2, ganesha2 will request the necessary
> > caps from the ceph MDS, and client1 will have its caps revoked. It'll
> > then make the change.
> > 
> > When client1 reads again it will issue a GETATTR against the file [1].
> > ganesha1 will then request caps to do the getattr, which will end up
> > revoking ganesha2's caps. client1 will then see the change in attributes
> > (the change attribute and mtime, most likely) and will invalidate the
> > mapping, causing it to reissue a READ on the wire.
> > 
> > [1]: There may be a window of time after you change the file on client2
> > where client1 doesn't see it. That's due to the fact that inode
> > attributes on the client are only revalidated after a timeout. You may
> > want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> > make sure you understand how the NFS client validates its caches.
> > 
> > Cheers,
> > --
> > Jeff Layton 
> > 

-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Jeff Layton
On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> Hi Jeff,
> Another question is about Client Caching when disabling delegation.
> I set breakpoint on nfs4_op_read, which is OP_READ process function in
> nfs-ganesha. Then I read a file, I found that it will hit only once on
> the first time, which means latter reading operation on this file will
> not trigger OP_READ. It will read the data from client side cache. Is
> it right?

Yes. In the absence of a delegation, the client will periodically query
for the inode attributes, and will serve reads from the cache if it
looks like the file hasn't changed.

> I also checked the nfs client code in the linux kernel. Only when
> cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ again,
> like this:
> if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> ret = nfs_invalidate_mapping(inode, mapping);
> }
> Think about this scenario: client1 connects to ganesha1 and client2 connects
> to ganesha2. I read /1.txt on client1 and client1 will cache the data.
> Then I modify this file on client2. At that time, how does client1 know the
> file is modified, and how will it add NFS_INO_INVALID_DATA into
> cache_validity?


Once you modify the file on client2, ganesha2 will request the necessary
caps from the ceph MDS, and client1 will have its caps revoked. It'll
then make the change.

When client1 reads again it will issue a GETATTR against the file [1].
ganesha1 will then request caps to do the getattr, which will end up
revoking ganesha2's caps. client1 will then see the change in attributes
(the change attribute and mtime, most likely) and will invalidate the
mapping, causing it to reissue a READ on the wire.

[1]: There may be a window of time after you change the file on client2
where client1 doesn't see it. That's due to the fact that inode
attributes on the client are only revalidated after a timeout. You may
want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
make sure you understand how the NFS client validates its caches.

Cheers,
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Jeff Layton
On Thu, 2019-02-14 at 10:35 +0800, Marvin Zhang wrote:
> On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton  wrote:
> > > Hi,
> > > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > > config active/passive NFS-Ganesha to use CephFs. My question is if we
> > > can use active/active nfs-ganesha for CephFS.
> > 
> > (Apologies if you get two copies of this. I sent an earlier one from the
> > wrong account and it got stuck in moderation)
> > 
> > You can, with the new rados-cluster recovery backend that went into
> > ganesha v2.7. See here for a bit more detail:
> > 
> > https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
> > 
> > ...also have a look at the ceph.conf file in the ganesha sources.
> > 
> > > In my view, state consistency is the only thing we need to think about.
> > > 1. Lock support for Active/Active. Even though each nfs-ganesha server maintains
> > > its own lock state, the real lock/unlock calls go through
> > > ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle the locks
> > > safely.
> > > 2. Delegation support for Active/Active. Similar to question 1,
> > > ceph_ll_delegation will handle it safely.
> > > 3. Nfs-ganesha cache support for Active/Active. As
> > > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > > describes, we can configure the cache size as 1.
> > > 4. Ceph-FSAL cache support for Active/Active. Like other CephFS clients,
> > > there are no issues with cache consistency.
> > 
> > The basic idea with the new recovery backend is to have the different
> > NFS ganesha heads coordinate their recovery grace periods to prevent
> > stateful conflicts.
> > 
> > The one thing missing at this point is delegations in an active/active
> > configuration, but that's mainly because of the synchronous nature of
> > libcephfs. We have a potential fix for that problem but it requires work
> > in libcephfs that is not yet done.
> [marvin] So we should disable delegation on active/active and set the
> conf like this. Is it right?
> NFSv4
> {
> Delegations = false;
> }

Yes.
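
For reference, a minimal ganesha.conf sketch along those lines (the pool,
userid and nodeid values are placeholders -- see the sample ceph.conf shipped
in the ganesha sources for the full set of options):

  NFSv4 {
      RecoveryBackend = rados_cluster;
      Minor_Versions = 1, 2;
      Delegations = false;
  }

  RADOS_KV {
      ceph_conf = "/etc/ceph/ceph.conf";
      userid = "admin";
      pool = "nfs-ganesha";
      nodeid = "ganesha1";
  }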
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-13 Thread Jeff Layton
> Hi,
> As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> config active/passive NFS-Ganesha to use CephFs. My question is if we
> can use active/active nfs-ganesha for CephFS.

(Apologies if you get two copies of this. I sent an earlier one from the
wrong account and it got stuck in moderation)

You can, with the new rados-cluster recovery backend that went into
ganesha v2.7. See here for a bit more detail:

https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/

...also have a look at the ceph.conf file in the ganesha sources.

> In my view, state consistency is the only thing we need to think about.
> 1. Lock support for Active/Active. Even though each nfs-ganesha server maintains
> its own lock state, the real lock/unlock calls go through
> ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle the locks
> safely.
> 2. Delegation support for Active/Active. Similar to question 1,
> ceph_ll_delegation will handle it safely.
> 3. Nfs-ganesha cache support for Active/Active. As
> https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> describes, we can configure the cache size as 1.
> 4. Ceph-FSAL cache support for Active/Active. Like other CephFS clients,
> there are no issues with cache consistency.

The basic idea with the new recovery backend is to have the different
NFS ganesha heads coordinate their recovery grace periods to prevent
stateful conflicts.

The one thing missing at this point is delegations in an active/active
configuration, but that's mainly because of the synchronous nature of
libcephfs. We have a potential fix for that problem but it requires work
in libcephfs that is not yet done.

Cheers,
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] list admin issues

2018-10-08 Thread Jeff Smith
I just got dumped again.  I have not sent any attachments/images.
On Mon, Oct 8, 2018 at 5:48 AM Elias Abacioglu
 wrote:
>
> If it's attachments causing this, perhaps forbid attachments? Force people to 
> use pastebin / imgur type of services?
>
> /E
>
> On Mon, Oct 8, 2018 at 1:33 PM Martin Palma  wrote:
>>
>> Same here also on Gmail with G Suite.
>> On Mon, Oct 8, 2018 at 12:31 AM Paul Emmerich  wrote:
>> >
>> > I'm also seeing this once every few months or so on Gmail with G Suite.
>> >
>> > Paul
>> > Am So., 7. Okt. 2018 um 08:18 Uhr schrieb Joshua Chen
>> > :
>> > >
>> > > I also got removed once, got another warning once (need to re-enable).
>> > >
>> > > Cheers
>> > > Joshua
>> > >
>> > >
>> > > On Sun, Oct 7, 2018 at 5:38 AM Svante Karlsson  
>> > > wrote:
>> > >>
>> > >> I'm also getting removed but not only from ceph. I subscribe 
>> > >> d...@kafka.apache.org list and the same thing happens there.
>> > >>
>> > >> Den lör 6 okt. 2018 kl 23:24 skrev Jeff Smith :
>> > >>>
>> > >>> I have been removed twice.
>> > >>> On Sat, Oct 6, 2018 at 7:07 AM Elias Abacioglu
>> > >>>  wrote:
>> > >>> >
>> > >>> > Hi,
>> > >>> >
>> > >>> > I'm bumping this old thread cause it's getting annoying. My 
>> > >>> > membership get disabled twice a month.
>> > >>> > Between my two Gmail accounts I'm in more than 25 mailing lists and 
>> > >>> > I see this behavior only here. Why is only ceph-users only affected? 
>> > >>> > Maybe Christian was on to something, is this intentional?
>> > >>> > Reality is that there is a lot of ceph-users with Gmail accounts, 
>> > >>> > perhaps it wouldn't be so bad to actually trying to figure this one 
>> > >>> > out?
>> > >>> >
>> > >>> > So can the maintainers of this list please investigate what actually 
>> > >>> > gets bounced? Look at my address if you want.
>> > >>> > I got disabled 20181006, 20180927, 20180916, 20180725, 20180718 most 
>> > >>> > recently.
>> > >>> > Please help!
>> > >>> >
>> > >>> > Thanks,
>> > >>> > Elias
>> > >>> >
>> > >>> > On Mon, Oct 16, 2017 at 5:41 AM Christian Balzer  
>> > >>> > wrote:
>> > >>> >>
>> > >>> >>
>> > >>> >> Most mails to this ML score low or negatively with SpamAssassin, 
>> > >>> >> however
>> > >>> >> once in a while (this is a recent one) we get relatively high 
>> > >>> >> scores.
>> > >>> >> Note that the forged bits are false positives, but the SA is up to 
>> > >>> >> date and
>> > >>> >> google will have similar checks:
>> > >>> >> ---
>> > >>> >> X-Spam-Status: No, score=3.9 required=10.0 tests=BAYES_00,DCC_CHECK,
>> > >>> >>  
>> > >>> >> FORGED_MUA_MOZILLA,FORGED_YAHOO_RCVD,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
>> > >>> >>  
>> > >>> >> HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MIME_HTML_MOSTLY,RCVD_IN_MSPIKE_H4,
>> > >>> >>  RCVD_IN_MSPIKE_WL,RDNS_NONE,T_DKIM_INVALID shortcircuit=no 
>> > >>> >> autolearn=no
>> > >>> >> ---
>> > >>> >>
>> > >>> >> Between attachment mails and some of these and you're well on your 
>> > >>> >> way out.
>> > >>> >>
>> > >>> >> The default mailman settings and logic require 5 bounces to trigger
>> > >>> >> unsubscription and 7 days of NO bounces to reset the counter.
>> > >>> >>
>> > >>> >> Christian
>> > >>> >>
>> > >>> >> On Mon, 16 Oct 2017 12:23:25 +0900 Christian Balzer wrote:
>> > >>> >>
>> > >>> >> > On Mon, 16 Oct 2017 14:15:22 +1100 Blair Bethwaite wrote:
>> > >>> >> >

[ceph-users] mds will not activate

2018-10-06 Thread Jeff Smith
I had to reboot my mds.  The hot spare did not kick in and now I am
showing the filesystem is degraded and offline.  Both mds are showing
as up:standby.  I am not sure how to proceed.

  cluster:
id: 188c7fba-288f-45e9-bca1-cc5fceccd2a1
health: HEALTH_ERR
1 filesystem is degraded
1 filesystem is offline
1 mds daemon damaged
646909/1843113 objects misplaced (35.099%)

  services:
mon: 1 daemons, quorum mon.b
mgr: copious(active)
mds: bulkfs-0/1/1 up , 2 up:standby, 1 damaged
osd: 8 osds: 8 up, 8 in; 47 remapped pgs

  data:
pools:   2 pools, 94 pgs
objects: 614.4 k objects, 2.3 TiB
usage:   7.1 TiB used, 12 TiB / 19 TiB avail
pgs: 646909/1843113 objects misplaced (35.099%)
 47 active+clean
 44 active+remapped+backfill_wait
 3  active+remapped+backfilling

  io:
recovery: 37 MiB/s, 9 objects/s
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] list admin issues

2018-10-06 Thread Jeff Smith
I have been removed twice.
On Sat, Oct 6, 2018 at 7:07 AM Elias Abacioglu
 wrote:
>
> Hi,
>
> I'm bumping this old thread because it's getting annoying. My membership gets 
> disabled twice a month.
> Between my two Gmail accounts I'm in more than 25 mailing lists and I see 
> this behavior only here. Why is only ceph-users affected? Maybe 
> Christian was on to something; is this intentional?
> Reality is that there are a lot of ceph-users with Gmail accounts, so perhaps it 
> wouldn't be so bad to actually try to figure this one out?
>
> So can the maintainers of this list please investigate what actually gets 
> bounced? Look at my address if you want.
> I got disabled 20181006, 20180927, 20180916, 20180725, 20180718 most recently.
> Please help!
>
> Thanks,
> Elias
>
> On Mon, Oct 16, 2017 at 5:41 AM Christian Balzer  wrote:
>>
>>
>> Most mails to this ML score low or negatively with SpamAssassin, however
>> once in a while (this is a recent one) we get relatively high scores.
>> Note that the forged bits are false positives, but the SA is up to date and
>> google will have similar checks:
>> ---
>> X-Spam-Status: No, score=3.9 required=10.0 tests=BAYES_00,DCC_CHECK,
>>  
>> FORGED_MUA_MOZILLA,FORGED_YAHOO_RCVD,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
>>  
>> HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MIME_HTML_MOSTLY,RCVD_IN_MSPIKE_H4,
>>  RCVD_IN_MSPIKE_WL,RDNS_NONE,T_DKIM_INVALID shortcircuit=no autolearn=no
>> ---
>>
>> Between attachment mails and some of these and you're well on your way out.
>>
>> The default mailman settings and logic require 5 bounces to trigger
>> unsubscription and 7 days of NO bounces to reset the counter.
>>
>> Christian
>>
>> On Mon, 16 Oct 2017 12:23:25 +0900 Christian Balzer wrote:
>>
>> > On Mon, 16 Oct 2017 14:15:22 +1100 Blair Bethwaite wrote:
>> >
>> > > Thanks Christian,
>> > >
>> > > You're no doubt on the right track, but I'd really like to figure out
>> > > what it is at my end - I'm unlikely to be the only person subscribed
>> > > to ceph-users via a gmail account.
>> > >
>> > > Re. attachments, I'm surprised mailman would be allowing them in the
>> > > first place, and even so gmail's attachment requirements are less
>> > > strict than most corporate email setups (those that don't already use
>> > > a cloud provider).
>> > >
>> > Mailman doesn't do anything with this by default AFAIK, but see below.
>> > Strict is fine if you're in control, corporate mail can be hell, doubly so
>> > if on M$ cloud.
>> >
>> > > This started happening earlier in the year after I turned off digest
>> > > mode. I also have a paid google domain, maybe I'll try setting
>> > > delivery to that address and seeing if anything changes...
>> > >
>> > Don't think google domain is handled differently, but what do I know.
>> >
>> > Though the digest bit confirms my suspicion about attachments:
>> > ---
>> > When a subscriber chooses to receive plain text daily “digests” of list
>> > messages, Mailman sends the digest messages without any original
>> > attachments (in Mailman lingo, it “scrubs” the messages of attachments).
>> > However, Mailman also includes links to the original attachments that the
>> > recipient can click on.
>> > ---
>> >
>> > Christian
>> >
>> > > Cheers,
>> > >
>> > > On 16 October 2017 at 13:54, Christian Balzer  wrote:
>> > > >
>> > > > Hello,
>> > > >
>> > > > You're on gmail.
>> > > >
>> > > > Aside from various potential false positives with regards to spam my 
>> > > > bet
>> > > > is that gmail's known dislike for attachments is the cause of these
>> > > > bounces and that setting is beyond your control.
>> > > >
>> > > > Because Google knows best[tm].
>> > > >
>> > > > Christian
>> > > >
>> > > > On Mon, 16 Oct 2017 13:50:43 +1100 Blair Bethwaite wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> This is a mailing-list admin issue - I keep being unsubscribed from
>> > > >> ceph-users with the message:
>> > > >> "Your membership in the mailing list ceph-users has been disabled due
>> > > >> to excessive bounces..."
>> > > >> This seems to be happening on roughly a monthly basis.
>> > > >>
>> > > >> Thing is I have no idea what the bounce is or where it is coming from.
>> > > >> I've tried emailing ceph-users-ow...@lists.ceph.com and the contact
>> > > >> listed in Mailman (l...@redhat.com) to get more info but haven't
>> > > >> received any response despite several attempts.
>> > > >>
>> > > >> Help!
>> > > >>
>> > > >
>> > > >
>> > > > --
>> > > > Christian Balzer        Network/Systems Engineer
>> > > > ch...@gol.com           Rakuten Communications
>> > >
>> > >
>> > >
>> >
>> >
>>
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Rakuten Communications
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.

[ceph-users] interpreting ceph mds stat

2018-10-03 Thread Jeff Smith
I need some help deciphering the results of ceph mds stat.  I have
been digging in the docs for hours.  If someone can point me in the
right direction and/or help me understand.

In the documentation it shows a result like this.

cephfs-1/1/1 up {0=a=up:active}

What do each of the 1s represent?   What is the 0=a=up:active?  Is
that saying rank 0 of file system a is up:active?

Jeff Smith
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and NVMe

2018-09-06 Thread Jeff Bailey
I haven't had any problems using 375GB P4800X's in R730 and R740xd 
machines for DB+WAL.  The iDRAC whines a bit on the R740 but everything 
works fine.


On 9/6/2018 3:09 PM, Steven Vacaroaia wrote:

Hi ,
Just to add to this question, is anyone using Intel Optane DC P4800X on 
DELL R630 ...or any other server ?

Any gotchas / feedback/ knowledge sharing will be greatly appreciated
Steven

On Thu, 6 Sep 2018 at 14:59, Stefan Priebe - Profihost AG 
mailto:s.pri...@profihost.ag>> wrote:


Hello list,

has anybody tested current NVMe performance with luminous and bluestore?
Is this something which makes sense or just a waste of money?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Qs on caches, and cephfs

2017-10-22 Thread Jeff
Hey everyone,

Long time listener first time caller.
Thank you to everyone who works on Ceph, docs and code, I'm loving Ceph.
I've been playing with Ceph for awhile and have a few Qs.

Ceph cache tiers, can you have multiple tiered caches?

Also with cache tiers, can you have one cache pool for multiple backing
storage pools? The docs seem to be very careful about specifying one
pool so I suspect I know the answer already.

For CephFS, how do you execute a manual install and manual removal for MDS?

The docs explain how to use ceph-deploy for MDS installs, but I'm trying
to do everything manually right now to get a better understanding of it
all.

The ceph docs seem to be version controlled but I can't seem to find the
repo to update, if you can point me to it I'd be happy to submit patches
to it.

Thnx in advance!
Jeff.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] maintenance questions

2016-10-07 Thread Jeff Applewhite
Hi All

I have a few questions pertaining to management of MONs and OSDs. This is
in a Ceph 2.x context only.
---
1) Can MONs be placed in something resembling maintenance mode (for
firmware updates, patch reboots, etc.). If so how? If not how addressed?

2) Can OSDs be placed in something resembling maintenance mode (for
firmware updates, patch reboots, etc.). If so how? If not how addressed?

3) Can MONs be "replaced/migrated" efficiently in a hardware upgrade
scenario? If so how? If not how addressed?

4) Can OSDs be "replaced/migrated" efficiently in a hardware upgrade
scenario? If so how? If not how addressed?

---
I suspect the answer is somewhat nuanced and has to do with timeouts and
such. Please describe how these things are successfully handled in
production settings.

The goal here is to automate such things in a management tool so the
strategies should be well worn. If the answer is "no you can't and it's not
addressed in Ceph" is this a potential roadmap item?

If addressed in previous discussions please forgive me and point me to
them - new to the list.

Thanks in advance!

-- 

Jeff Applewhite
Principal Product Manager
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing Replication count

2016-09-06 Thread Jeff Bailey



On 9/6/2016 8:41 PM, Vlad Blando wrote:

Hi,

My replication count now is this

[root@controller-node ~]# ceph osd lspools
4 images,5 volumes,


Those aren't replica counts they're pool ids.
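
If it helps, the actual replica counts can be checked with something like:

  ceph osd pool get images size
  ceph osd pool get volumes size
  ceph osd dump | grep pool     # shows id, size and min_size for every pool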


[root@controller-node ~]#

and I made an adjustment to set images to 2 and volumes to 3; it's been
30 mins now and the values did not change. How do I know
if it was really changed?


this is the command I executed

 ceph osd pool set images size 2
 ceph osd pool set volumes size 3

ceph osd pool set images min_size 2
ceph osd pool set images min_size 2


Another question: since the previous replication count for images was 4 
and for volumes was 5, it will delete the excess replicas, right?


Thanks for the help


/vlad


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fast Ceph a Cluster with PB storage

2016-08-09 Thread Jeff Bailey



On 8/9/2016 10:43 AM, Wido den Hollander wrote:



On 9 August 2016 at 16:36, Александр Пивушков wrote:


 > >> Hello dear community!

I'm new to Ceph and only recently took up the topic of building clusters,
so your opinion is very important to me.
I need to create a cluster with 1.2 PB of storage and very rapid access to the data.
Earlier we used "Intel® SSD DC P3608 Series 1.6TB NVMe PCIe 3.0 x4 Solid State 
Drive" disks; their speed is entirely satisfactory, but as the volume of 
storage grows the price of such a cluster rises very steeply, hence the idea to 
use Ceph.


You may want to tell us more about your environment, use case and in
particular what your clients are.
Large amounts of data usually means graphical or scientific data,
extremely high speed (IOPS) requirements usually mean database
like applications, which one is it, or is it a mix?


This is a mixed project, combining graphics and science, linking a vast array 
of image data. Like Google Maps :)
Previously the clients were Windows machines connected directly to powerful 
servers.
A Ceph cluster connected over FC to the virtual machine servers is now planned. 
Virtualization - oVirt.


Stop right there. oVirt, despite being from RedHat, doesn't really support
Ceph directly all that well, last I checked.
That is probably where you get the idea/need for FC from.

If anyhow possible, you do NOT want another layer and protocol conversion
between Ceph and the VMs, like a FC gateway or iSCSI or NFS.

So if you're free to choose your Virtualization platform, use KVM/qemu at
the bottom and something like OpenStack, OpenNebula, ganeti, or Pacemaker with
KVM resource agents on top.

oh, that's too bad ...
I do not understand something...

oVirt built on kvm
https://www.ovirt.org/documentation/introduction/about-ovirt/

Ceph, such as support kvm
http://docs.ceph.com/docs/master/architecture/



KVM is just the hypervisor. oVirt is a tool which controls KVM and it doesn't 
have support for Ceph. That means that it can't pass down the proper arguments 
to KVM to talk to RBD.


What could be the overhead costs and how big they are?


I do not understand why oVirt is bad while qemu under OpenStack is good.
What can I read about this?



Like I said above. oVirt and OpenStack both control KVM. OpenStack also knows 
how to  'configure' KVM to use RBD, oVirt doesn't.

Maybe Proxmox is a better solution in your case.



oVirt can use ceph through cinder.  It doesn't currently provide all the
functionality of other oVirt storage domains but it does work.


Wido



--
Александр Пивушков___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tips for faster openstack instance boot

2016-02-08 Thread Jeff Bailey
Your glance images need to be raw, also.  A QCOW image will be 
copied/converted.
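
For comparison, the relevant settings usually look something like the sketch
below (pool and user names follow the documentation's examples and will likely
differ in your deployment; the secret uuid is whatever you defined in libvirt):

  # glance-api.conf
  [DEFAULT]
  show_image_direct_url = True
  [glance_store]
  stores = rbd
  default_store = rbd
  rbd_store_pool = images
  rbd_store_user = glance

  # nova.conf
  [libvirt]
  images_type = rbd
  images_rbd_pool = vms
  rbd_user = cinder
  rbd_secret_uuid = <libvirt secret uuid>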


On 2/8/2016 3:33 PM, Jason Dillaman wrote:

If Nova and Glance are properly configured, it should only require a quick 
clone of the Glance image to create your Nova ephemeral image.  Have you 
double-checked your configuration against the documentation [1]?  What version 
of OpenStack are you using?

To answer your questions:


- From Ceph point of view. does COW works cross pool i.e. image from glance
pool ---> (cow) --> instance disk on nova pool

Yes, cloning copy-on-write images works across pools


- Will a single pool for glance and nova instead of separate pool . will help
here ?

Should be no change -- the creation of the clone is extremely lightweight (add 
the image to a directory, create a couple metadata objects)


- Is there any tunable parameter from Ceph or OpenStack side that should be
set ?

I'd double-check your OpenStack configuration.  Perhaps Glance isn't configured with 
"show_image_direct_url = True", or Glance is configured to cache your RBD 
images, or you have an older OpenStack release that requires patches to fully support 
Nova+RBD.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Jeff Epstein
Unfortunately, I haven't seen any obvious suspicious log messages from 
either the OSD or the MON. Is there a way to query detailed information 
on OSD monitoring, e.g. heartbeats?


On 01/18/2016 05:54 PM, Steve Taylor wrote:

With a single osd there shouldn't be much to worry about. It will have to get 
caught up on map epochs before it will report itself as up, but on a new 
cluster that should be pretty immediate.

You'll probably have to look for clues in the osd and mon logs. I would expect 
some sort of error reported in this scenario. It seems likely that it would be 
network-related in this case, but the logs will confirm or debunk that theory.

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.


-Original Message-
From: Jeff Epstein [mailto:jeff.epst...@commerceguys.com]
Sent: Monday, January 18, 2016 8:32 AM
To: Steve Taylor ; ceph-users 

Subject: Re: [ceph-users] OSDs are down, don't know why

Hi Steve
Thanks for your answer. I don't have a private network defined.
Furthermore, in my current testing configuration, there is only one OSD, so 
communication between OSDs should be a non-issue.
Do you know how OSD up/down state is determined when there is only one OSD?
Best,
Jeff

On 01/18/2016 03:59 PM, Steve Taylor wrote:

Do you have a ceph private network defined in your config file? I've seen this 
before in that situation where the private network isn't functional. The osds 
can talk to the mon(s) but not to each other, so they report each other as down 
when they're all running just fine.


Steve Taylor | Senior Software Engineer | StorageCraft Technology
Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
Of Jeff Epstein
Sent: Friday, January 15, 2016 7:28 PM
To: ceph-users 
Subject: [ceph-users] OSDs are down, don't know why

Hello,

I'm setting up a small test instance of ceph and I'm running into a situation 
where the OSDs are being shown as down, but I don't know why.

Connectivity seems to be working. The OSD hosts are able to communicate with the MON hosts; running 
"ceph status" and "ceph osd in" from an OSD host works fine, but with a 
HEALTH_WARN that I have 2 osds: 0 up, 2 in.
Both the OSD and MON daemons seem to be running fine. Network connectivity 
seems to be okay: I can nc from the OSD to port 6789 on the MON, and from the 
MON to port 6800-6803 on the OSD (I have constrained the ms bind port min/max 
config options so that the OSDs will use only these ports). Neither OSD nor MON 
logs show anything that seems unusual, nor why the OSD is marked as being down.

Furthermore, using tcpdump i've watched network traffic between the OSD and the 
MON, and it seems that the OSD is sending heartbeats and getting an ack from 
the MON. So I'm definitely not sure why the MON thinks the OSD is down.

Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no 
heartbeat?
- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?

ceph 0.94 on Debian Jessie

Best,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Jeff Epstein

Hi Steve
Thanks for your answer. I don't have a private network defined. 
Furthermore, in my current testing configuration, there is only one OSD, 
so communication between OSDs should be a non-issue.
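
For clarity, the "private network" being referred to is the cluster network
setting in ceph.conf, which sits alongside the public network, e.g. (addresses
are examples only):

  [global]
      public network  = 192.168.1.0/24   # client and mon traffic
      cluster network = 192.168.2.0/24   # osd-to-osd replication and heartbeats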

Do you know how OSD up/down state is determined when there is only one OSD?
Best,
Jeff

On 01/18/2016 03:59 PM, Steve Taylor wrote:

Do you have a ceph private network defined in your config file? I've seen this 
before in that situation where the private network isn't functional. The osds 
can talk to the mon(s) but not to each other, so they report each other as down 
when they're all running just fine.


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jeff 
Epstein
Sent: Friday, January 15, 2016 7:28 PM
To: ceph-users 
Subject: [ceph-users] OSDs are down, don't know why

Hello,

I'm setting up a small test instance of ceph and I'm running into a situation 
where the OSDs are being shown as down, but I don't know why.

Connectivity seems to be working. The OSD hosts are able to communicate with the MON hosts; running 
"ceph status" and "ceph osd in" from an OSD host works fine, but with a 
HEALTH_WARN that I have 2 osds: 0 up, 2 in.
Both the OSD and MON daemons seem to be running fine. Network connectivity 
seems to be okay: I can nc from the OSD to port 6789 on the MON, and from the 
MON to port 6800-6803 on the OSD (I have constrained the ms bind port min/max 
config options so that the OSDs will use only these ports). Neither OSD nor MON 
logs show anything that seems unusual, nor why the OSD is marked as being down.

Furthermore, using tcpdump i've watched network traffic between the OSD and the 
MON, and it seems that the OSD is sending heartbeats and getting an ack from 
the MON. So I'm definitely not sure why the MON thinks the OSD is down.

Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no 
heartbeat?
- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?

ceph 0.94 on Debian Jessie

Best,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs are down, don't know why

2016-01-15 Thread Jeff Epstein

Hello,

I'm setting up a small test instance of ceph and I'm running into a 
situation where the OSDs are being shown as down, but I don't know why.


Connectivity seems to be working. The OSD hosts are able to communicate 
with the MON hosts; running "ceph status" and "ceph osd in" from an OSD 
host works fine, but with a HEALTH_WARN that I have 2 osds: 0 up, 2 in. 
Both the OSD and MON daemons seem to be running fine. Network 
connectivity seems to be okay: I can nc from the OSD to port 6789 on the 
MON, and from the MON to port 6800-6803 on the OSD (I have constrained 
the ms bind port min/max config options so that the OSDs will use only 
these ports). Neither OSD nor MON logs show anything that seems unusual, 
nor why the OSD is marked as being down.
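
For reference, the port constraint mentioned above is a ceph.conf fragment
along these lines:

  [osd]
      ms bind port min = 6800
      ms bind port max = 6803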


Furthermore, using tcpdump i've watched network traffic between the OSD 
and the MON, and it seems that the OSD is sending heartbeats and getting 
an ack from the MON. So I'm definitely not sure why the MON thinks the 
OSD is down.


Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no 
heartbeat?

- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?

ceph 0.94 on Debian Jessie

Best,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel P3700 PCI-e as journal drives?

2016-01-12 Thread Jeff Bailey

On 1/12/2016 4:51 AM, Burkhard Linke wrote:

Hi,

On 01/08/2016 03:02 PM, Paweł Sadowski wrote:

Hi,

Quick results for 1/5/10 jobs:


*snipsnap*

Run status group 0 (all jobs):
   WRITE: io=21116MB, aggrb=360372KB/s, minb=360372KB/s, 
maxb=360372KB/s,

mint=6msec, maxt=6msec


*snipsnap*

Run status group 0 (all jobs):
   WRITE: io=57723MB, aggrb=985119KB/s, minb=985119KB/s, 
maxb=985119KB/s,

mint=60001msec, maxt=60001msec

Disk stats (read/write):
   nvme0n1: ios=0/14754265, merge=0/0, ticks=0/253092, in_queue=254880,
util=100.00%

*snipsnap*


Run status group 0 (all jobs):
   WRITE: io=65679MB, aggrb=1094.7MB/s, minb=1094.7MB/s, 
maxb=1094.7MB/s,

mint=60001msec, maxt=60001msec


*snipsnap*


=== START OF INFORMATION SECTION ===
Vendor:   NVMe
Product:  INTEL SSDPEDMD01
Revision: 8DV1
User Capacity:1,600,321,314,816 bytes [1.60 TB]
Logical block size:   512 bytes
Rotation Rate:Solid State Device
Thank you for the fast answer. The numbers really look promising! Do 
you have experience with the speed of these drives with respect to 
their size? Are the smaller models (e.g. the 400GB one) as fast as the 
larger ones, or does the speed scale with the overall size, e.g. due 
to a larger number of flash chips / memory channels?


Attached are similar runs on a 400GB P3700.  The 400GB is a little 
slower than the 1.6TB but not bad.



Regards,
Burkhard

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Script started on Tue 12 Jan 2016 03:04:55 AM EST
[root@hv01 ~]# fio --filename=/dev/nvme0n1p4 --direct=1 --sync=1 --rw=write 
--bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1)
journal-test: (groupid=0, jobs=1): err= 0: pid=87175: Tue Jan 12 03:05:59 2016
  write: io=23805MB, bw=406279KB/s, iops=101569, runt= 6msec
clat (usec): min=8, max=6156, avg= 9.52, stdev=17.85
 lat (usec): min=8, max=6156, avg= 9.59, stdev=17.85
clat percentiles (usec):
 |  1.00th=[8],  5.00th=[8], 10.00th=[8], 20.00th=[8],
 | 30.00th=[8], 40.00th=[9], 50.00th=[9], 60.00th=[9],
 | 70.00th=[9], 80.00th=[9], 90.00th=[   11], 95.00th=[   18],
 | 99.00th=[   20], 99.50th=[   23], 99.90th=[   29], 99.95th=[   35],
 | 99.99th=[   51]
bw (KB  /s): min=368336, max=419216, per=99.98%, avg=406197.88, 
stdev=11905.08
lat (usec) : 10=86.21%, 20=12.49%, 50=1.29%, 100=0.01%, 250=0.01%
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
  cpu  : usr=18.81%, sys=11.95%, ctx=6094194, majf=0, minf=116
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=6094190/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=23805MB, aggrb=406279KB/s, minb=406279KB/s, maxb=406279KB/s, 
mint=6msec, maxt=6msec

Disk stats (read/write):
  nvme0n1: ios=74/6087837, merge=0/0, ticks=5/43645, in_queue=43423, util=71.60%



[root@hv01 ~]# fio --filename=/dev/nvme0n1p4 --direct=1 --sync=1 --rw=write 
--bs=4k --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test

journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
fio-2.2.8
Starting 5 processes
Jobs: 5 (f=5)
journal-test: (groupid=0, jobs=5): err= 0: pid=87229: Tue Jan 12 03:07:31 2016
  write: io=54260MB, bw=926023KB/s, iops=231505, runt= 60001msec
clat (usec): min=8, max=12011, avg=20.95, stdev=64.79
 lat (usec): min=8, max=12012, avg=21.06, stdev=64.79
clat percentiles (usec):
 |  1.00th=[9],  5.00th=[   10], 10.00th=[   11], 20.00th=[   12],
 | 30.00th=[   13], 40.00th=[   14], 50.00th=[   16], 60.00th=[   17],
 | 70.00th=[   19], 80.00th=[   23], 90.00th=[   28], 95.00th=[   33],
 | 99.00th=[  108], 99.50th=[  203], 99.90th=[  540], 99.95th=[  684],
 | 99.99th=[ 1272]
bw (KB  /s): min=132048, max=236048, per=20.01%, avg=185337.04, 
stdev=17916.73
lat (usec) : 10=2.84%, 20=67.27%, 50=27.54%, 100=1.28%, 250=0.69%
lat (usec) : 500=0.27%, 750=0.08%, 1000=0.02%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu  : usr=5.62%, sys=21.65%, ctx=13890559, majf=0, minf=576
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r

Re: [ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein

On 09/25/2015 02:28 PM, Jan Schermer wrote:

What about /sys/block/krbdX/holders? Nothing in there?

There is no /sys/block/krbd450, but there is /sys/block/rbd450. In our 
case, /sys/block/rbd450/holders is empty.


Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein

On 09/25/2015 12:53 PM, Jan Schermer wrote:

What are you looking for in lsof? Did you try looking for the major/minor 
number of the rbd device?
Things that could hold the device are devicemapper, lvm, swraid and possibly 
many more, not sure if all that shows in lsof output...

I searched for the rbd's mounted block device name, of course, which 
didn't turn up anything. Just now I tried searching for the minor device 
number, but I didn't see anything obviously useful. lsof usually just 
shows processes, so if the the device is being held by a kernel module 
or inaccurate refcount, lsof wouldn't help.


Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein

On 09/25/2015 12:38 PM, Ilya Dryomov wrote:

On Fri, Sep 25, 2015 at 7:17 PM, Jeff Epstein
 wrote:

We occasionally have a situation where we are unable to unmap an rbd. This
occurs intermittently, with no obvious cause. For the most part, rbds can be
unmapped fine, but sometimes we get this:

# rbd unmap /dev/rbd450
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy

Does it persist, i.e. can you unmap a few seconds after this?

It seems to persist. We've been struggling with this for a few days.

Is there any way to determine what exactly is blocking the unmap? Is there a
way to force unmap?

No, there is no way to force unmap.  The most likely reason for -EBUSY
is a positive open_count, meaning something has that device opened at
the time you do unmap.  I guess we could start outputting open_count to
dmesg in these cases, just to be sure.


Is there any way to query the open_count? Or to forcibly reset it if it 
becomes inaccurate?


Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein
We occasionally have a situation where we are unable to unmap an rbd. 
This occurs intermittently, with no obvious cause. For the most part, 
rbds can be unmapped fine, but sometimes we get this:


# rbd unmap /dev/rbd450
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy

Things we've tried: lsof doesn't provide any useful information. We are 
sure the rbd isn't mapped anywhere. listwatchers shows that there is a 
watcher on the current host, and nowhere else. The given rbd has an 
associated jbd2 process, but no kworker. Other rbds are able to map and 
unmap fine.
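
Roughly, the checks above were along these lines (pool and image names are
placeholders; v1 images keep their header object as <imagename>.rbd):

  rbd showmapped                             # confirm the image isn't mapped elsewhere
  rados -p <pool> listwatchers <image>.rbd   # the watcher check mentioned above
  lsof | grep rbd450                         # turned up nothing useful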


I should also mention that for the time being we are using rbd v1 images.

# ceph --version
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
# uname -srvom
Linux 4.1.6pl #1 SMP Mon Sep 7 22:43:13 CEST 2015 x86_64 GNU/Linux

Is there any way to determine what exactly is blocking the unmap? Is 
there a way to force unmap?


Best,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] maximum number of mapped rbds?

2015-09-03 Thread Jeff Epstein

Hello,

In response to an rbd map command, we are getting a "Device or resource 
busy".


$ rbd -p platform map ceph:pzejrbegg54hi-stage-4ac9303161243dc71c75--php

rbd: sysfs write failed

rbd: map failed: (16) Device or resource busy


We currently have over 200 rbds mapped on a single host. Can this be the 
source of the problem? If so, is there a workaround?


$  rbd -p platform showmapped|wc -l
248

Thanks.

Best,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-05-06 Thread Jeff Epstein
If anyone here is interested in what became of my problem with 
dreadfully bad performance with ceph, I'd like to offer this follow up.


The problem, as it turns out, is a regression that exists only in 
version 3.18 of the kernel. Upgrading to 4.0 solved the problem, and 
performance is now normal. Odd that no one here suggested this fix, and 
all the messing about with various topologies, placement groups, and so 
on, was for naught.


Jeff

On 04/09/2015 11:25 PM, Jeff Epstein wrote:
As a follow-up to this issue, I'd like to point out some other things 
I've noticed.


First, per suggestions posted here, I've reduced the number of pgs per 
pool. This results in the following ceph status:


cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
 health HEALTH_WARN too few pgs per osd (14 < min 20)
 monmap e1: 3 mons at 
{a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, election 
epoch 8, quorum 0,1,2 a,b,c

 osdmap e238: 6 osds: 6 up, 6 in
  pgmap v1107: 86 pgs, 23 pools, 2511 MB data, 801 objects
38288 MB used, 1467 GB / 1504 GB avail
  86 active+clean

I'm not sure if I should be concerned about the HEALTH WARN.

However, this has not helped the performance issues. I've dug deeper 
to try to understand what is actually happening. It's curious because 
there isn't much data: our pools are about 5GB, so it really shouldn't 
take 30 minutes to an hour to run mkfs. Here some results taken from 
disk analysis tools while this delay is in progress:


From pt-diskstats:

  #ts device   rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt  wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt  busy in_prg io_s qtime stime
  1.0 rbd0      0.0     0.0     0.0     0%    0.0   0.0   0.0     0.0     0.0     0%    0.0   0.0  100%      6  0.0   0.0   0.0


From iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm  %util
rbd0              0.00     0.03    0.03    0.04     0.13    10.73   310.78     3.31 19730.41    0.40 37704.35 7073.59  49.47


These results correspond with my experience: the device is busy, as 
witnessed by the "busy" column in pt-diskstats and the "await" column 
in iostat. But both tools also attest to the fact that there isn't 
much reading or writing going on.  According to pt-diskstats, there 
isn't any. So my question is: what is ceph /doing/? It clearly isn't 
just blocking as a result of excess I/O load, something else is going 
on. Can anyone please explain?


Jeff




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-04-24 Thread Jeff Epstein

Hi JC,

In answer to your question, iostat shows high wait times on the RBD, but 
not on the underlying medium. For example:


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz     await r_await    w_await    svctm  %util

rbd50             0.00     0.01    0.00    0.00     0.01     0.03    15.81     5.08 892493.16    1.47 1692367.78 85088.96  48.41

xvdk              0.00    32.18    0.20    4.55     1.00   314.30   132.83     0.40     83.70    1.10      87.35     0.82   0.39


In this case, rbd50 is the RBD that is blocking, while xvdk is the 
physical disk where OSD data is stored. xvdk appears completely normal, 
whereas rbd50 has absurdly high wait times. This leads me to think that 
the problem is a bug or misconfiguration in ceph, rather than being 
actually blocked by slow I/O.


This information is also reflected in vmstat:

procs ---memory-- ---swap-- -io -system-- 
cpu

 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa

 0  1 1386024  79596 263552 4681361   1126   297  213  247  0  1 57 42


Finally, one can see the blocked processes, and the function they're 
blocked in, with ps. This is pretty typical:


2  6750 root D 0.0 jbd2/rbd47-8    wait_on_buffer

2 17551 root D 0.0 jbd2/rbd51-8    wait_on_buffer

2 19019 root D 0.0 kworker/u30:3   get_write_access

22369 22374 root D 0.0 shutdown    sync_inodes_sb

22372 22381 root D 0.0 shutdown    sync_inodes_sb

Frequently mkfs blocks as well:

 1468 12329 root D 0.0 mkfs.ext4   wait_on_page_bit

 1468 12332 root D 0.0 mkfs.ext4   wait_on_buffer


I haven't seen anything obviously unusual in ceph -w, but I'm also not 
completely sure what I'm looking for.


The network connection between our nodes is provided by Amazon AWS, and 
has always been sufficient for our production needs until now. If 
there's a specific issue of concern related to ceph that I should be 
investigating, please let me know.


Here's a pastebin from an OSD experiencing the problem I described. I 
set debug_osd to 5/5. If you can provide any insight, I'd be grateful. 
http://pastebin.com/kLSwbVRb
Also, if you have any more suggestions on how I can collect potentially 
interesting debug info, please let me know. Thanks.
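
For the record, the debug level was bumped at runtime with something like:

  ceph tell osd.* injectargs '--debug-osd 5/5'

and can be dropped back down the same way once the logs are captured.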


Jeff

On 04/10/2015 12:24 AM, LOPEZ Jean-Charles wrote:

Hi Jeff,

have you tried gathering an iostats but on the OSD side to see how your OSD 
drives behave.

The RBD side shows you what the client is experiencing (the symptom) but will 
not help you find the problem.

Can you grab this iostat output on the OSD VMs (district-1 or district-2) 
depending on which test you did last. Don’t forget to indicate which devices 
are the OSD devices on your VMs together with the iostat posting.

Have you also investigated the network between your client and the OSDs? While 
the test is going, do you see any unusal message in a « ceph -w » output?

Pastebin and we’ll see if we can spot something.

As for the too few PGs, once we’ve found the root cause of why it’s slow, 
you’ll be able to adjust and increase the number of PGs per pool.

Cheers
JC


On 9 Apr 2015, at 20:25, Jeff Epstein  wrote:

As a follow-up to this issue, I'd like to point out some other things I've 
noticed.

First, per suggestions posted here, I've reduced the number of pgs per pool. 
This results in the following ceph status:

 cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
  health HEALTH_WARN too few pgs per osd (14 < min 20)
  monmap e1: 3 mons at 
{a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, 
election epoch 8, quorum 0,1,2 a,b,c
  osdmap e238: 6 osds: 6 up, 6 in
   pgmap v1107: 86 pgs, 23 pools, 2511 MB data, 801 objects
 38288 MB used, 1467 GB / 1504 GB avail
   86 active+clean

I'm not sure if I should be concerned about the HEALTH WARN.

However, this has not helped the performance issues. I've dug deeper to try to 
understand what is actually happening. It's curious because there isn't much 
data: our pools are about 5GB, so it really shouldn't take 30 minutes to an 
hour to run mkfs. Here some results taken from disk analysis tools while this 
delay is in progress:

 From pt-diskstats:

   #ts device   rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt  wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt  busy in_prg io_s qtime stime
   1.0 rbd0      0.0     0.0     0.0     0%    0.0   0.0   0.0     0.0     0.0     0%    0.0   0.0  100%      6  0.0   0.0   0.0

 From iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await r_await  w_await   svctm  %util
rbd0              0.00     0.03    0.03    0.04     0.13    10.73   310.78     3.31 19730.41    0.40 37704.35 7073.59  49.47

These results correspond with my experien

Re: [ceph-users] long blocking with writes on rbds

2015-04-23 Thread Jeff Epstein
The appearance of these socket closed messages seems to coincide with 
the slowdown symptoms. What is the cause?


2015-04-23T14:08:47.111838+00:00 i-65062482 kernel: [ 4229.485489] libceph: 
osd1 192.168.160.4:6800 socket closed (con state OPEN)

2015-04-23T14:09:06.961823+00:00 i-65062482 kernel: [ 4249.332547] libceph: 
osd2 192.168.96.4:6800 socket closed (con state OPEN)

2015-04-23T14:09:09.701819+00:00 i-65062482 kernel: [ 4252.070594] libceph: 
osd4 192.168.64.4:6800 socket closed (con state OPEN)

2015-04-23T14:09:10.381817+00:00 i-65062482 kernel: [ 4252.755400] libceph: 
osd5 192.168.128.4:6800 socket closed (con state OPEN)

2015-04-23T14:09:14.831817+00:00 i-65062482 kernel: [ 4257.200257] libceph: 
osd5 192.168.128.4:6800 socket closed (con state OPEN)

2015-04-23T14:13:57.061877+00:00 i-65062482 kernel: [ 4539.431624] libceph: 
osd4 192.168.64.4:6800 socket closed (con state OPEN)

2015-04-23T14:13:57.541842+00:00 i-65062482 kernel: [ 4539.913284] libceph: 
osd5 192.168.128.4:6800 socket closed (con state OPEN)

2015-04-23T14:13:59.801822+00:00 i-65062482 kernel: [ 4542.177187] libceph: 
osd3 192.168.0.4:6800 socket closed (con state OPEN)

2015-04-23T14:14:11.361819+00:00 i-65062482 kernel: [ 4553.733566] libceph: 
osd4 192.168.64.4:6800 socket closed (con state OPEN)

2015-04-23T14:14:47.871829+00:00 i-65062482 kernel: [ 4590.242136] libceph: 
osd5 192.168.128.4:6800 socket closed (con state OPEN)

2015-04-23T14:14:47.991826+00:00 i-65062482 kernel: [ 4590.364078] libceph: 
osd2 192.168.96.4:6800 socket closed (con state OPEN)

2015-04-23T14:15:00.081817+00:00 i-65062482 kernel: [ 4602.452980] libceph: 
osd5 192.168.128.4:6800 socket closed (con state OPEN)

2015-04-23T14:16:21.301820+00:00 i-65062482 kernel: [ 4683.671614] libceph: 
osd5 192.168.128.4:6800 socket closed (con state OPEN)



Jeff

On 04/23/2015 12:26 AM, Jeff Epstein wrote:



Do you have some idea how I can diagnose this problem?


I'll look at ceph -s output while you get these stuck process to see 
if there's any unusual activity (scrub/deep 
scrub/recovery/bacfills/...). Is it correlated in any way with rbd 
removal (ie: write blocking don't appear unless you removed at least 
one rbd for say one hour before the write performance problems).


I'm not familiar with Amazon VMs. If you map the rbds using the 
kernel driver to local block devices do you have control over the 
kernel you run (I've seen reports of various problems with older 
kernels and you probably want the latest possible) ?


ceph status shows nothing unusual. However, on the problematic node, 
we typically see entries in ps like this:


 1468 12329 root D 0.0 mkfs.ext4   wait_on_page_bit
 1468 12332 root D 0.0 mkfs.ext4   wait_on_buffer

Notice the "D" blocking state. Here, mkfs is stopped on some wait 
functions for long periods of time. (Also, we are formatting the RBDs 
as ext4 even though the OSDs are xfs; I assume this shouldn't be a 
problem?)


We're on kernel 3.18.4pl2, which is pretty recent. Still, an outdated 
kernel driver isn't out of the question; if anyone has any concrete 
information, I'd be grateful.


Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-04-22 Thread Jeff Epstein



Do you have some idea how I can diagnose this problem?


I'll look at ceph -s output while you get these stuck process to see 
if there's any unusual activity (scrub/deep 
scrub/recovery/bacfills/...). Is it correlated in any way with rbd 
removal (ie: write blocking don't appear unless you removed at least 
one rbd for say one hour before the write performance problems).


I'm not familiar with Amazon VMs. If you map the rbds using the kernel 
driver to local block devices do you have control over the kernel you 
run (I've seen reports of various problems with older kernels and you 
probably want the latest possible) ?


ceph status shows nothing unusual. However, on the problematic node, we 
typically see entries in ps like this:


 1468 12329 root D 0.0 mkfs.ext4   wait_on_page_bit
 1468 12332 root D 0.0 mkfs.ext4   wait_on_buffer

Notice the "D" blocking state. Here, mkfs is stopped on some wait 
functions for long periods of time. (Also, we are formatting the RBDs as 
ext4 even though the OSDs are xfs; I assume this shouldn't be a problem?)


We're on kernel 3.18.4pl2, which is pretty recent. Still, an outdated 
kernel driver isn't out of the question; if anyone has any concrete 
information, I'd be grateful.


Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-04-22 Thread Jeff Epstein



On 04/10/2015 10:10 AM, Lionel Bouton wrote:

On 04/10/15 15:41, Jeff Epstein wrote:

[...]
This seems highly unlikely. We get very good performance without 
ceph. Requisitioning and manupulating block devices through LVM 
happens instantaneously. We expect that ceph will be a bit slower by 
its distributed nature, but we've seen operations block for up to an 
hour, which is clearly beyond the pale. Furthermore, as the 
performance measure I posted show, read/write speed is not the 
bottleneck: ceph is simply/waiting/.


So, does anyone else have any ideas why mkfs (and other operations) 
takes so long?



As your use case is pretty unique and clearly not something Ceph was 
optimized for, if I were you I'd switch to a single pool with the 
appropriate number of pgs based on your pool size (replication) and 
the number of OSD you use (you should target 100 pgs/OSD to be in what 
seems the sweet spot) and create/delete rbd instead of the whole pool. 
You would be in "known territory" and any remaining performance 
problem would be easier to debug.
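
A rough worked example of that sizing rule (numbers are illustrative only, not
a recommendation for this particular cluster):

  # total_pgs ~= (num_osds * 100) / replica_count, rounded to a power of two
  # e.g. 6 OSDs with size=3:  6 * 100 / 3 = 200  ->  use 256
  ceph osd pool create rbdpool 256 256
  ceph osd pool set rbdpool size 3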


I agree that this is a good suggestion. It took me a little while, but 
I've changed the configuration so that we now have only one pool, 
containing many rbds, and now all data is spread across all six of our 
OSD nodes. However, the performance has not perceptibly improved. We 
still have the occasional long (>10 minutes) wait periods during write 
operations, and the bottleneck still seems to be ceph, rather than the 
hardware: the blocking process (most usually, but not always, mkfs) is 
stuck in a wait state ("D" in ps) but no I/O is actually being 
performed, so one can surmise that the physical limitations of the disk 
medium are not the bottleneck. This is similar to what is being reported 
in the thread titled "100% IO Wait with CEPH RBD and RSYNC".


Do you have some idea how I can diagnose this problem?

Best,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-22 Thread Jeff Epstein

Hi Christian

This sounds like the same problem we are having. We get long wait times 
on ceph nodes, with certain commands (in our case, mainly mkfs) blocking 
for long periods of time, stuck in a wait (and not read or write) state. 
We get the same warning messages in syslog, as well.


Jeff

On 04/21/2015 04:31 AM, Christian Eichelmann wrote:

Hi Dan,

nope, we have no iptables rules on those hosts and the gateway is on the
same subnet as the ceph cluster.

I will see if I can find some informations on how to debug the rbd
kernel module (any suggestions are appreciated :))

Regards,
Christian

On 21.04.2015 at 10:20, Dan van der Ster wrote:

Hi Christian,

I've never debugged the kernel client either, so I don't know how to
increase debugging. (I don't see any useful parms on the kernel
modules).

Your log looks like the client just stops communicating with the ceph
cluster. Is iptables getting in the way ?

Cheers, Dan

On Tue, Apr 21, 2015 at 9:13 AM, Christian Eichelmann
 wrote:

Hi Dan,

we are already back on the kernel module since the same problems were
happening with fuse. I had no special ulimit settings for the
fuse-process, so that could have been an issue there.

I was pasting you the kernel messages during such incidents here:
http://pastebin.com/X5JRe1v3

I was never debugging the kernel client. Can you give me a short hint
how to increase the debug level and where the logs will be written to?

Regards,
Christian

On 20.04.2015 at 15:50, Dan van der Ster wrote:

Hi,
This is similar to what you would observe if you hit the ulimit on
open files/sockets in a Ceph client. Though that normally only affects
clients in user mode, not the kernel. What are the ulimits of your
rbd-fuse client? Also, you could increase the client logging debug
levels to see why the client is hanging. When the kernel rbd client
was hanging, was there anything printed to dmesg ?
Cheers, Dan

On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann
 wrote:

Hi Ceph-Users!

We currently have a problem where I am not sure whether it has its cause
in Ceph or something else. First, some information about our ceph-setup:

* ceph version 0.87.1
* 5 MON
* 12 OSD with 60x2TB each
* 2 RSYNC Gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
Wheezy)

Our cluster is mainly used to store Log-Files from numerous servers via
RSync and make them available via RSync as well. For about two weeks
we have been seeing very strange behaviour on our RSync gateways (they just map
several rbd devices and "export" them via rsyncd): the IO wait on the
systems increases until some of the cores get stuck at an IO wait of
100%. RSync processes become zombies (defunct) and/or cannot be
killed even with SIGKILL. After the system has reached a load of about
1400, it becomes totally unresponsive and the only way to "fix" the
problem is to reboot the system.

I was trying to manually reproduce the problem by simultaneously reading
and writing from several machines, but the problem didn't appear.

I have no idea where the error can be. I was doing a ceph tell osd.*
bench during the problem and all osds were showing normal benchmark
results. Does anyone have an idea how this can happen? If you need any
more information, please let me know.

Regards,
Christian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Christian Eichelmann
Systemadministrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8026
christian.eichelm...@1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-04-09 Thread Jeff Epstein



On 04/09/2015 03:14 AM, Christian Balzer wrote:


Your 6 OSDs are on a single VM from what I gather?
Aside from being a very small number for something that you seem to be
using in some sort of production environment (Ceph gets faster the more
OSDs you add), where is the redundancy, HA in that?


We are running one OSD per VM. All data is replicated across three VMs.


The number of your PGs and PGPs need to have at least a semblance of being
correctly sized, as others mentioned before.
You want to re-read the Ceph docs about that and check out the PG
calculator:
http://ceph.com/pgcalc/


My choice of pgs is based on this page. Since each pool is spread across 
3 OSDs, 100 seemed like a good number. Am I misinterpreting this 
documentation?

http://ceph.com/docs/master/rados/operations/placement-groups/


Since RBDs are sparsely allocated, the actual data used is the key factor.
But you're adding the pool removal overhead to this.

How much overhead does pool removal add?

Both, and the fact that you have overloaded the PGs by nearly a factor of
10 (or 20 if you're actually using a replica of 3 and not 1) doesn't help
one bit.
And let's clarify what objects are in the Ceph/RBD context: they're the (by
default) 4MB blobs that make up an RBD image.


I'm curious how you reached your estimation of overloading. According to 
the pg calculator you linked to, given that each pool occupies only 3 
OSDs, the suggested number of pgs is around 100. Can you explain?
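(Presumably the overload estimate counts total pg copies across the cluster
rather than per-pool pgs: 44 pools x 100 pgs = 4400 pgs, and with 3 replicas
that is 13200 pg copies spread over 6 OSDs, i.e. roughly 2200 per OSD,
versus the ~100 per OSD the calculator targets.)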

- Somewhat off-topic, but for my own curiosity: Why is deleting data so
slow, in terms of ceph's architecture? Shouldn't it just be a matter of
flagging a region as available and allowing it to be overwritten, as
would a traditional file system?


Apples and oranges, as RBD is block storage, not a FS.
That said, a traditional FS is local and updates an inode or equivalent
bit.
For Ceph to delete a RBD image, it has to go to all cluster nodes with
OSDs that have PGs that contain objects of that image. Then those objects
have to be deleted on the local filesystem of the OSD and various maps
updated cluster-wide. Rinse and repeat until all objects have been dealt
with.
Quite a bit more involved, but that's the price you have to pay when you
have a DISTRIBUTED storage architecture that doesn't rely on a single item
(like an inode) to reflect things for the whole system.

Thank you for explaining.

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein



Our workload involves creating and destroying a lot of pools. Each pool
has 100 pgs, so it adds up. Could this be causing the problem? What
would you suggest instead?


...this is most likely the cause. Deleting a pool causes the data and
pgs associated with it to be deleted asynchronously, which can be a lot
of background work for the osds.

If you're using the cfq scheduler you can try decreasing the priority 
of these operations with the "osd disk thread ioprio..." options:


http://ceph.com/docs/master/rados/configuration/osd-config-ref/#operations 
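(For reference, a sketch of what those settings look like in ceph.conf -- 
the values here are only illustrative, and they only take effect when the 
OSD data disks use the cfq I/O scheduler:

[osd]
    osd disk thread ioprio class = idle
    osd disk thread ioprio priority = 7
)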



If that doesn't help enough, deleting data from pools before deleting
the pools might help, since you can control the rate more finely. And of
course not creating/deleting so many pools would eliminate the hidden
background cost of deleting the pools.


Thanks for your answer. Some follow-up questions:

- I wouldn't expect that pool deletion is the problem, since our pools, 
although many, don't contain much data. Typically, we will have one rbd 
per pool, several GB in size, but in practice containing little data. 
Would you expect that performance penalty from deleting pool to be 
relative to the requested size of the rbd, or relative to the quantity 
of data actually stored in it?


- Rather than creating and deleting multiple pools, each containing a 
single rbd, do you think we would see a speed-up if we were to instead 
have one pool, containing multiple (frequently created and deleted) 
rbds? Does the performance penalty stem only from deleting pools 
themselves, or from deleting objects within the pool as well?


- Somewhat off-topic, but for my own curiosity: Why is deleting data so 
slow, in terms of ceph's architecture? Shouldn't it just be a matter of 
flagging a region as available and allowing it to be overwritten, as 
would a traditional file system?


Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Hi, thanks for answering. Here are the answers to your questions. 
Hopefully they will be helpful.


On 04/08/2015 12:36 PM, Lionel Bouton wrote:
I probably won't be able to help much, but people knowing more will 
need at least: - your Ceph version, - the kernel version of the host 
on which you are trying to format /dev/rbd1, - which hardware and 
network you are using for this cluster (CPU, RAM, HDD or SSD models, 
network cards, jumbo frames, ...). 


ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)

Linux 3.18.4pl2 #3 SMP Thu Jan 29 21:11:23 CET 2015 x86_64 GNU/Linux

The hardware is an Amazon AWS c3.large. So, a (virtual) Xeon(R) CPU 
E5-2680 v2 @ 2.80GHz, 3845992 kB RAM, plus whatever other virtual 
hardware Amazon provides.

There's only one thing surprising me here: you have only 6 OSDs, 1504GB
(~ 250G / osd) and a total of 4400 pgs ? With a replication of 3 this is
2200 pgs / OSD, which might be too much and unnecessarily increase the
load on your OSDs.

Best regards,

Lionel Bouton


Our workload involves creating and destroying a lot of pools. Each pool 
has 100 pgs, so it adds up. Could this be causing the problem? What 
would you suggest instead?


Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Hi, I'm having sporadic very poor performance running ceph. Right now 
mkfs, even with nodiscard, takes 30 minutes or more. These kinds of delays 
happen often but irregularly. There seems to be no common denominator. 
Clearly, however, they make it impossible to deploy ceph in production.


I reported this problem earlier on ceph's IRC, and was told to add 
nodiscard to mkfs. That didn't help. Here is the command that I'm using 
to format an rbd:


For example: mkfs.ext4 -text4 -m0 -b4096 -E nodiscard /dev/rbd1
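(A possibly useful data point when this happens: a raw direct write to the
device before formatting, e.g. something like
    dd if=/dev/zero of=/dev/rbd1 bs=4M count=100 oflag=direct
would show whether plain writes to the rbd are slow as well, or whether the
stall is specific to mkfs behaviour such as discard/metadata writeout.)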

Ceph says everything is okay:

cluster e96e10d3-ad2b-467f-9fe4-ab5269b70206
 health HEALTH_OK
 monmap e1: 3 mons at 
{a=192.168.224.4:6789/0,b=192.168.232.4:6789/0,c=192.168.240.4:6789/0}, 
election epoch 12, quorum 0,1,2 a,b,c

 osdmap e972: 6 osds: 6 up, 6 in
  pgmap v4821: 4400 pgs, 44 pools, 5157 MB data, 1654 objects
46138 MB used, 1459 GB / 1504 GB avail
4400 active+clean

And here's my crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 district
type 2 region

# buckets
district district-1 {
id -1# do not change unnecessarily
# weight 3.000
alg straw
hash 0# rjenkins1
item osd.1 weight 1.000
item osd.2 weight 1.000
item osd.5 weight 1.000
}
district district-2 {
id -2# do not change unnecessarily
# weight 3.000
alg straw
hash 0# rjenkins1
item osd.0 weight 1.000
item osd.3 weight 1.000
item osd.4 weight 1.000
}
region ec2 {
id -3# do not change unnecessarily
# weight 2.000
alg straw
hash 0# rjenkins1
item district-1 weight 1.000
item district-2 weight 1.000
}

# rules
rule rule-district-1 {
ruleset 0
type replicated
min_size 2
max_size 3
step take district-1
step chooseleaf firstn 0 type osd
step emit
}
rule rule-district-2 {
ruleset 1
type replicated
min_size 2
max_size 3
step take district-2
step chooseleaf firstn 0 type osd
step emit
}

# end crush map
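(As an aside, the mapping behaviour of rules like these can be checked
offline with crushtool, along the lines of:
    crushtool -c crushmap.txt -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-mappings
which prints the OSDs each rule would map pgs onto.)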

Does anyone have any insight into diagnosing this problem?

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes (fwd)

2015-02-20 Thread Jeff
Should I infer from the silence that there is no way to recover from the

"FAILED assert(last_e.version.version < e.version.version)" errors?

Thanks,
Jeff

- Forwarded message from Jeff  -

Date: Tue, 17 Feb 2015 09:16:33 -0500
From: Jeff 
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Power failure recovery woes

Some additional information/questions:

Here is the output of "ceph osd tree"

Some of the "down" OSD's are actually running, but are marked "down". For example
osd.1:

root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40
/usr/bin/ceph-osd --cluster=ceph -i 0 -f

 Is there any way to get the cluster to recognize them as being up?  osd-1 has
the "FAILED assert(last_e.version.version < e.version.version)" errors.

Thanks,
 Jeff


# idweight  type name   up/down reweight
-1  10.22   root default
-2  2.72host ceph1
0   0.91osd.0   up  1
1   0.91osd.1   down0
2   0.9 osd.2   down0
-3  1.82host ceph2
3   0.91osd.3   down0
4   0.91osd.4   down0
-4  2.04host ceph3
5   0.68osd.5   up  1
6   0.68osd.6   up  1
7   0.68osd.7   up  1
8   0.68osd.8   down0
-5  1.82host ceph4
9   0.91osd.9   up  1
10  0.91osd.10  down0
-6  1.82host ceph5
11  0.91osd.11  up  1
12  0.91osd.12  up  1

On 2/17/2015 8:28 AM, Jeff wrote:
> 
> 
>  Original Message 
> Subject: Re: [ceph-users] Power failure recovery woes
> Date: 2015-02-17 04:23
> From: Udo Lembke 
> To: Jeff , ceph-users@lists.ceph.com
> 
> Hi Jeff,
> is the osd /var/lib/ceph/osd/ceph-2 mounted?
> 
> If not, does it help if you mount the osd and start it with
> service ceph start osd.2
> ??
> 
> Udo
> 
> On 17.02.2015 09:54, Jeff wrote:
>> Hi,
>> 
>> We had a nasty power failure yesterday and even with UPS's our small (5
>> node, 12 OSD) cluster is having problems recovering.
>> 
>> We are running ceph 0.87
>> 
>> 3 of our OSD's are down consistently (others stop and are restartable,
>> but our cluster is so slow that almost everything we do times out).
>> 
>> We are seeing errors like this on the OSD's that never run:
>> 
>> ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
>> Operation not permitted
>> 
>> We are seeing errors like these on the OSD's that run some of the time:
>> 
>> osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
>> e.version.version)
>> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
>> timeout")
>> 
>> Does anyone have any suggestions on how to recover our cluster?
>> 
>> Thanks!
>>   Jeff
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

- End forwarded message -

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Jeff

Some additional information/questions:

Here is the output of "ceph osd tree"

Some of the "down" OSD's are actually running, but are marked "down". For 
example osd.1:


root 30158  8.6 12.7 1542860 781288 ?  Ssl 07:47   4:40 
/usr/bin/ceph-osd --cluster=ceph -i 0 -f


 Is there any way to get the cluster to recognize them as being up?  
osd-1 has the "FAILED assert(last_e.version.version < 
e.version.version)" errors.


Thanks,
 Jeff


# idweight  type name   up/down reweight
-1  10.22   root default
-2  2.72host ceph1
0   0.91osd.0   up  1
1   0.91osd.1   down0
2   0.9 osd.2   down0
-3  1.82host ceph2
3   0.91osd.3   down0
4   0.91osd.4   down0
-4  2.04host ceph3
5   0.68osd.5   up  1
6   0.68osd.6   up  1
7   0.68osd.7   up  1
8   0.68osd.8   down0
-5  1.82host ceph4
9   0.91osd.9   up  1
10  0.91osd.10  down0
-6  1.82host ceph5
11  0.91osd.11  up  1
12  0.91    osd.12  up  1

On 2/17/2015 8:28 AM, Jeff wrote:



 Original Message 
Subject: Re: [ceph-users] Power failure recovery woes
Date: 2015-02-17 04:23
From: Udo Lembke 
To: Jeff , ceph-users@lists.ceph.com

Hi Jeff,
is the osd /var/lib/ceph/osd/ceph-2 mounted?

If not, does it help if you mount the osd and start it with
service ceph start osd.2
??

Udo

On 17.02.2015 09:54, Jeff wrote:

Hi,

We had a nasty power failure yesterday and even with UPS's our small (5
node, 12 OSD) cluster is having problems recovering.

We are running ceph 0.87

3 of our OSD's are down consistently (others stop and are restartable,
but our cluster is so slow that almost everything we do times out).

We are seeing errors like this on the OSD's that never run:

ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
Operation not permitted

We are seeing errors like these on the OSD's that run some of the time:

osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
e.version.version)
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide 
timeout")


Does anyone have any suggestions on how to recover our cluster?

Thanks!
  Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Power failure recovery woes

2015-02-17 Thread Jeff

Udo,

Yes, the osd is mounted:  /dev/sda4  963605972 260295676 703310296  
28% /var/lib/ceph/osd/ceph-2


Thanks,
Jeff

 Original Message 
Subject: Re: [ceph-users] Power failure recovery woes
Date: 2015-02-17 04:23
From: Udo Lembke 
To: Jeff , ceph-users@lists.ceph.com

Hi Jeff,
is the osd /var/lib/ceph/osd/ceph-2 mounted?

If not, does it help if you mount the osd and start it with
service ceph start osd.2
??

Udo

On 17.02.2015 09:54, Jeff wrote:

Hi,

We had a nasty power failure yesterday and even with UPS's our small (5
node, 12 OSD) cluster is having problems recovering.

We are running ceph 0.87

3 of our OSD's are down consistently (others stop and are restartable,
but our cluster is so slow that almost everything we do times out).

We are seeing errors like this on the OSD's that never run:

ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1)
Operation not permitted

We are seeing errors like these on the OSD's that run some of the time:

osd/PGLog.cc: 844: FAILED assert(last_e.version.version <
e.version.version)
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide
timeout")

Does anyone have any suggestions on how to recover our cluster?

Thanks!
  Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Power failure recovery woes

2015-02-17 Thread Jeff

Hi,

We had a nasty power failure yesterday and even with UPS's our small (5 
node, 12 OSD) cluster is having problems recovering.


We are running ceph 0.87

3 of our OSD's are down consistently (others stop and are restartable, 
but our cluster is so slow that almost everything we do times out).


We are seeing errors like this on the OSD's that never run:

ERROR: error converting store /var/lib/ceph/osd/ceph-2: (1) 
Operation not permitted


We are seeing errors like these on the OSD's that run some of the time:

osd/PGLog.cc: 844: FAILED assert(last_e.version.version < 
e.version.version)

common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

Does anyone have any suggestions on how to recover our cluster?

Thanks!
  Jeff


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon problem after power failure

2015-01-10 Thread Jeff
Thanks - ceph health is now reporting HEALTH_OK :-)
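(For the archives: recreating a dead mon generally boils down to something
like the following -- names and paths are illustrative, and this assumes the
surviving mons still have quorum:

ceph mon remove ceph4
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring
ceph-mon -i ceph4 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
ceph mon add ceph4 <mon-ip>:6789
service ceph start mon.ceph4
)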

On Sat, Jan 10, 2015 at 02:55:01AM +, Joao Eduardo Luis wrote:
> On 01/09/2015 04:31 PM, Jeff wrote:
> >We had a power failure last night and our five node cluster has
> >two nodes with mon's that fail to start.  Here's what we see:
> >
> >#  /usr/bin/ceph-mon --cluster=ceph -i ceph2 -f
> >2015-01-09 11:28:45.579267 b6c10740 -1 ERROR: on disk data includes 
> >unsupported features: compat={},rocompat={},incompat={6=support isa/lrc 
> >erasure code}
> >2015-01-09 11:28:45.606896 b6c10740 -1 error checking features: (1) 
> >Operation not permitted
> >
> >and
> >
> ># /usr/local/bin/ceph-mon --cluster=ceph -i ceph4 -f
> >Corruption: 6 missing files; e.g.: 
> >/var/lib/ceph/mon/ceph-ceph4/store.db/4011258.ldb
> >Corruption: 6 missing files; e.g.: 
> >/var/lib/ceph/mon/ceph-ceph4/store.db/4011258.ldb
> >2015-01-09 11:30:32.024445 b6ea1740 -1 failed to create new leveldb store
> >
> >Does anyone have any suggestions for how to get these two monitors running
> >again?
> 
> Recreate them.  Only way I'm aware especially considering the leveldb
> corruption.
> 
>   -Joao
> 
> 
> -- 
> Joao Eduardo Luis
> Software Engineer | http://ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mon problem after power failure

2015-01-09 Thread Jeff
We had a power failure last night and our five node cluster has
two nodes with mon's that fail to start.  Here's what we see:

#  /usr/bin/ceph-mon --cluster=ceph -i ceph2 -f
2015-01-09 11:28:45.579267 b6c10740 -1 ERROR: on disk data includes unsupported 
features: compat={},rocompat={},incompat={6=support isa/lrc erasure code}
2015-01-09 11:28:45.606896 b6c10740 -1 error checking features: (1) Operation 
not permitted

and

# /usr/local/bin/ceph-mon --cluster=ceph -i ceph4 -f
Corruption: 6 missing files; e.g.: 
/var/lib/ceph/mon/ceph-ceph4/store.db/4011258.ldb
Corruption: 6 missing files; e.g.: 
/var/lib/ceph/mon/ceph-ceph4/store.db/4011258.ldb
2015-01-09 11:30:32.024445 b6ea1740 -1 failed to create new leveldb store

Does anyone have any suggestions for how to get these two monitors running
again?

Thanks!
    Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests/blocked

2014-11-20 Thread Jeff
Thanks.  I should have mentioned that the errors are pretty well
distributed across the cluster:

ceph1: /var/log/ceph/ceph-osd.0.log   71
ceph1: /var/log/ceph/ceph-osd.1.log  112
ceph1: /var/log/ceph/ceph-osd.2.log   38
ceph2: /var/log/ceph/ceph-osd.3.log   88
ceph2: /var/log/ceph/ceph-osd.4.log   54
ceph3: /var/log/ceph/ceph-osd.5.log   36
ceph3: /var/log/ceph/ceph-osd.6.log   48
ceph3: /var/log/ceph/ceph-osd.7.log   39
ceph3: /var/log/ceph/ceph-osd.8.log   40
ceph4: /var/log/ceph/ceph-osd.10.log  95
ceph4: /var/log/ceph/ceph-osd.9.log  139
ceph5: /var/log/ceph/ceph-osd.11.log  81
ceph5: /var/log/ceph/ceph-osd.12.log 393

I'll try to catch them while they're happening and see what I can
learn.

Thanks again!!

Jeff


On Thu, Nov 20, 2014 at 06:40:57AM -0800, Jean-Charles LOPEZ wrote:
> Hi Jeff,
> 
> it would probably wise to first check what these slow requests are:
> 1) ceph health detail -> This will tell you which OSDs are experiencing the 
> slow requests
> 2) ceph daemon osd.{id} dump_ops_in_flight -> To be issued on one of the 
> above OSDs; it will tell you what these ops are waiting for.
> 
> My fair guess is that either you have a network problem or some other drives 
> in your cluster are about to die or are experiencing write errors causing 
> retries and slowing the request processing.
> 
> Just to be sure, if your drives are SMART capable, use smartctl to look at 
> the stats for the drives you will have potentially identified in the steps 
> above.
> 
> Regards
> JC
> 
> 
> 
> > On Nov 20, 2014, at 06:00, Jeff  wrote:
> > 
> > Hi,
> > 
> > We have a five node cluster that has been running for a long
> > time (over a year).  A few weeks ago we upgraded to 0.87 (giant) and 
> > things continued to work well.  
> > 
> > Last week a drive failed on one of the nodes.  We replaced the
> > drive and things were working well again.
> > 
> > After about six days we started getting lots of "slow
> > requests...blocked for..." messages (100's/hour) and performance has been
> > terrible.  Since then we've made sure to have all of the latest OS patches
> > and rebooted all five nodes.  We are still seeing a lot of slow
> > requests/blocked messages.  Any idea(s) on what's wrong/where to look?
> > 
> > Thanks!
> > Jeff
> > -- 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
===
Jeff's Used Movie Finder
 http://www.usedmoviefinder.com
email: j...@usedmoviefinder.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow requests/blocked

2014-11-20 Thread Jeff
Hi,

We have a five node cluster that has been running for a long
time (over a year).  A few weeks ago we upgraded to 0.87 (giant) and 
things continued to work well.  

Last week a drive failed on one of the nodes.  We replaced the
drive and things were working well again.

After about six days we started getting lots of "slow
requests...blocked for..." messages (100's/hour) and performance has been
terrible.  Since then we've made sure to have all of the latest OS patches
and rebooted all five nodes.  We are still seeing a lot of slow
requests/blocked messages.  Any idea(s) on what's wrong/where to look?

Thanks!
Jeff
-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] the state of cephfs in giant

2014-10-13 Thread Jeff Bailey

On 10/13/2014 4:56 PM, Sage Weil wrote:

On Mon, 13 Oct 2014, Eric Eastman wrote:

I would be interested in testing the Samba VFS and Ganesha NFS integration
with CephFS.  Are there any notes on how to configure these two interfaces
with CephFS?


For ganesha I'm doing something like:

FSAL
{
  CEPH
  {
FSAL_Shared_Library = "libfsalceph.so";
  }
}

EXPORT
{
  Export_Id = 1;
  Path = "131.123.35.53:/";
  Pseudo = "/ceph";
  Tag = "ceph";
  FSAL
  {
Name = "CEPH";
  }
}
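(With an export like that, an NFSv4 client mounts the Pseudo path; roughly
something like the following, where <ganesha-host> is whatever machine runs
ganesha:
    mount -t nfs4 <ganesha-host>:/ceph /mnt/cephfs
)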



For samba, based on
https://github.com/ceph/ceph-qa-suite/blob/master/tasks/samba.py#L106
I think you need something like

[myshare]
path = /
writeable = yes
vfs objects = ceph
ceph:config_file = /etc/ceph/ceph.conf

Not sure what the ganesha config looks like.  Matt and the other folks at
cohortfs would know more.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problem with ceph_filestore_dump, possibly stuck in a loop

2014-05-16 Thread Jeff Bachtel
Overnight, I tried to use ceph_filestore_dump to export a pg that is 
missing from other osds from an osd, with the intent of manually copying 
the export to the osds in the pg map and importing.


Unfortunately, what is 59 GB of data on disk had filled 1 TB when I got in 
this morning, and the export still hadn't completed. Is it possible for a loop to 
develop in a ceph_filestore_dump export?


My C++ isn't the best, but from reading int export_files() in 
ceph_filestore_dump.cc it looks like a loop could occur if a broken 
collection was read. Possibly. Maybe.


--debug output seems to confirm?

grep '^read' /tmp/ceph_filestore_dump.out  | sort | wc -l ; grep '^read' 
/tmp/ceph_filestore_dump.out  | sort | uniq | wc -l

2714
258

(only 258 unique reads are being reported, but each repeated > 10 times 
so far)


From start of debug output

Supported features: compat={},rocompat={},incompat={1=initial feature 
set(~v.18),2=pginfo object,3=object 
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper,11=sharded 
objects}
On-disk features: compat={},rocompat={},incompat={1=initial feature 
set(~v.18),2=pginfo object,3=object 
locator,4=last_epoch_clean,5=categories,6=hobjectpool,7=biginfo,8=leveldbinfo,9=leveldblog,10=snapmapper}

Exporting 0.2f
read 8210002f/100d228.00019150/head//0
size=4194304
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
data section offset=4194304 len=1048576
attrs size 2

then at line 1810
read 8210002f/100d228.00019150/head//0
size=4194304
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
data section offset=4194304 len=1048576
attrs size 2


If this is a loop due to a broken filestore, is there any recourse on 
repairing it? The osd I'm trying to dump from isn't in the pg map for 
the cluster, I'm trying to save some data by exporting this version of 
the pg and importing it on an osd that's mapped. If I'm failing at a 
basic premise even trying to do that, please let me know so I can wave 
off (in which case, I believe I'd use ceph_filestore_dump to delete all 
copies of this pg in the cluster so I can force create it, which is 
failing at this time).


Thanks,

Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.1 Firefly released

2014-05-12 Thread Jeff Bachtel
I see the EL6 build on http://ceph.com/rpm-firefly/el6/x86_64/ but not 
on gitbuilder (last build 07MAY). Is 0.80.1 considered a different 
branch ref for purposes of gitbuilder?


Jeff

On 05/12/2014 05:31 PM, Sage Weil wrote:

This first Firefly point release fixes a few bugs, the most visible
being a problem that prevents scrub from completing in some cases.

Notable Changes
---

* osd: revert incomplete scrub fix (Samuel Just)
* rgw: fix stripe calculation for manifest objects (Yehuda Sadeh)
* rgw: improve handling, memory usage for abort reads (Yehuda Sadeh)
* rgw: send Swift user manifest HTTP header (Yehuda Sadeh)
* libcephfs, ceph-fuse: expose MDS session state via admin socket (Yan,
   Zheng)
* osd: add simple throttle for snap trimming (Sage Weil)
* monclient: fix possible hang from ill-timed monitor connection failure
   (Sage Weil)
* osd: fix trimming of past HitSets (Sage Weil)
* osd: fix whiteouts for non-writeback cache modes (Sage Weil)
* osd: prevent divide by zero in tiering agent (David Zafman)
* osd: prevent busy loop when tiering agent can do no work (David Zafman)

For more detailed information, see the complete changelog:

* http://ceph.com/docs/master/_downloads/v0.80.1.txt

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.80.1.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs not mapped to osds, tearing hair out

2014-05-09 Thread Jeff Bachtel
Wow I'm an idiot for getting the wrong reweight command.

Thanks so much,

Jeff
On May 9, 2014 11:06 AM, "Sage Weil"  wrote:

> On Fri, 9 May 2014, Jeff Bachtel wrote:
> > I'm working on http://tracker.ceph.com/issues/8310 , basically by
> bringing
> > osds down and up I've come to a state where on-disk I have pgs, osds
> seem to
> > scan the directories on boot, but the crush map isn't mapping the objects
> > properly.
> >
> > In addition to that ticket, I've got a decompile of my crushmap at
> > https://github.com/jeffb-bt/nova_confs/blob/master/crushmap.txt and a
> raw dump
> > (in case anything is missing) at
> > https://github.com/jeffb-bt/nova_confs/blob/master/crushmap
> >
> > # ceph osd tree
> > # idweight  type name   up/down reweight
> > -1  5   root default
> > -2  1   host compute1
> > 0   1   osd.0   up  0.24
> > -3  1   host compute2
> > 1   1   osd.1   up  0.26
> > -4  3   host compute3
> > 3   1   osd.3   up  0.02325
> > 2   1   osd.2   up  0.01993
> > 5   1   osd.5   up  0.15
>
> The weights on the right should all be set to 1:
>
> ceph osd reweight $OSD 1
> or
> ceph osd in $OSD
>
> Those weights are used for exceptional cases.  In general they should be
> 1 ("in") or 0 ("out"), unless you are making small corrections in the
> placement.
>
> To adjust the relative weights on the disks, you want to adjust the CRUSH
> weights on the left (second column):
>
> ceph osd crush reweight $OSD $WEIGHT
>
> sage
>
>
> >
> > Does anyone see where the error is in the crushmap that I can fix it? I
> don't
> > have very many pgs/pools, if I need to add extra mapping to the crushmap
> to
> > get my pgs visible I will do so, I just don't have any examples of how.
> >
> > Thanks for any help,
> >
> > Jeff
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs not mapped to osds, tearing hair out

2014-05-09 Thread Jeff Bachtel
I'm working on http://tracker.ceph.com/issues/8310 , basically by 
bringing osds down and up I've come to a state where on-disk I have pgs, 
osds seem to scan the directories on boot, but the crush map isn't 
mapping the objects properly.


In addition to that ticket, I've got a decompile of my crushmap at 
https://github.com/jeffb-bt/nova_confs/blob/master/crushmap.txt and a 
raw dump (in case anything is missing) at 
https://github.com/jeffb-bt/nova_confs/blob/master/crushmap


# ceph osd tree
# idweight  type name   up/down reweight
-1  5   root default
-2  1   host compute1
0   1   osd.0   up  0.24
-3  1   host compute2
1   1   osd.1   up  0.26
-4  3   host compute3
3   1   osd.3   up  0.02325
2   1   osd.2   up  0.01993
5   1   osd.5   up  0.15

Does anyone see where the error is in the crushmap that I can fix it? I 
don't have very many pgs/pools, if I need to add extra mapping to the 
crushmap to get my pgs visible I will do so, I just don't have any 
examples of how.


Thanks for any help,

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jeff Bachtel
noout was set while I manhandled osd.4 in and out of the cluster 
repeatedly (trying to copy data over from other osds and set attrs to make 
osd.4 pick up that it had objects in pg 0.2f). It wasn't set before the 
problem, and isn't set currently.


I don't really know where you saw pool size = 1:

# for p in $(ceph osd lspools | awk 'BEGIN { RS="," } { print $2 }'); do 
ceph osd pool get $p size;  done

size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2
size: 2

All pools are reporting size 2. The osd that last shared the incomplete 
pg (osd.1) had the pg directory intact and appropriately sized. However, 
it seems the pgmap was preferring osd.4 as the most recent copy of that 
pg, even when the pg directory was deleted. I guess because the pg was 
flagged incomplete, there was no further attempt to mirror the bogus pg 
onto another osd.


Since I sent my original email (this afternoon actually), I've nuked 
osd.4 and created an osd.5 on its old disc. I've still got pg 0.2f 
listed as down/incomplete/inactive despite marking its only home osd as 
lost. I'll follow up tomorrow after object recovery is as complete as 
it's going to get.


At this point though I'm shrugging and accepting the data loss, but 
ideas on how to create a new pg to replace the incomplete 0.2f would be 
deeply useful. I'm supposing ceph pg force_create_pg 0.2f would suffice.


Jeff

On 05/05/2014 07:46 PM, Gregory Farnum wrote:

Oh, you've got no-out set. Did you lose an OSD at any point? Are you
really running the system with pool size 1? I think you've managed to
erase the up-to-date data, but not the records of that data's
existence. You'll have to explore the various "lost" commands, but I'm
not sure what the right approach is here. It's possible you're just
out of luck after manually adjusting the store improperly.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, May 5, 2014 at 4:39 PM, Jeff Bachtel
 wrote:

Thanks. That is a cool utility, unfortunately I'm pretty sure the pg in
question had a cephfs object instead of rbd images (because mounting cephfs
is the only noticeable brokenness).

Jeff


On 05/05/2014 06:43 PM, Jake Young wrote:

I was in a similar situation where I could see the PGs data on an osd, but
there was nothing I could do to force the pg to use that osd's copy.

I ended up using the rbd_restore tool to create my rbd on disk and then I
reimported it into the pool.

See this thread for info on rbd_restore:
http://www.spinics.net/lists/ceph-devel/msg11552.html

Of course, you have to copy all of the pieces of the rbd image on one file
system somewhere (thank goodness for thin provisioning!) for the tool to
work.

There really should be a better way.

Jake

On Monday, May 5, 2014, Jeff Bachtel 
wrote:

Well, that'd be the ideal solution. Please check out the github gist I
posted, though. It seems that despite osd.4 having nothing good for pg 0.2f,
the cluster does not acknowledge any other osd has a copy of the pg. I've
tried downing osd.4 and manually deleting the pg directory in question with
the hope that the cluster would roll back epochs for 0.2f, but all it does
is recreate the pg directory (empty) on osd.4.

Jeff

On 05/05/2014 04:33 PM, Gregory Farnum wrote:

What's your cluster look like? I wonder if you can just remove the bad
PG from osd.4 and let it recover from the existing osd.1
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel
 wrote:

This is all on firefly rc1 on CentOS 6

I had an osd getting overfull, and misinterpreting directions I downed
it
then manually removed pg directories from the osd mount. On restart and
after a good deal of rebalancing (setting osd weights as I should've
originally), I'm now at

  cluster de10594a-0737-4f34-a926-58dc9254f95f
   health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete; 1 pgs stuck
inactive; 308 pgs stuck unclean; recov
ery 1/2420563 objects degraded (0.000%); noout flag(s) set
   monmap e7: 3 mons at

{controller1=10.100.2.1:6789/0,controller2=10.100.2.2:6789/0,controller3=10.100.2.
3:6789/0}, election epoch 556, quorum 0,1,2
controller1,controller2,controller3
   mdsmap e268: 1/1/1 up {0=controller1=up:active}
   osdmap e3492: 5 osds: 5 up, 5 in
  flags noout
pgmap v4167420: 320 pgs, 15 pools, 4811 GB data, 1181 kobjects
  9770 GB used, 5884 GB / 15654 GB avail
  1/2420563 objects degraded (0.000%)
 3 active
12 active+clean
 2 active+remapped+wait_backfill
 1 incomplete
   302 active+remapped
client io 364 B/s wr, 0 op/s

# ceph pg dump | grep 0.2f
dumped all in format plain
0.2f0   0   0   

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jeff Bachtel
Thanks. That is a cool utility, unfortunately I'm pretty sure the pg in 
question had a cephfs object instead of rbd images (because mounting 
cephfs is the only noticeable brokenness).


Jeff

On 05/05/2014 06:43 PM, Jake Young wrote:
I was in a similar situation where I could see the PGs data on an osd, 
but there was nothing I could do to force the pg to use that osd's copy.


I ended up using the rbd_restore tool to create my rbd on disk and 
then I reimported it into the pool.


See this thread for info on rbd_restore:
http://www.spinics.net/lists/ceph-devel/msg11552.html

Of course, you have to copy all of the pieces of the rbd image on one 
file system somewhere (thank goodness for thin provisioning!) for the 
tool to work.


There really should be a better way.

Jake

On Monday, May 5, 2014, Jeff Bachtel  wrote:


Well, that'd be the ideal solution. Please check out the github
gist I posted, though. It seems that despite osd.4 having nothing
good for pg 0.2f, the cluster does not acknowledge any other osd
has a copy of the pg. I've tried downing osd.4 and manually
deleting the pg directory in question with the hope that the
cluster would roll back epochs for 0.2f, but all it does is
recreate the pg directory (empty) on osd.4.

Jeff

On 05/05/2014 04:33 PM, Gregory Farnum wrote:

What's your cluster look like? I wonder if you can just remove
the bad
PG from osd.4 and let it recover from the existing osd.1
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


    On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel
 wrote:

This is all on firefly rc1 on CentOS 6

I had an osd getting overfull, and misinterpreting
directions I downed it
then manually removed pg directories from the osd mount.
On restart and
after a good deal of rebalancing (setting osd weights as I
should've
originally), I'm now at

 cluster de10594a-0737-4f34-a926-58dc9254f95f
  health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete;
1 pgs stuck
inactive; 308 pgs stuck unclean; recov
ery 1/2420563 objects degraded (0.000%); noout flag(s) set
  monmap e7: 3 mons at

{controller1=10.100.2.1:6789/0,controller2=10.100.2.2:6789/0,controller3=10.100.2.
3:6789/0}, election epoch 556, quorum 0,1,2
controller1,controller2,controller3
  mdsmap e268: 1/1/1 up {0=controller1=up:active}
  osdmap e3492: 5 osds: 5 up, 5 in
 flags noout
   pgmap v4167420: 320 pgs, 15 pools, 4811 GB data,
1181 kobjects
 9770 GB used, 5884 GB / 15654 GB avail
 1/2420563 objects degraded (0.000%)
3 active
   12 active+clean
2 active+remapped+wait_backfill
1 incomplete
  302 active+remapped
   client io 364 B/s wr, 0 op/s

# ceph pg dump | grep 0.2f
dumped all in format plain
0.2f0   0   0   0   0   0   0
incomplete
2014-05-03 11:38:01.526832 0'0  3492:23 [4] 4   [4] 4
2254'20053  2014-04-28 00:24:36.504086  2100'18109
2014-04-26
22:26:23.699330

# ceph pg map 0.2f
osdmap e3492 pg 0.2f (0.2f) -> up [4] acting [4]

The pg query for the downed pg is at
https://gist.github.com/jeffb-bt/c8730899ff002070b325

Of course, the osd I manually mucked with is the only one
the cluster is
picking up as up/acting. Now, I can query the pg and find
epochs where other
osds (that I didn't jack up) were acting. And in fact, the
latest of those
entries (osd.1) has the pg directory in its osd mount, and
it's a good
healthy 59gb.

I've tried manually rsync'ing (and preserving attributes)
that set of
directories from osd.1 to osd.4 without success. Likewise
I've tried copying
the directories over without attributes set. I've done
many, many deep
scrubs but the pg query does not show the scrub timestamps
being affected.

I'm seeking ideas for either fixing metadata on the
directory on osd.4 to
cause this pg to be seen/recognized, or ideas on 

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jeff Bachtel
Well, that'd be the ideal solution. Please check out the github gist I 
posted, though. It seems that despite osd.4 having nothing good for pg 
0.2f, the cluster does not acknowledge any other osd has a copy of the 
pg. I've tried downing osd.4 and manually deleting the pg directory in 
question with the hope that the cluster would roll back epochs for 0.2f, 
but all it does is recreate the pg directory (empty) on osd.4.


Jeff

On 05/05/2014 04:33 PM, Gregory Farnum wrote:

What's your cluster look like? I wonder if you can just remove the bad
PG from osd.4 and let it recover from the existing osd.1
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel
 wrote:

This is all on firefly rc1 on CentOS 6

I had an osd getting overfull, and misinterpreting directions I downed it
then manually removed pg directories from the osd mount. On restart and
after a good deal of rebalancing (setting osd weights as I should've
originally), I'm now at

 cluster de10594a-0737-4f34-a926-58dc9254f95f
  health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete; 1 pgs stuck
inactive; 308 pgs stuck unclean; recov
ery 1/2420563 objects degraded (0.000%); noout flag(s) set
  monmap e7: 3 mons at
{controller1=10.100.2.1:6789/0,controller2=10.100.2.2:6789/0,controller3=10.100.2.
3:6789/0}, election epoch 556, quorum 0,1,2
controller1,controller2,controller3
  mdsmap e268: 1/1/1 up {0=controller1=up:active}
  osdmap e3492: 5 osds: 5 up, 5 in
 flags noout
   pgmap v4167420: 320 pgs, 15 pools, 4811 GB data, 1181 kobjects
 9770 GB used, 5884 GB / 15654 GB avail
 1/2420563 objects degraded (0.000%)
3 active
   12 active+clean
2 active+remapped+wait_backfill
1 incomplete
  302 active+remapped
   client io 364 B/s wr, 0 op/s

# ceph pg dump | grep 0.2f
dumped all in format plain
0.2f0   0   0   0   0   0   0 incomplete
2014-05-03 11:38:01.526832 0'0  3492:23 [4] 4   [4] 4
2254'20053  2014-04-28 00:24:36.504086  2100'18109 2014-04-26
22:26:23.699330

# ceph pg map 0.2f
osdmap e3492 pg 0.2f (0.2f) -> up [4] acting [4]

The pg query for the downed pg is at
https://gist.github.com/jeffb-bt/c8730899ff002070b325

Of course, the osd I manually mucked with is the only one the cluster is
picking up as up/acting. Now, I can query the pg and find epochs where other
osds (that I didn't jack up) were acting. And in fact, the latest of those
entries (osd.1) has the pg directory in its osd mount, and it's a good
healthy 59gb.

I've tried manually rsync'ing (and preserving attributes) that set of
directories from osd.1 to osd.4 without success. Likewise I've tried copying
the directories over without attributes set. I've done many, many deep
scrubs but the pg query does not show the scrub timestamps being affected.

I'm seeking ideas for either fixing metadata on the directory on osd.4 to
cause this pg to be seen/recognized, or ideas on forcing the cluster's pg
map to point to osd.1 for the incomplete pg (basically wiping out the
cluster's memory that osd.4 ever had 0.2f). Or any other solution :) It's
only 59g, so worst case I'll mark it lost and recreate the pg, but I'd
prefer to learn enough of the innards to understand what is going on, and
possible means of fixing it.

Thanks for any help,

Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Manually mucked up pg, need help fixing

2014-05-03 Thread Jeff Bachtel

This is all on firefly rc1 on CentOS 6

I had an osd getting overfull, and misinterpreting directions I downed 
it then manually removed pg directories from the osd mount. On restart 
and after a good deal of rebalancing (setting osd weights as I should've 
originally), I'm now at


cluster de10594a-0737-4f34-a926-58dc9254f95f
 health HEALTH_WARN 2 pgs backfill; 1 pgs incomplete; 1 pgs stuck 
inactive; 308 pgs stuck unclean; recov

ery 1/2420563 objects degraded (0.000%); noout flag(s) set
 monmap e7: 3 mons at 
{controller1=10.100.2.1:6789/0,controller2=10.100.2.2:6789/0,controller3=10.100.2.
3:6789/0}, election epoch 556, quorum 0,1,2 
controller1,controller2,controller3

 mdsmap e268: 1/1/1 up {0=controller1=up:active}
 osdmap e3492: 5 osds: 5 up, 5 in
flags noout
  pgmap v4167420: 320 pgs, 15 pools, 4811 GB data, 1181 kobjects
9770 GB used, 5884 GB / 15654 GB avail
1/2420563 objects degraded (0.000%)
   3 active
  12 active+clean
   2 active+remapped+wait_backfill
   1 incomplete
 302 active+remapped
  client io 364 B/s wr, 0 op/s

# ceph pg dump | grep 0.2f
dumped all in format plain
0.2f0   0   0   0   0   0   0 
incomplete  2014-05-03 11:38:01.526832 0'0  3492:23 [4] 4   
[4] 4   2254'20053  2014-04-28 00:24:36.504086  
2100'18109 2014-04-26 22:26:23.699330


# ceph pg map 0.2f
osdmap e3492 pg 0.2f (0.2f) -> up [4] acting [4]

The pg query for the downed pg is at 
https://gist.github.com/jeffb-bt/c8730899ff002070b325


Of course, the osd I manually mucked with is the only one the cluster is 
picking up as up/acting. Now, I can query the pg and find epochs where 
other osds (that I didn't jack up) were acting. And in fact, the latest 
of those entries (osd.1) has the pg directory in its osd mount, and it's 
a good healthy 59gb.


I've tried manually rsync'ing (and preserving attributes) that set of 
directories from osd.1 to osd.4 without success. Likewise I've tried 
copying the directories over without attributes set. I've done many, 
many deep scrubs but the pg query does not show the scrub timestamps 
being affected.


I'm seeking ideas for either fixing metadata on the directory on osd.4 
to cause this pg to be seen/recognized, or ideas on forcing the 
cluster's pg map to point to osd.1 for the incomplete pg (basically 
wiping out the cluster's memory that osd.4 ever had 0.2f). Or any other 
solution :) It's only 59g, so worst case I'll mark it lost and recreate 
the pg, but I'd prefer to learn enough of the innards to understand what 
is going on, and possible means of fixing it.


Thanks for any help,

Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Possible repo packaging regression

2014-04-30 Thread Jeff Bachtel
Per http://tracker.ceph.com/issues/6022 leveldb-1.12 was pulled out of 
the ceph-extras repo due to patches applied by a leveldb fork (Basho 
patch). It's back in ceph-extras (since the 28th at least), and on 
CentOS 6 is causing an abort on mon start when run with the Firefly 
release candidate


# /etc/init.d/ceph start
=== mon.m1 ===
Starting Ceph mon.m1 on m1...
pthread lock: Invalid argument
*** Caught signal (Aborted) **
 in thread 7fa7524f67a0
 ceph version 0.80-rc1-28-ga027100 
(a0271000c12486d3c5adb2b0732e1c70c3789a4f)

 1: /usr/bin/ceph-mon() [0x86bcb1]
 2: /lib64/libpthread.so.0() [0x35cac0f710]
 3: (gsignal()+0x35) [0x35ca432925]
 4: (abort()+0x175) [0x35ca434105]
 5: (()+0x34d71) [0x7fa752532d71]
 6: (leveldb::DBImpl::Get(leveldb::ReadOptions const&, leveldb::Slice 
const&, leveldb::Value*)+0x50) [0x7fa7

52518120]
 7: (LevelDBStore::_get_iterator()+0x41) [0x826d71]
 8: (MonitorDBStore::exists(std::string const&, std::string 
const&)+0x28) [0x539e88]

 9: (main()+0x13f8) [0x533d78]
 10: (__libc_start_main()+0xfd) [0x35ca41ed1d]
 11: /usr/bin/ceph-mon() [0x530f19]
2014-04-30 10:32:40.397243 7fa7524f67a0 -1 *** Caught signal (Aborted) **
 in thread 7fa7524f67a0

The SRPM for what ended up on ceph-extras wasn't uploaded to the repo, 
so I didn't check to see if it was the Basho patch being applied again 
or something else. Downgrading back to leveldb 1.7.0-2 resolved my problem.
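(For anyone else hitting this: the downgrade is just something like
"yum downgrade leveldb" on the affected hosts plus a mon restart, and
adding "exclude=leveldb*" to the ceph-extras .repo stanza should keep the
broken build from being pulled back in on the next update.)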


Is anyone else seeing this?

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon not binding to public interface

2014-01-15 Thread Jeff Bachtel
If I understand correctly then, I should either not specify mon addr or 
set it to an external IP?


Thanks for the clarification,

Jeff

On 01/15/2014 03:58 PM, John Wilkins wrote:

Jeff,

First, if you've specified the public and cluster networks in 
[global], you don't need to specify them anywhere else. If you do, they 
get overridden. That's not the issue here. It appears from your 
ceph.conf file that you've specified an address on the cluster 
network. Specifically, you specified mon addr = 10.100.10.1:6789 
<http://10.100.10.1:6789/>, but you indicated elsewhere that this IP 
address belongs to the cluster network.



On Mon, Jan 13, 2014 at 11:29 AM, Jeff Bachtel  wrote:


I've got a cluster with 3 mons, all of which are binding solely to
a cluster network IP, and neither to 0.0.0.0:6789 nor a public IP. I
hadn't noticed the
problem until now because it makes little difference in how I
normally use Ceph (rbd and radosgw), but now that I'm trying to
use cephfs it's obviously suboptimal.

[global]
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  keyring = /etc/ceph/keyring
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
   fsid = de10594a-0737-4f34-a926-58dc9254f95f

[mon]
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21
  mon data = /var/lib/ceph/mon/mon.$id

[mon.controller1]
  host = controller1
  mon addr = 10.100.10.1:6789
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21

And then with /usr/bin/ceph-mon -i controller1 --debug_ms 12
--pid-file /var/run/ceph/mon.controller1.pid -c
/etc/ceph/ceph.conf I get in logs

2014-01-13 14:19:13.578458 7f195e6d97a0  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 7559
2014-01-13 14:19:13.641639 7f195e6d97a0 10 -- :/0 rank.bind
10.100.10.1:6789/0
2014-01-13 14:19:13.641668 7f195e6d97a0 10 accepter.accepter.bind
2014-01-13 14:19:13.642773 7f195e6d97a0 10 accepter.accepter.bind
bound to 10.100.10.1:6789/0
2014-01-13 14:19:13.642800 7f195e6d97a0  1 -- 10.100.10.1:6789/0
learned my addr 10.100.10.1:6789/0
2014-01-13 14:19:13.642808 7f195e6d97a0  1 accepter.accepter.bind
my_inst.addr is 10.100.10.1:6789/0 need_addr=0

With no mention of public addr (10.100.2.1) or public network
(10.100.0.0/21) found. mds (on this host)
and osd (on other hosts) bind to 0.0.0.0 and a public IP,
respectively.

At this point public/cluster addr/network are WAY overspecified in
ceph.conf, but the problem appeared with far less specification.

Any ideas? Thanks,

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
John Wilkins
Senior Technical Writer
Intank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mon not binding to public interface

2014-01-13 Thread Jeff Bachtel
I've got a cluster with 3 mons, all of which are binding solely to a 
cluster network IP, and neither to 0.0.0.0:6789 nor a public IP. I 
hadn't noticed the problem until now because it makes little difference 
in how I normally use Ceph (rbd and radosgw), but now that I'm trying to 
use cephfs it's obviously suboptimal.


[global]
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  keyring = /etc/ceph/keyring
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
 
  fsid = de10594a-0737-4f34-a926-58dc9254f95f


[mon]
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21
  mon data = /var/lib/ceph/mon/mon.$id

[mon.controller1]
  host = controller1
  mon addr = 10.100.10.1:6789
  public addr = 10.100.0.150
  cluster addr = 10.100.10.1
  cluster network = 10.100.10.0/24
  public network = 10.100.0.0/21

And then with /usr/bin/ceph-mon -i controller1 --debug_ms 12 --pid-file 
/var/run/ceph/mon.controller1.pid -c /etc/ceph/ceph.conf I get in logs


2014-01-13 14:19:13.578458 7f195e6d97a0  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 7559
2014-01-13 14:19:13.641639 7f195e6d97a0 10 -- :/0 rank.bind 10.100.10.1:6789/0
2014-01-13 14:19:13.641668 7f195e6d97a0 10 accepter.accepter.bind
2014-01-13 14:19:13.642773 7f195e6d97a0 10 accepter.accepter.bind bound to 
10.100.10.1:6789/0
2014-01-13 14:19:13.642800 7f195e6d97a0  1 -- 10.100.10.1:6789/0 learned my 
addr 10.100.10.1:6789/0
2014-01-13 14:19:13.642808 7f195e6d97a0  1 accepter.accepter.bind my_inst.addr 
is 10.100.10.1:6789/0 need_addr=0

With no mention of public addr (10.100.2.1) or public network 
(10.100.0.0/21) found. mds (on this host) and osd (on other hosts) bind 
to 0.0.0.0 and a public IP, respectively.


At this point public/cluster addr/network are WAY overspecified in 
ceph.conf, but the problem appeared with far less specification.


Any ideas? Thanks,

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Current state of OpenStack/Ceph rbd live migration?

2014-01-06 Thread Jeff Bachtel
I just wanted to get a quick sanity check (and ammunition for updating 
from Grizzly to Havana).


Per 
https://blueprints.launchpad.net/nova/+spec/bring-rbd-support-libvirt-images-type 
it seems that explicit support for rbd image types has been brought into 
OpenStack/Havana. Does this correspond to live-migration working 
properly yet in Nova?


For background, the nova libvirt driver in Grizzly did not grok how to 
live migrate for rbd (specifically the need to copy instance folders 
around, see 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-March/000536.html 
). I'm just curious whether this situation has been rectified.


Thanks,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients

2013-12-16 Thread Jeff Bailey

On 12/16/2013 2:36 PM, Dan Van Der Ster wrote:

On Dec 16, 2013 8:26 PM, Gregory Farnum  wrote:

On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster
 wrote:

Hi,

Sorry to revive this old thread, but I wanted to update you on the current
pains we're going through related to clients' nproc (and now nofile)
ulimits. When I started this thread we were using RBD for Glance images
only, but now we're trying to enable RBD-backed Cinder volumes and are not
really succeeding at the moment :(

As we had guessed from our earlier experience, librbd and therefore qemu-kvm
need increased nproc/nofile limits otherwise VMs will freeze. In fact we
just observed a lockup of a test VM due to the RBD device blocking
completely (this appears as blocked flush processes in the VM); we're
actually not sure which of the nproc/nofile limits caused the freeze, but it
was surely one of those.

And the main problem we face now is that it isn't trivial to increase the
limits of qemu-kvm on a running OpenStack hypervisor -- the values are set
by libvirtd and seem to require a restart of all guest VMs on a host to
reload a qemu config file. I'll update this thread when we find the solution
to that...
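
The config file in question is presumably libvirt's /etc/libvirt/qemu.conf; as a
sketch (values are illustrative, and per the above they only seem to take effect
after restarting libvirtd and the guests):

    # /etc/libvirt/qemu.conf -- illustrative values, not a recommendation
    max_processes = 65536
    max_files = 65536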

Is there some reason you can't just set it ridiculously high to start with?


As I mentioned, we haven't yet found a way to change the limits without 
affecting (stopping) the existing running (important) VMs. We thought that 
/etc/security/limits.conf would do the trick, but alas limits there have no 
effect on qemu.


I don't know whether qemu (perhaps librbd to be more precise?) is aware 
of the limits and avoids them or simply gets errors when it exceeds 
them.  If it's the latter then couldn't you just use prlimit to change 
them?  If that's not possible then maybe just change the limit settings, 
migrate the VM and then migrate it back?
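
For the prlimit route, assuming a util-linux new enough to ship the prlimit(1)
tool, something like this on the hypervisor might do it (the pid and the values
are placeholders):

    # raise soft:hard nofile/nproc on a running qemu-kvm process in place
    prlimit --pid <qemu-pid> --nofile=65536:65536 --nproc=65536:65536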



Cheers, Dan


Moving forward, IMHO it would be much better if Ceph clients could
gracefully work with large clusters without _requiring_ changes to the
ulimits. I understand that such poorly configured clients would necessarily
have decreased performance (since librados would need to use a thread pool
and also lose some of the persistent client-OSD connections). But client
lockups are IMHO worse than slightly lower performance.

Have you guys discussed the client ulimit issues recently and is there a
plan in the works?

I'm afraid not. It's a plannable but non-trivial amount of work and
the Inktank dev team is pretty well booked for a while. Anybody
running into this as a serious bottleneck should
1) try and start a community effort
2) try and promote it as a priority with any Inktank business contacts
they have.
(You are only the second group to report it as an ongoing concern
rather than a one-off hiccup, and honestly it sounds like you're just
having issues with hitting the arbitrary limits, not with real
resource exhaustion issues.)
:)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
I apologize,  I should have mentioned that both osd.3 and osd.11 crash 
immediately and if I do not 'set noout', the crash cascades to the rest of the 
cluster.

Thanks,
Jeff


Sent from my Samsung Galaxy Note™, an AT&T LTE smartphone



 Original message 
From: Samuel Just 
Date: 10/21/2013 4:47 PM (GMT-08:00)
To: Jeff Williams 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Continually crashing osds


What happened when you simply left the cluster to recover without osd.11 in?
-Sam

On Mon, Oct 21, 2013 at 4:01 PM, Jeff Williams  wrote:
> What is the best way to do that? I tried ceph pg repair, but it only did
> so much.
>
> On 10/21/13 3:54 PM, "Samuel Just"  wrote:
>
>>Can you get the pg to recover without osd.3?
>>-Sam
>>
>>On Mon, Oct 21, 2013 at 1:59 PM, Jeff Williams 
>>wrote:
>>> We're running xfs on a 3.8.0-31-generic kernel
>>>
>>> Thanks,
>>> Jeff
>>>
>>> On 10/21/13 1:54 PM, "Samuel Just"  wrote:
>>>
>>>>It looks like an xattr vanished from one of your objects on osd.3.
>>>>What fs are you running?
>>>>
>>>>On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
>>>>wrote:
>>>>> Hello all,
>>>>>
>>>>> Similar to this post from last month, I am experiencing 2 nodes that
>>>>>are
>>>>> constantly crashing upon start up:
>>>>> http://www.spinics.net/lists/ceph-users/msg04589.html
>>>>>
>>>>> Here are the logs from the 2 without the debug commands, here:
>>>>> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>>>>>
>>>>> I have run the osds with the debug statements per the email, but I'm
>>>>>unsure
>>>>> where to post them, they are 108M each without compression. Should I
>>>>>create
>>>>> a bug on the tracker?
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>>
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
What is the best way to do that? I tried ceph pg repair, but it only did
so much. 

On 10/21/13 3:54 PM, "Samuel Just"  wrote:

>Can you get the pg to recover without osd.3?
>-Sam
>
>On Mon, Oct 21, 2013 at 1:59 PM, Jeff Williams 
>wrote:
>> We're running xfs on a 3.8.0-31-generic kernel
>>
>> Thanks,
>> Jeff
>>
>> On 10/21/13 1:54 PM, "Samuel Just"  wrote:
>>
>>>It looks like an xattr vanished from one of your objects on osd.3.
>>>What fs are you running?
>>>
>>>On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
>>>wrote:
>>>> Hello all,
>>>>
>>>> Similar to this post from last month, I am experiencing 2 nodes that
>>>>are
>>>> constantly crashing upon start up:
>>>> http://www.spinics.net/lists/ceph-users/msg04589.html
>>>>
>>>> Here are the logs from the 2 without the debug commands, here:
>>>> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>>>>
>>>> I have run the osds with the debug statements per the email, but I'm
>>>>unsure
>>>> where to post them, they are 108M each without compression. Should I
>>>>create
>>>> a bug on the tracker?
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
We're running xfs on a 3.8.0-31-generic kernel

Thanks,
Jeff

On 10/21/13 1:54 PM, "Samuel Just"  wrote:

>It looks like an xattr vanished from one of your objects on osd.3.
>What fs are you running?
>
>On Mon, Oct 21, 2013 at 9:58 AM, Jeff Williams 
>wrote:
>> Hello all,
>>
>> Similar to this post from last month, I am experiencing 2 nodes that are
>> constantly crashing upon start up:
>> http://www.spinics.net/lists/ceph-users/msg04589.html
>>
>> Here are the logs from the 2 without the debug commands, here:
>> http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h
>>
>> I have run the osds with the debug statements per the email, but I'm
>>unsure
>> where to post them, they are 108M each without compression. Should I
>>create
>> a bug on the tracker?
>>
>> Thanks,
>> Jeff
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Continually crashing osds

2013-10-21 Thread Jeff Williams
Hello all,

Similar to this post from last month, I am experiencing 2 nodes that are 
constantly crashing upon start up: 
http://www.spinics.net/lists/ceph-users/msg04589.html

Here are the logs from the 2 without the debug commands, here: 
http://pastebin.com/cB9ML5md and http://pastebin.com/csHHjC2h

I have run the osds with the debug statements per the email, but I'm unsure 
where to post them, they are 108M each without compression. Should I create a 
bug on the tracker?

Thanks,
Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ocfs2 for OSDs?

2013-09-12 Thread Jeff Bachtel
Previous experience with OCFS2 was that its actual performance was pretty
lackluster/awful. The bits Oracle threw on top of (I think) ext3 to make it
work as a multi-writer filesystem with all of the signalling that implies
brought the overall performance down.

Jeff


On Wed, Sep 11, 2013 at 9:58 AM, Ugis  wrote:

> Hi,
>
> I wonder is ocfs2 suitable for hosting OSD data?
> In ceph documentation only XFS, ext4 and btrfs are discussed, but
> looking at ocfs2 feature list it theoretically also could host OSDs:
>
> Some of the notable features of the file system are:
> Optimized Allocations (extents, reservations, sparse, unwritten
> extents, punch holes)
> REFLINKs (inode-based writeable snapshots)
> Indexed Directories
> Metadata Checksums
> Extended Attributes (unlimited number of attributes per inode)
> Advanced Security (POSIX ACLs and SELinux)
> User and Group Quotas
> Variable Block and Cluster sizes
> Journaling (Ordered and Writeback data journaling modes)
> Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64)
> Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os
> In-built Clusterstack with a Distributed Lock Manager
> Cluster-aware Tools (mkfs, fsck, tunefs, etc.)
>
> ocfs2 can work in cluster mode but it can also work for single node.
>
> Just wondering would OSD work on ocfs2 and what would performance
> characteristics be.
> Any thoughts/experience?
>
> BR,
> Ugis Racko
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] do not upgrade bobtail -> dumpling directly until 0.67.2

2013-08-21 Thread Jeff Bachtel
Is there an issue ID associated with this? For those of us who made the
long jump and want to avoid any unseen problems.

Thanks,

Jeff


On Tue, Aug 20, 2013 at 7:57 PM, Sage Weil  wrote:

> We've identified a problem when upgrading directly from bobtail to
> dumpling; please wait until 0.67.2 before doing so.
>
> Upgrades from bobtail -> cuttlefish -> dumpling are fine.  It is only the
> long jump between versions that is problematic.
>
> The fix is already in the dumpling branch.  Another point release will be
> out in the next day or two.
>
> Thanks!
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance questions

2013-08-20 Thread Jeff Moskow

Martin,

Thanks for the confirmation about 3-replica performance.

dmesg | fgrep /dev/sdb # returns no matches

Jeff

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance questions

2013-08-20 Thread Jeff Moskow

Hi,

More information: looking in /var/log/ceph/ceph.log, I see 7893 slow 
requests in the last 3 hours, of which 7890 are from osd.4. Should I 
assume a bad drive? SMART says the drive is healthy. A bad osd?
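
A rough way to tally those per OSD (assuming the default cluster log location):

    grep 'slow request' /var/log/ceph/ceph.log | grep -o 'osd\.[0-9]*' | sort | uniq -c | sort -rn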


Thanks,
 Jeff

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance questions

2013-08-20 Thread Jeff Moskow
Hi,

I am now occasionally seeing a ceph statuses like this:

   health HEALTH_WARN 2 requests are blocked > 32 sec

They aren't always present even though the cluster is still slow, but
they may be a clue....

Jeff

On Sat, Aug 17, 2013 at 02:32:47PM -0700, Sage Weil wrote:
> On Sat, 17 Aug 2013, Jeff Moskow wrote:
> > Hi,
> > 
> > When we rebuilt our ceph cluster, we opted to make our rbd storage 
> > replication level 3 rather than the previously configured replication 
> > level 2.
> > 
> > Things are MUCH slower (5 nodes, 13 osd's) than before even though 
> > most of our I/O is read.  Is this to be expected? What are the 
> > recommended ways of seeing who/what is consuming the largest amount of 
> > disk/network bandwidth?
> 
> It really doesn't sound like the replica count is the source of the 
> performance difference.  What else has changed?
> 
> sage

-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] performance questions

2013-08-17 Thread Jeff Moskow
Hi,

When we rebuilt our ceph cluster, we opted to make our rbd storage 
replication level 3 rather than the previously
configured replication level 2.

Things are MUCH slower (5 nodes, 13 osd's) than before even though most 
of our I/O is read.   Is this to be expected?
What are the recommended ways of seeing who/what is consuming the largest 
amount of disk/network bandwidth?

Thanks!
    Jeff

-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "rbd ls -l" hangs

2013-08-15 Thread Jeff Moskow
Greg,

Thanks for following up - I hope you had a GREAT vacation.

I eventually deleted and re-added the rbd pool, which fixed the hanging 
problem but left me with 114 stuck pgs.

Sam suggested that I permanently remove the down osd's and after a few 
hours of
rebalancing everything is working fine :-)

(ceph auth del osd.x ; ceph osd crush rm osd.x ; ceph osd rm osd.x).

Jeff

On Wed, Aug 14, 2013 at 01:54:16PM -0700, Gregory Farnum wrote:
> On Thu, Aug 1, 2013 at 9:57 AM, Jeff Moskow  wrote:
> > Greg,
> >
> > Thanks for the hints.  I looked through the logs and found OSD's with
> > RETRY's.  I marked those "out" (marked in orange) and let ceph rebalance.
> > Then I ran the bench command.
> > I now have many more errors than before :-(.
> >
> > health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 151 pgs stuck
> > unclean
> >
> > Note that the incomplete pg is still the same (2.1f6).
> >
> > Any ideas on what to try next?
> >
> > 2013-08-01 12:39:38.349011 osd.4 172.16.170.2:6801/1778 1154 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 18.085318 sec at 57979 KB/sec
> > 2013-08-01 12:39:38.499002 osd.5 172.16.170.2:6802/19375 454 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 18.232358 sec at 57511 KB/sec
> > 2013-08-01 12:39:44.077347 osd.3 172.16.170.2:6800/1647 1211 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 23.813801 sec at 44032 KB/sec
> > 2013-08-01 12:39:49.118812 osd.16 172.16.170.4:6802/1837 746 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 28.453320 sec at 36852 KB/sec
> > 2013-08-01 12:39:48.468020 osd.15 172.16.170.4:6801/1699 821 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 27.802566 sec at 37715 KB/sec
> > 2013-08-01 12:39:54.369364 osd.0 172.16.170.1:6800/3783 948 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 34.076451 sec at 30771 KB/sec
> > 2013-08-01 12:39:48.618080 osd.14 172.16.170.4:6800/1572 16161 : [INF]
> > bench: wrote 1024 MB in blocks of 4096 KB in 27.952574 sec at 37512 KB/sec
> > 2013-08-01 12:39:54.382830 osd.2 172.16.170.1:6803/22033 222 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 34.090170 sec at 30758 KB/sec
> > 2013-08-01 12:40:03.458096 osd.6 172.16.170.3:6801/1738 1582 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 43.143180 sec at 24304 KB/sec
> > 2013-08-01 12:40:03.724504 osd.10 172.16.170.3:6800/1473 1238 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 43.409558 sec at 24155 KB/sec
> > 2013-08-01 12:40:02.426650 osd.8 172.16.170.3:6803/2013 8272 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 42.111713 sec at 24899 KB/sec
> > 2013-08-01 12:40:02.997093 osd.7 172.16.170.3:6802/1864 1094 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 42.682079 sec at 24567 KB/sec
> > 2013-08-01 12:40:02.867046 osd.9 172.16.170.3:6804/2149 2258 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 42.551771 sec at 24642 KB/sec
> > 2013-08-01 12:39:54.360014 osd.1 172.16.170.1:6801/4243 3060 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 34.070725 sec at 30776 KB/sec
> > 2013-08-01 12:42:56.984632 osd.11 172.16.170.5:6800/28025 43996 : [INF]
> > bench: wrote 1024 MB in blocks of 4096 KB in 216.687559 sec at 4839 KB/sec
> > 2013-08-01 12:43:21.271481 osd.13 172.16.170.5:6802/1872 1056 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 240.974360 sec at 4351 KB/sec
> > 2013-08-01 12:43:39.320462 osd.12 172.16.170.5:6801/1700 1348 : [INF] bench:
> > wrote 1024 MB in blocks of 4096 KB in 259.023646 sec at 4048 KB/sec
> 
> Sorry for the slow reply; I've been out on vacation. :)
> Looking through this list, I'm noticing that many of your OSDs are
> reporting 4MB/s write speeds and they don't correspond to the ones you
> marked out (though if your cluster was somehow under load that could
> have something to do with the very different speed reports).
> 
> You still want to look at the pg statistics for the stuck PG; I'm not
> seeing that anywhere?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

-- 
===
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Wheezy machine died with problems on osdmap

2013-08-15 Thread Jeff Williams
Giuseppe,

You could install the kernel from wheezy backports - it is currently at 3.9.

http://backports.debian.org/Instructions/
http://packages.debian.org/source/stable-backports/linux
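
Roughly (the exact sources.list line is on the instructions page above, and the
metapackage name is an assumption):

    echo 'deb http://ftp.debian.org/debian wheezy-backports main' >> /etc/apt/sources.list
    apt-get update
    apt-get -t wheezy-backports install linux-image-amd64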

Regards,
Jeff


On 14 August 2013 10:08, Giuseppe 'Gippa' Paterno' wrote:

> Hi Sage,
> > What kernel version of this? It looks like an old kernel bug.
> > Generally speaking you should be using 3.4 at the very least if you
> > are using the kernel client. sage
> This is the standard Wheezy kernel, i.e. 3.2.0-4-amd64
> While I can recompile the kernel, I don't think would be manageable
> having a custom kernel in production.
> Is there a way I can open a bug in debian asking for a backport of the
> patch?
> Thanks.
> Regards,
> Giuseppe
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-13 Thread Jeff Moskow

Sam,

Thanks that did it :-)

   health HEALTH_OK
   monmap e17: 5 mons at 
{a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0}, 
election epoch 9794, quorum 0,1,2,3,4 a,b,c,d,e

   osdmap e23445: 14 osds: 13 up, 13 in
pgmap v13552855: 2102 pgs: 2102 active+clean; 531 GB data, 1564 GB 
used, 9350 GB / 10914 GB avail; 13104KB/s rd, 4007KB/s wr, 560op/s

   mdsmap e3: 0/0/1 up

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam,

3, 14 and 16 have been down for a while and I'll eventually replace 
those drives (I could do it now)
but didn't want to introduce more variables.

We are using RBD with Proxmox, so I think the answer about kernel 
clients is yes

Jeff

On Mon, Aug 12, 2013 at 02:41:11PM -0700, Samuel Just wrote:
> Are you using any kernel clients?  Will osds 3,14,16 be coming back?
> -Sam
> 
> On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow  wrote:
> > Sam,
> >
> > I've attached both files.
> >
> > Thanks!
> > Jeff
> >
> > On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
> >> Can you attach the output of ceph osd tree?
> >>
> >> Also, can you run
> >>
> >> ceph osd getmap -o /tmp/osdmap
> >>
> >> and attach /tmp/osdmap?
> >> -Sam
> >>
> >> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
> >> > Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
> >> > then restarting it, waiting 2 minutes and then doing the next one (all 
> >> > OSD's
> >> > eventually restarted).  I tried this twice.
> >> >
> >> > --
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam,

I've attached both files.

Thanks!
    Jeff

On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
> Can you attach the output of ceph osd tree?
> 
> Also, can you run
> 
> ceph osd getmap -o /tmp/osdmap
> 
> and attach /tmp/osdmap?
> -Sam
> 
> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow  wrote:
> > Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
> > then restarting it, waiting 2 minutes and then doing the next one (all OSD's
> > eventually restarted).  I tried this twice.
> >
> > --
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

# id    weight  type name       up/down reweight
-1  14.61   root default
-3  14.61   rack unknownrack
-2  2.783   host ceph1
0   0.919   osd.0   up  1   
1   0.932   osd.1   up  1   
2   0.932   osd.2   up  0   
-5  2.783   host ceph2
3   0.919   osd.3   down0   
4   0.932   osd.4   up  1   
5   0.932   osd.5   up  1   
-4  3.481   host ceph3
10  0.699   osd.10  up  1   
6   0.685   osd.6   up  1   
7   0.699   osd.7   up  1   
8   0.699   osd.8   up  1   
9   0.699   osd.9   up  1   
-6  2.783   host ceph4
14  0.919   osd.14  down0   
15  0.932   osd.15  up  1   
16  0.932   osd.16  down0   
-7  2.782   host ceph5
11  0.92osd.11  up  0   
12  0.931   osd.12  up  1   
13  0.931   osd.13  up  1   



osdmap
Description: Binary data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph rbd io tracking (rbdtop?)

2013-08-12 Thread Jeff Moskow
Hi,

The activity on our ceph cluster has gone up a lot.  We are using exclusively 
RBD
storage right now.

Is there a tool/technique that could be used to find out which rbd images are
receiving the most activity (something like "rbdtop")?

Thanks,
    Jeff

-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Thanks for the suggestion.  I had tried stopping each OSD for 30 
seconds, then restarting it, waiting 2 minutes and then doing the next 
one (all OSD's eventually restarted).  I tried this twice.


--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Hi,

I have a 5 node ceph cluster that is running well (no problems using 
any of the
rbd images and that's really all we use).  

I have replication set to 3 on all three pools (data, metadata and rbd).

"ceph -s" reports:
health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; 
recovery 5746/384795 degraded (1.493%)

I have tried everything I could think of to clear/fix those errors and 
they persist.

Most of them appear to be a problem with not having 3 copies

0.2a0   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:40:07.874427  0'0 21920'388   [4,7]   [4,7,8] 0'0 
2013-08-04 08:59:34.035198  0'0 2013-07-29 01:49:40.018625
4.1d9   260 0   238 0   1021055488  0   0   
active+remapped 2013-08-06 05:56:20.447612  21920'12710 21920'53408 
[6,13]  [6,13,4]0'0 2013-08-05 06:59:44.717555  0'0 2013-08-05 
06:59:44.717555
1.1dc   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:55:44.687830  0'0 21920'3003  [6,13]  [6,13,4]
0'0 2013-08-04 10:56:51.226012  0'0 2013-07-28 23:47:13.404512
0.1dd   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:55:44.687525  0'0 21920'3003  [6,13]  [6,13,4]
0'0 2013-08-04 10:56:45.258459  0'0 2013-08-01 05:58:17.141625
1.29f   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:40:07.882865  0'0 21920'388   [4,7]   [4,7,8] 0'0 
2013-08-04 09:01:40.075441  0'0 2013-07-29 01:53:10.068503
1.118   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:50:34.081067  0'0 21920'208   [8,15]  [8,15,5]
0'0 2034-02-12 23:20:03.933842  0'0 2034-02-12 23:20:03.933842
0.119   0   0   0   0   0   0   0   active+remapped 
2013-08-06 05:50:34.095446  0'0 21920'208   [8,15]  [8,15,5]
0'0 2034-02-12 23:18:07.310080  0'0 2034-02-12 23:18:07.310080
4.115   248 0   226 0   987364352   0   0   
active+remapped 2013-08-06 05:50:34.112139  21920'6840  21920'42982 
[8,15]  [8,15,5]0'0 2013-08-05 06:59:18.303823  0'0 2013-08-05 
06:59:18.303823
4.4a241 0   286 0   941573120   0   0   
active+degraded 2013-08-06 12:00:47.758742  21920'85238 21920'206648
[4,6]   [4,6]   0'0 2013-08-05 06:58:36.681726  0'0 2013-08-05 
06:58:36.681726
0.4e0   0   0   0   0   0   0   active+remapped 
2013-08-06 12:00:47.765391  0'0 21920'489   [4,6]   [4,6,1] 0'0 
2013-08-04 08:58:12.783265  0'0 2013-07-28 14:21:38.227970


Can anyone suggest a way to clear this up?

Thanks!
Jeff


-- 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] re-initializing a ceph cluster

2013-08-05 Thread Jeff Moskow
After more than a week of trying to restore our cluster I've given up.

I'd like to reset the data, metadata and rbd pools to their initial clean
states (wiping out all data).  Is there an easy way to do this?  I tried
deleting and adding pools, but still have:

   health HEALTH_WARN 32 pgs degraded; 86 pgs stuck unclean

Thanks!
    Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "rbd ls -l" hangs

2013-08-01 Thread Jeff Moskow

Greg,

Thanks for the hints.  I looked through the logs and found OSD's 
with RETRY's.  I marked those "out" (marked in orange) and let ceph 
rebalance.  Then I ran the bench command.

I now have many more errors than before :-(.

health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 151 pgs stuck 
unclean


Note that the incomplete pg is still the same (2.1f6).

Any ideas on what to try next?

2013-08-01 12:39:38.349011 osd.4 172.16.170.2:6801/1778 1154 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 18.085318 sec at 57979 KB/sec
2013-08-01 12:39:38.499002 osd.5 172.16.170.2:6802/19375 454 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 18.232358 sec at 57511 KB/sec
2013-08-01 12:39:44.077347 osd.3 172.16.170.2:6800/1647 1211 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 23.813801 sec at 44032 KB/sec
2013-08-01 12:39:49.118812 osd.16 172.16.170.4:6802/1837 746 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 28.453320 sec at 36852 KB/sec
2013-08-01 12:39:48.468020 osd.15 172.16.170.4:6801/1699 821 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 27.802566 sec at 37715 KB/sec
2013-08-01 12:39:54.369364 osd.0 172.16.170.1:6800/3783 948 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 34.076451 sec at 30771 KB/sec
2013-08-01 12:39:48.618080 osd.14 172.16.170.4:6800/1572 16161 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 27.952574 sec at 37512 KB/sec
2013-08-01 12:39:54.382830 osd.2 172.16.170.1:6803/22033 222 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 34.090170 sec at 30758 KB/sec
2013-08-01 12:40:03.458096 osd.6 172.16.170.3:6801/1738 1582 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 43.143180 sec at 24304 KB/sec
2013-08-01 12:40:03.724504 osd.10 172.16.170.3:6800/1473 1238 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 43.409558 sec at 24155 KB/sec
2013-08-01 12:40:02.426650 osd.8 172.16.170.3:6803/2013 8272 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 42.111713 sec at 24899 KB/sec
2013-08-01 12:40:02.997093 osd.7 172.16.170.3:6802/1864 1094 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 42.682079 sec at 24567 KB/sec
2013-08-01 12:40:02.867046 osd.9 172.16.170.3:6804/2149 2258 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 42.551771 sec at 24642 KB/sec
2013-08-01 12:39:54.360014 osd.1 172.16.170.1:6801/4243 3060 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 34.070725 sec at 30776 KB/sec
2013-08-01 12:42:56.984632 osd.11 172.16.170.5:6800/28025 43996 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 216.687559 sec at 4839 KB/sec
2013-08-01 12:43:21.271481 osd.13 172.16.170.5:6802/1872 1056 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 240.974360 sec at 4351 KB/sec
2013-08-01 12:43:39.320462 osd.12 172.16.170.5:6801/1700 1348 : [INF] 
bench: wrote 1024 MB in blocks of 4096 KB in 259.023646 sec at 4048 KB/sec


Jeff

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "rbd ls -l" hangs

2013-07-30 Thread Jeff Moskow
OK - so while things are definitely better, we still are not where we 
were and "rbd ls -l" still hangs.


Any suggestions?

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "rbd ls -l" hangs

2013-07-30 Thread Jeff Moskow
Thanks!  I tried restarting osd.11 (the primary osd for the incomplete pg) and
that helped a LOT.   We went from 0/1 op/s to 10-800+ op/s!

We still have "HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck 
unclean", but at least we can
use our cluster :-)

ceph pg dump_stuck inactive
ok
pg_stat objects mip degrunf bytes   log disklog state   
state_stamp v   reportedup acting  last_scrub  scrub_stamp  
   last_deep_scrub deep_scrub_stamp
2.1f6   118 0   0   0   403118080   0   0   
incomplete  2013-07-30 06:08:18.883179 11127'11658123  12914'1506  
[11,9]  [11,9]  10321'11641837  2013-07-28 00:59:09.552640  10321'11641837

Thanks again!
Jeff


On Tue, Jul 30, 2013 at 11:44:58AM +0200, Jens Kristian Søgaard wrote:
> Hi,
>
>> This is the same issue as yesterday, but I'm still searching for a  
>> solution.  We have a lot of data on the cluster that we need and can't  
>>health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs 
>
> I'm not claiming to have an answer, but I have a suggestion you can try.
>
> Try running "ceph pg dump" to list all the pgs. Grep for ones that are  
> inactive / incomplete. Note which osds they are on - it is listed in the  
> square brackets with the primary being the first in the list.
>
> Now try restarting the primary osd for the stuck pg and see if that  
> could possible shift things into place.
>
> -- 
> Jens Kristian Søgaard, Mermaid Consulting ApS,
> j...@mermaidconsulting.dk,
> http://www.mermaidconsulting.com/

-- 
===
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] "rbd ls -l" hangs

2013-07-30 Thread Jeff Moskow
This is the same issue as yesterday, but I'm still searching for a 
solution.  We have a lot of data on the cluster that we need and can't 
get to it reasonably (It took over 12 hours to export a 2GB image).


The only thing that status reports as wrong is:

   health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs 
stuck unclean


FYI - this happened after we added a fifth node and two mons (total now 
5) to our cluster.


Thanks for any help!

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Did I permanently break it?

2013-07-29 Thread Jeff Moskow

I've had a 4 node ceph cluster working well for months.

This weekend I added a 5th node to the cluster and after many hours of 
rebalancing I have the following warning:


HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck 
unclean


But, my big problem is that the cluster is almost useless, the 
throughput went from great to literally just a few blocks/second.


And even on the cluster itself, an "rbd ls" command is nearly 
instantaneous, while "rbd ls -l" hasn't completed after 15 minutes


Any ideas what to look at?  All of the ceph bits are up to date as of 
yesterday (ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff).


Thanks for any help/suggestions!!

Jeff

--

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues with a fresh cluster and HEALTH_WARN

2013-06-06 Thread Jeff Bailey
You need to fix your clocks (usually with ntp).  According to the log
message they can be off by 50ms and yours seems to be about 85ms off.
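
On CentOS 6 that is typically something along these lines (the package name and
the choice of time server are assumptions):

    yum install -y ntp
    service ntpd stop
    ntpdate pool.ntp.org      # step the clock once before starting the daemon
    service ntpd start
    chkconfig ntpd on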


On 6/6/2013 8:40 PM, Joshua Mesilane wrote:
> Hi,
>
> I'm currently evaulating ceph as a solution to some HA storage that
> we're looking at. To test I have 3 servers, with two disks to be used
> for OSDs on them (journals on the same disk as the OSD). I've deployed
> the cluster with 3 mons (one on each server) 6 OSDs (2 on each server)
> and 3 MDS (1 on each server)
>
> I've built the cluster using ceph-deploy checked out from git on my
> local workstation (Fedora 15) and the Severs themselves are running
> CentOS 6.4
>
> First note: It looks like the ceph-deploy tool, when you run
> "ceph-deploy osd perpare host:device" is actually also activating the
> OSD when it's done instead of waiting for you to run the ceph-deploy
> osd activate command.
>
> Question: Is ceph-deploy supposed to be writing out the [mon] and the
> [osd] sections to the ceph.conf configuration file? I can't find any
> reference to anything in the config file except for the [global]
> section, and there are no other sections.
>
> Question: Once I got all 6 of my OSDs online I'm getting the following
> health error:
>
> "health HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; clock skew
> detected on mon.sv-dev-ha02, mon.sv-dev-ha03"
>
> ceph health details gives me (Truncated for readability):
>
> [root@sv-dev-ha02 ~]# ceph health detail
> HEALTH_WARN 91 pgs degraded; 192 pgs stale; 192 pgs stuck unclean; 2/6
> in osds are down; clock skew detected on mon.sv-dev-ha02, mon.sv-dev-ha03
> pg 2.3d is stuck unclean since forever, current state
> stale+active+remapped, last acting [1,0]
> pg 1.3e is stuck unclean since forever, current state
> stale+active+remapped, last acting [1,0]
>  (Lots more lines like this) ...
> pg 1.1 is stuck unclean since forever, current state
> stale+active+remapped, last acting [1,0]
> pg 0.0 is stuck unclean since forever, current state
> stale+active+degraded, last acting [0]
> pg 0.3f is stale+active+remapped, acting [1,0]
> pg 1.3e is stale+active+remapped, acting [1,0]
> ... (Lots more lines like this) ...
> pg 1.1 is stale+active+remapped, acting [1,0]
> pg 2.2 is stale+active+remapped, acting [1,0]
> osd.0 is down since epoch 25, last address 10.20.100.90:6800/3994
> osd.1 is down since epoch 25, last address 10.20.100.90:6803/4758
> mon.sv-dev-ha02 addr 10.20.100.91:6789/0 clock skew 0.0858782s > max
> 0.05s (latency 0.00546217s)
> mon.sv-dev-ha03 addr 10.20.100.92:6789/0 clock skew 0.0852838s > max
> 0.05s (latency 0.00533693s)
>
> Any help on how to start troubleshooting this issue would be appreciated.
>
> Cheers,
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CentOS + qemu-kvm rbd support update

2013-06-04 Thread Jeff Bachtel
Hijacking (because it's related): a couple of weeks ago on IRC it was
indicated that a repo with these (or updated) qemu builds for CentOS should be
coming soon from Ceph/Inktank. Did that ever happen?

Thanks,
Jeff


On Mon, Jun 3, 2013 at 10:25 PM, YIP Wai Peng  wrote:

> Hi Andrei,
>
> Have you tried the patched ones at
> https://objects.dreamhost.com/rpms/qemu/qemu-kvm-0.12.1.2-2.355.el6.2.x86_64.rpm and
> https://objects.dreamhost.com/rpms/qemu/qemu-img-0.12.1.2-2.355.el6.2.x86_64.rpm?
>
> I got the links off the IRC chat, I'm using them now.
>
> - WP
>
>
> On Sun, Jun 2, 2013 at 8:41 AM, Andrei Mikhailovsky wrote:
>
>> Hello guys,
>>
>> Was wondering if there are any news on the CentOS 6 qemu-kvm packages
>> with rbd support? I am very keen to try it out.
>>
>> Thanks
>>
>> Andrei
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to get RadosGW working on CentOS 6

2013-05-14 Thread Jeff Bachtel
I fixed it this morning (and as always, PEBCAK):

I needed to turn off FastCgiWrapper in my fastcgi.conf,
per doc/radosgw/config.rst in branch master. In my (weak) defense, it
wasn't in the Bobtail version of the document.
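
For anyone hitting the same thing, the directive in question (in my case in the
Apache fastcgi.conf shipped with the repoforge package; the path may differ) is just:

    FastCgiWrapper Off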

After doing that and starting radosgw from the init script in the next
branch, things seem to be working (s3test.py is successful).

Thanks for the help,
Jeff



On Tue, May 14, 2013 at 6:35 AM, Jeff Bachtel <
jbach...@bericotechnologies.com> wrote:

> That configuration option is set, the results are the same. To clarify: do
> I need to start radosgw from the command line if it is being spawned by
> fastcgi? I've tried it both ways with the same result.
>
> Thanks,
> Jeff
>
>
> On Tue, May 14, 2013 at 12:56 AM, Yehuda Sadeh  wrote:
>
>> On Mon, May 13, 2013 at 7:01 PM, Jeff Bachtel
>>  wrote:
>> > Environment is CentOS 6.4, Apache, mod_fastcgi (from repoforge, so
>> probably
>> > without the continue 100 patches). I'm attempting to install radosgw on
>> the
>> > 2nd mon host.
>> >
>> > My setup consistently fails when running s3test.py from
>> > http://wiki.debian.org/OpenStackCephHowto (with appropriate values
>> filled
>> > in, of course. I used /var/www/html instead of /var/www). The
>> radosgw-admin
>> > user and subuser commands on that Howto execute and give expected
>> results.
>> >
>> > Apache error.log throws (repeated):
>> >
>> > [Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
>> > "/var/www/html/s3gw.fcgi" (uid 0, gid 0) restarted (pid 20102)
>> > [Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
>> > "/var/www/html/s3gw.fcgi" (pid 20102) terminated by calling exit with
>> status
>> > '0'
>> >
>> > virtualhost access log throws (repeated):
>> > 10.100.2.2 - - [13/May/2013:21:48:55 -0400] "PUT /my-new-bucket/
>> HTTP/1.1"
>> > 500 538 "-" "Boto/2.5.2 (linux2)"
>> > 10.100.2.2 - - [13/May/2013:21:49:30 -0400] "PUT /my-new-bucket/
>> HTTP/1.1"
>> > 500 538 "-" "Boto/2.5.2 (linux2)"
>> >
>> > virtualhost error log throws (repeated):
>> > [Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: comm
>> with
>> > (dynamic) server "/var/www/html/s3gw.fcgi" aborted: (first read) idle
>> > timeout (20 sec)
>> > [Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI:
>> incomplete
>> > headers (0 bytes) received from server "/var/www/html/s3gw.fcgi"
>> >
>> > radosgw log is voluminous because I've got debug ms=1 and debug rgw=20
>> set,
>> > but the most common error message looking bit is about not being able to
>> > obtain a lock on gc (garbage collection, I presume) objects. Excerpt at
>> > http://pastebin.com/zyAXMLjF
>> >
>> > radosgw is being spawned, perhaps to excess, by Apache:
>> > apache   32566  0.5  0.0 5394300 13464 ?   Ssl  21:57   0:00
>> > /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
>> > apache   32689  0.5  0.0 5394300 13472 ?   Ssl  21:57   0:00
>> > /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
>> > [root@controller2 ceph]# ps auxww | grep ceph | wc -l
>> > 237
>> >
>> > VirtualHost servername matches fqdn. ceph.conf uses short hostname
>> (both are
>> > in /etc/hosts pointing to same IP).
>> >
>> > Any ideas what might be causing the FastCGI errors? I saw the similar
>> > problems originally with fcgid, which was what led me to install
>> > mod_fastcgi.
>> >
>>
>> Try setting 'rgw print continue = false' on your gateway config (and
>> restart the gateway).
>>
>> Yehuda
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to get RadosGW working on CentOS 6

2013-05-14 Thread Jeff Bachtel
That configuration option is set, the results are the same. To clarify: do
I need to start radosgw from the command line if it is being spawned by
fastcgi? I've tried it both ways with the same result.

Thanks,
Jeff


On Tue, May 14, 2013 at 12:56 AM, Yehuda Sadeh  wrote:

> On Mon, May 13, 2013 at 7:01 PM, Jeff Bachtel
>  wrote:
> > Environment is CentOS 6.4, Apache, mod_fastcgi (from repoforge, so
> probably
> > without the continue 100 patches). I'm attempting to install radosgw on
> the
> > 2nd mon host.
> >
> > My setup consistently fails when running s3test.py from
> > http://wiki.debian.org/OpenStackCephHowto (with appropriate values
> filled
> > in, of course. I used /var/www/html instead of /var/www). The
> radosgw-admin
> > user and subuser commands on that Howto execute and give expected
> results.
> >
> > Apache error.log throws (repeated):
> >
> > [Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
> > "/var/www/html/s3gw.fcgi" (uid 0, gid 0) restarted (pid 20102)
> > [Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
> > "/var/www/html/s3gw.fcgi" (pid 20102) terminated by calling exit with
> status
> > '0'
> >
> > virtualhost access log throws (repeated):
> > 10.100.2.2 - - [13/May/2013:21:48:55 -0400] "PUT /my-new-bucket/
> HTTP/1.1"
> > 500 538 "-" "Boto/2.5.2 (linux2)"
> > 10.100.2.2 - - [13/May/2013:21:49:30 -0400] "PUT /my-new-bucket/
> HTTP/1.1"
> > 500 538 "-" "Boto/2.5.2 (linux2)"
> >
> > virtualhost error log throws (repeated):
> > [Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: comm with
> > (dynamic) server "/var/www/html/s3gw.fcgi" aborted: (first read) idle
> > timeout (20 sec)
> > [Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI:
> incomplete
> > headers (0 bytes) received from server "/var/www/html/s3gw.fcgi"
> >
> > radosgw log is voluminous because I've got debug ms=1 and debug rgw=20
> set,
> > but the most common error message looking bit is about not being able to
> > obtain a lock on gc (garbage collection, I presume) objects. Excerpt at
> > http://pastebin.com/zyAXMLjF
> >
> > radosgw is being spawned, perhaps to excess, by Apache:
> > apache   32566  0.5  0.0 5394300 13464 ?   Ssl  21:57   0:00
> > /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
> > apache   32689  0.5  0.0 5394300 13472 ?   Ssl  21:57   0:00
> > /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
> > [root@controller2 ceph]# ps auxww | grep ceph | wc -l
> > 237
> >
> > VirtualHost servername matches fqdn. ceph.conf uses short hostname (both
> are
> > in /etc/hosts pointing to same IP).
> >
> > Any ideas what might be causing the FastCGI errors? I saw the similar
> > problems originally with fcgid, which was what led me to install
> > mod_fastcgi.
> >
>
> Try setting 'rgw print continue = false' on your gateway config (and
> restart the gateway).
>
> Yehuda
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to get RadosGW working on CentOS 6

2013-05-13 Thread Jeff Bachtel
Environment is CentOS 6.4, Apache, mod_fastcgi (from repoforge, so probably
without the continue 100 patches). I'm attempting to install radosgw on the
2nd mon host.

My setup consistently fails when running s3test.py from
http://wiki.debian.org/OpenStackCephHowto (with appropriate values filled
in, of course. I used /var/www/html instead of /var/www). The radosgw-admin
user and subuser commands on that Howto execute and give expected results.

Apache error.log throws (repeated):

[Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
"/var/www/html/s3gw.fcgi" (uid 0, gid 0) restarted (pid 20102)
[Mon May 13 21:49:02 2013] [warn] FastCGI: (dynamic) server
"/var/www/html/s3gw.fcgi" (pid 20102) terminated by calling exit with
status '0'

virtualhost access log throws (repeated):
10.100.2.2 - - [13/May/2013:21:48:55 -0400] "PUT /my-new-bucket/ HTTP/1.1"
500 538 "-" "Boto/2.5.2 (linux2)"
10.100.2.2 - - [13/May/2013:21:49:30 -0400] "PUT /my-new-bucket/ HTTP/1.1"
500 538 "-" "Boto/2.5.2 (linux2)"

virtualhost error log throws (repeated):
[Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: comm with
(dynamic) server "/var/www/html/s3gw.fcgi" aborted: (first read) idle
timeout (20 sec)
[Mon May 13 21:49:51 2013] [error] [client 10.100.2.2] FastCGI: incomplete
headers (0 bytes) received from server "/var/www/html/s3gw.fcgi"

radosgw log is voluminous because I've got debug ms=1 and debug rgw=20 set,
but the most common error message looking bit is about not being able to
obtain a lock on gc (garbage collection, I presume) objects. Excerpt at
http://pastebin.com/zyAXMLjF

radosgw is being spawned, perhaps to excess, by Apache:
apache   32566  0.5  0.0 5394300 13464 ?   Ssl  21:57   0:00
/usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
apache   32689  0.5  0.0 5394300 13472 ?   Ssl  21:57   0:00
/usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
[root@controller2 ceph]# ps auxww | grep ceph | wc -l
237

VirtualHost servername matches fqdn. ceph.conf uses short hostname (both
are in /etc/hosts pointing to same IP).

Any ideas what might be causing the FastCGI errors? I saw the similar
problems originally with fcgid, which was what led me to install
mod_fastcgi.

Thanks,

Jeff
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mon quorum

2013-04-05 Thread Jeff Anderson-Lee

On 4/5/2013 10:32 AM, Gregory Farnum wrote:

On Fri, Apr 5, 2013 at 10:28 AM, Dimitri Maziuk  wrote:

On 04/05/2013 10:12 AM, Wido den Hollander wrote:


Think about it this way. You have two racks and the network connection
between them fails. If both racks keep operating because they can still
reach that single monitor in their rack you will end up with data
inconsistency.

Yes. In DRBD land it's called 'split brain' and they have (IIRC) an entire
chapter in the user manual about picking up the pieces. It's not a new
problem.


You should place mon.c outside rack A or B to keep you up and running in
this situation.

It's not about racks, it's about rooms, but let's say rack == room ==
colocation facility. And I have two of those.

Are you saying I need a 3rd colo with all associated overhead to have a
usable replica of my data in colo #2?

Or just a VM running somewhere that's got a VPN connection to your
room-based monitors, yes. Ceph is a strongly consistent system and
you're not going to get split brains, period. This is the price you
pay for that.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
The point is I believe that you don't need a 3rd replica of everything, 
just a 3rd MON running somewhere else.
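
A minimal sketch of that layout in ceph.conf (hostnames and addresses invented)
would just be three mon sections, with the third mon on a small VM outside either
room:

  [mon.a]
    host = room1-mon
    mon addr = 192.168.1.10:6789
  [mon.b]
    host = room2-mon
    mon addr = 192.168.2.10:6789
  [mon.c]
    host = tiebreaker-vm
    mon addr = 192.168.3.10:6789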


Jeff

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

