[ceph-users] CephFS quota
Hello, I'm trying to use CephFS quotas. On my client I've created a subdirectory in my CephFS mountpoint and used the following command from the documentation. setfattr -n ceph.quota.max_bytes -v 1 /mnt/cephfs/quota But if I create files bigger than my quota, nothing happens. Do I need a mount option to use quotas? Regards - Willi ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] what happens to the OSDs if the OS disk dies?
> On 13 Aug 2016 at 08:58, Georgios Dimitrakakis wrote: > > >>> On 13 Aug 2016 at 03:19, Bill Sharer wrote: >>> >>> If all the system disk does is handle the OS (i.e. OSD journals are >>> on dedicated or OSD drives as well), no problem. Just rebuild the >>> system and copy the ceph.conf back in when you re-install Ceph. >>> Keep a spare copy of your original fstab to keep your OSD filesystem >>> mounts straight. >> >> With systems deployed with ceph-disk/ceph-deploy you no longer need an >> fstab. Udev handles it. >> >>> Just keep in mind that you are down 11 OSDs while that system drive >>> gets rebuilt though. It's safer to do 10 OSDs and then have a >>> mirror set for the system disk. >> >> In the years that I've run Ceph I've rarely seen OS disks fail. Why bother? >> Ceph is designed for failure. >> >> I would not sacrifice an OSD slot for an OS disk. Also, let's say an >> additional OS disk is €100. >> >> If you put that disk in 20 machines that's €2,000. For that money >> you can even buy an additional chassis. >> >> No, I would run on a single OS disk. It fails? Let it fail. Re-install >> and you're good again. >> >> Ceph makes sure the data is safe. >> > > Wido, > > can you elaborate a little bit more on this? How does Ceph achieve that? Is > it by redundant MONs? > No, Ceph replicates over hosts by default. So you can lose a host and the other ones will have copies. > To my understanding the OSD mapping is needed to have the cluster back. In > our setup (I assume in others as well) that is stored on the OS > disk. Furthermore, our MONs are running on the same hosts as the OSDs. So if the OS > disk fails, not only do we lose the OSD host but we also lose the MON node. Is > there another way to be protected from such a failure besides additional MONs? > Aha, MON on the OSD host. I never recommend that. Try to use dedicated machines with a good SSD for MONs. Technically you can run the MON on the OSD nodes, but I always try to avoid it. It just isn't practical when stuff really goes wrong. Wido > We recently had a problem where a user accidentally deleted a volume. Of > course this has nothing to do with OS disk failure itself, but we've been in > the loop to start looking for other possible failures on our system that > could jeopardize data and this thread got my attention. > > > Warmest regards, > > George > > >> Wido >> >> Bill Sharer >> >>> On 08/12/2016 03:33 PM, Ronny Aasen wrote: >>> On 12.08.2016 13:41, Félix Barbeira wrote: Hi, I'm planning to build a Ceph cluster but I have a serious doubt. At this moment we have ~10 DELL R730xd servers with 12x4TB SATA disks. The official Ceph docs say: "We recommend using a dedicated drive for the operating system and software, and one drive for each Ceph OSD Daemon you run on the host." I could use, for example, 1 disk for the OS and 11 for OSD data. In the operating system I would run 11 daemons to control the OSDs. But... what happens to the cluster if the disk with the OS fails? Maybe the cluster thinks that 11 OSDs failed and tries to replicate all that data over the cluster... that sounds no good. Should I use 2 disks for the OS in a RAID1? In this case I'm "wasting" 8TB only for the ~10GB that the OS needs. All the docs I've been reading say Ceph has no single point of failure, so I think this scenario must have an optimal solution; maybe somebody could help me. Thanks in advance. -- Félix Barbeira.
>>> If you do not have dedicated slots on the back for OS disks, then I >>> would recommend using SATADOM flash modules directly into a SATA port >>> internal in the machine. Saves you 2 slots for OSDs and they are >>> quite reliable. You could even use 2 SD cards if your machine has >>> the internal SD slot. >>> >>> >> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf >>> [1] >>> >>> kind regards >>> Ronny Aasen >>> >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com [2] >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3] >> >> >> Links: >> -- >> [1] >> http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf >> [2] mailto:ceph-users@lists.ceph.com >> [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> [4] mailto:bsha...@sharerland.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
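A rough sketch of the rebuild path Bill and Wido describe, for a ceph-disk/ceph-deploy deployment of that era (the package manager, the admin-node paths and the explicit activate step below are assumptions, not something stated in the thread):

    yum install ceph                                   # or apt-get install ceph, matching the old release
    scp admin:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
    scp admin:/var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
    ceph-disk activate-all                             # udev normally remounts the OSD partitions on boot; this forces it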
Re: [ceph-users] CephFS quota
> On 13 Aug 2016 at 09:24, Willi Fehler wrote: > > Hello, > > I'm trying to use CephFS quotas. On my client I've created a subdirectory in > my CephFS mountpoint and used the following command from the documentation. > > setfattr -n ceph.quota.max_bytes -v 1 /mnt/cephfs/quota > > But if I create files bigger than my quota, nothing happens. Do I need a mount > option to use quotas? > What version is the client? CephFS quotas rely on the client to support them as well. Wido > Regards - Willi > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
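A quick way to answer the client question above, run on the client itself (plain shell; nothing here is specific to this cluster):

    ceph --version           # version of ceph-fuse/libcephfs, if the userspace client is in use
    uname -r                 # kernel version, which is what matters for a kernel mount
    mount | grep ceph        # "type ceph" = kernel client, "fuse.ceph-fuse" = ceph-fuse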
[ceph-users] Cascading failure on a placement group
Hello all, My cluster started to lose OSDs without any warning, whenever an OSD becomes the primary for a particular PG it crashes with the following stacktrace: ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 1: /usr/bin/ceph-osd() [0xada722] 2: (()+0xf100) [0x7fc28bca5100] 3: (gsignal()+0x37) [0x7fc28a6bd5f7] 4: (abort()+0x148) [0x7fc28a6bece8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] 6: (()+0x5e946) [0x7fc28afc0946] 7: (()+0x5e973) [0x7fc28afc0973] 8: (()+0x5eb93) [0x7fc28afc0b93] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xbddcba] 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x75f) [0x87e48f] 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) [0x8a0d1a] 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, ThreadPool::TPHandle&)+0x68a) [0x83be4a] 14: (OSD::dequeue_op(boost::intrusive_ptr, std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) [0x69a5c5] 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x333) [0x69ab33] 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f) [0xbcd1cf] 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] 18: (()+0x7dc5) [0x7fc28bc9ddc5] 19: (clone()+0x6d) [0x7fc28a77eced] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Has anyone ever seen this? Is there a way to fix this? My cluster is in rather large disarray at the moment. I have one of the OSDs now in a restart loop and that is at least preventing other OSDs from going down, but obviously not all other PGs can peer now. I'm not sure what else to do at the moment. Thank you so much, - HP ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS quota
Hi Willi If you are using ceph-fuse, to enable quota, you need to pass the "--client-quota" option in the mount operation. Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Willi Fehler [willi.feh...@t-online.de] Sent: 13 August 2016 17:23 To: ceph-users Subject: [ceph-users] CephFS quota Hello, I'm trying to use CephFS quotas. On my client I've created a subdirectory in my CephFS mountpoint and used the following command from the documentation. setfattr -n ceph.quota.max_bytes -v 1 /mnt/cephfs/quota But if I create files bigger than my quota, nothing happens. Do I need a mount option to use quotas? Regards - Willi ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
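A minimal sketch of the fuse mount Goncalo describes, assuming an admin keyring in /etc/ceph and a monitor reachable as mon1 (a placeholder); the 100 MB quota value is only illustrative:

    ceph-fuse --client-quota -m mon1:6789 /mnt/cephfs
    setfattr -n ceph.quota.max_bytes -v 104857600 /mnt/cephfs/quota   # 100 MB limit
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/quota                # read the limit back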
Re: [ceph-users] Cascading failure on a placement group
Hi HP. I am just a site admin so my opinion should be validated by proper support staff Seems really similar to http://tracker.ceph.com/issues/14399 The ticket speaks about timezone difference between osds. Maybe it is something worthwhile to check? Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Hein-Pieter van Braam [h...@tmm.cx] Sent: 13 August 2016 21:48 To: ceph-users Subject: [ceph-users] Cascading failure on a placement group Hello all, My cluster started to lose OSDs without any warning, whenever an OSD becomes the primary for a particular PG it crashes with the following stacktrace: ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 1: /usr/bin/ceph-osd() [0xada722] 2: (()+0xf100) [0x7fc28bca5100] 3: (gsignal()+0x37) [0x7fc28a6bd5f7] 4: (abort()+0x148) [0x7fc28a6bece8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] 6: (()+0x5e946) [0x7fc28afc0946] 7: (()+0x5e973) [0x7fc28afc0973] 8: (()+0x5eb93) [0x7fc28afc0b93] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xbddcba] 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x75f) [0x87e48f] 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) [0x8a0d1a] 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, ThreadPool::TPHandle&)+0x68a) [0x83be4a] 14: (OSD::dequeue_op(boost::intrusive_ptr, std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) [0x69a5c5] 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x333) [0x69ab33] 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f) [0xbcd1cf] 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] 18: (()+0x7dc5) [0x7fc28bc9ddc5] 19: (clone()+0x6d) [0x7fc28a77eced] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Has anyone ever seen this? Is there a way to fix this? My cluster is in rather large disarray at the moment. I have one of the OSDs now in a restart loop and that is at least preventing other OSDs from going down, but obviously not all other PGs can peer now. I'm not sure what else to do at the moment. Thank you so much, - HP ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
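If it helps, a one-pass sketch for that timezone check (hostnames are placeholders):

    for h in osd-host-1 osd-host-2 osd-host-3; do
        ssh "$h" 'echo "$(hostname): $(date +%Z%z)"; timedatectl 2>/dev/null | grep "Time zone"'
    done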
Re: [ceph-users] Cascading failure on a placement group
Hi Goncalo, Thank you for your response. I had already found that issue but it does not apply to my situation. The timezones are correct and I'm running a pure hammer cluster. - HP On Sat, 2016-08-13 at 12:23 +, Goncalo Borges wrote: > Hi HP. > > I am just a site admin so my opinion should be validated by proper > support staff > > Seems really similar to > http://tracker.ceph.com/issues/14399 > > The ticket speaks about timezone difference between osds. Maybe it is > something worthwhile to check? > > Cheers > Goncalo > > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Hein-Pieter van Braam [h...@tmm.cx] > Sent: 13 August 2016 21:48 > To: ceph-users > Subject: [ceph-users] Cascading failure on a placement group > > Hello all, > > My cluster started to lose OSDs without any warning, whenever an OSD > becomes the primary for a particular PG it crashes with the following > stacktrace: > > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) > 1: /usr/bin/ceph-osd() [0xada722] > 2: (()+0xf100) [0x7fc28bca5100] > 3: (gsignal()+0x37) [0x7fc28a6bd5f7] > 4: (abort()+0x148) [0x7fc28a6bece8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] > 6: (()+0x5e946) [0x7fc28afc0946] > 7: (()+0x5e973) [0x7fc28afc0973] > 8: (()+0x5eb93) [0x7fc28afc0b93] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x27a) [0xbddcba] > 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned > int)+0x75f) [0x87e48f] > 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] > 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) > [0x8a0d1a] > 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, > ThreadPool::TPHandle&)+0x68a) [0x83be4a] > 14: (OSD::dequeue_op(boost::intrusive_ptr, > std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) > [0x69a5c5] > 15: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x333) [0x69ab33] > 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned > int)+0x86f) > [0xbcd1cf] > 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] > 18: (()+0x7dc5) [0x7fc28bc9ddc5] > 19: (clone()+0x6d) [0x7fc28a77eced] > NOTE: a copy of the executable, or `objdump -rdS ` is > needed to interpret this. > > Has anyone ever seen this? Is there a way to fix this? My cluster is > in > rather large disarray at the moment. I have one of the OSDs now in a > restart loop and that is at least preventing other OSDs from going > down, but obviously not all other PGs can peer now. > > I'm not sure what else to do at the moment. > > Thank you so much, > > - HP > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cascading failure on a placement group
The ticket I mentioned earlier was marked as a duplicate of http://tracker.ceph.com/issues/9732 Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Goncalo Borges [goncalo.bor...@sydney.edu.au] Sent: 13 August 2016 22:23 To: Hein-Pieter van Braam; ceph-users Subject: Re: [ceph-users] Cascading failure on a placement group Hi HP. I am just a site admin so my opinion should be validated by proper support staff Seems really similar to http://tracker.ceph.com/issues/14399 The ticket speaks about timezone difference between osds. Maybe it is something worthwhile to check? Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Hein-Pieter van Braam [h...@tmm.cx] Sent: 13 August 2016 21:48 To: ceph-users Subject: [ceph-users] Cascading failure on a placement group Hello all, My cluster started to lose OSDs without any warning, whenever an OSD becomes the primary for a particular PG it crashes with the following stacktrace: ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 1: /usr/bin/ceph-osd() [0xada722] 2: (()+0xf100) [0x7fc28bca5100] 3: (gsignal()+0x37) [0x7fc28a6bd5f7] 4: (abort()+0x148) [0x7fc28a6bece8] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] 6: (()+0x5e946) [0x7fc28afc0946] 7: (()+0x5e973) [0x7fc28afc0973] 8: (()+0x5eb93) [0x7fc28afc0b93] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xbddcba] 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x75f) [0x87e48f] 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) [0x8a0d1a] 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, ThreadPool::TPHandle&)+0x68a) [0x83be4a] 14: (OSD::dequeue_op(boost::intrusive_ptr, std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) [0x69a5c5] 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x333) [0x69ab33] 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f) [0xbcd1cf] 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] 18: (()+0x7dc5) [0x7fc28bc9ddc5] 19: (clone()+0x6d) [0x7fc28a77eced] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Has anyone ever seen this? Is there a way to fix this? My cluster is in rather large disarray at the moment. I have one of the OSDs now in a restart loop and that is at least preventing other OSDs from going down, but obviously not all other PGs can peer now. I'm not sure what else to do at the moment. Thank you so much, - HP ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cascading failure on a placement group
Hi, The timezones on all my systems appear to be the same, I just verified it by running 'date' on all my boxes. - HP On Sat, 2016-08-13 at 12:36 +, Goncalo Borges wrote: > The ticket I mentioned earlier was marked as a duplicate of > > http://tracker.ceph.com/issues/9732 > > Cheers > Goncalo > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Goncalo Borges [goncalo.bor...@sydney.edu.au] > Sent: 13 August 2016 22:23 > To: Hein-Pieter van Braam; ceph-users > Subject: Re: [ceph-users] Cascading failure on a placement group > > Hi HP. > > I am just a site admin so my opinion should be validated by proper > support staff > > Seems really similar to > http://tracker.ceph.com/issues/14399 > > The ticket speaks about timezone difference between osds. Maybe it is > something worthwhile to check? > > Cheers > Goncalo > > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Hein-Pieter van Braam [h...@tmm.cx] > Sent: 13 August 2016 21:48 > To: ceph-users > Subject: [ceph-users] Cascading failure on a placement group > > Hello all, > > My cluster started to lose OSDs without any warning, whenever an OSD > becomes the primary for a particular PG it crashes with the following > stacktrace: > > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) > 1: /usr/bin/ceph-osd() [0xada722] > 2: (()+0xf100) [0x7fc28bca5100] > 3: (gsignal()+0x37) [0x7fc28a6bd5f7] > 4: (abort()+0x148) [0x7fc28a6bece8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] > 6: (()+0x5e946) [0x7fc28afc0946] > 7: (()+0x5e973) [0x7fc28afc0973] > 8: (()+0x5eb93) [0x7fc28afc0b93] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x27a) [0xbddcba] > 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned > int)+0x75f) [0x87e48f] > 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] > 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) > [0x8a0d1a] > 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, > ThreadPool::TPHandle&)+0x68a) [0x83be4a] > 14: (OSD::dequeue_op(boost::intrusive_ptr, > std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) > [0x69a5c5] > 15: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x333) [0x69ab33] > 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned > int)+0x86f) > [0xbcd1cf] > 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] > 18: (()+0x7dc5) [0x7fc28bc9ddc5] > 19: (clone()+0x6d) [0x7fc28a77eced] > NOTE: a copy of the executable, or `objdump -rdS ` is > needed to interpret this. > > Has anyone ever seen this? Is there a way to fix this? My cluster is > in > rather large disarray at the moment. I have one of the OSDs now in a > restart loop and that is at least preventing other OSDs from going > down, but obviously not all other PGs can peer now. > > I'm not sure what else to do at the moment. > > Thank you so much, > > - HP > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple OSD crashing a lot
Hi Blade, I appear to be stuck in the same situation you were in. Do you still happen to have a patch to implement this workaround you described? Thanks, - HP ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple OSD crashing a lot
Hi HP. Mine was not really a fix, it was just a hack to get the OSD up long enough to make sure I had a full backup, then I rebuilt the cluster from scratch and restored the data. Though the hack did stop the OSD from crashing, it is probably a symptom of some internal problem, and may not be "safe" to run like that in the long term. The change was something like this: Ref: https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.cc

I changed this:

ObjectContextRef obc = get_object_context(oid, false);
assert(obc);
--ctx->delta_stats.num_objects;
--ctx->delta_stats.num_objects_hit_set_archive;
ctx->delta_stats.num_bytes -= obc->obs.oi.size;
ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;

to this:

ObjectContextRef obc = 0; // get_object_context(oid, false); assert(obc);
--ctx->delta_stats.num_objects;
--ctx->delta_stats.num_objects_hit_set_archive;
if( obc)
{
  ctx->delta_stats.num_bytes -= obc->obs.oi.size;
  ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
}

Good luck! Blade. On Sat, Aug 13, 2016 at 5:52 AM, Hein-Pieter van Braam wrote: > Hi Blade, > > I appear to be stuck in the same situation you were in. Do you still > happen to have a patch to implement this workaround you described? > > Thanks, > > - HP > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple OSD crashing a lot
Hi Blade, I was planning to do something similar. Run the OSD in the way you describe, use object copy to copy the data to a new volume, then move the clients to the new volume. Thanks a lot, - HP On Sat, 2016-08-13 at 08:18 -0700, Blade Doyle wrote: > Hi HP. > > Mine was not really a fix, it was just a hack to get the OSD up long > enough to make sure I had a full backup, then I rebuilt the cluster > from scratch and restored the data. Though the hack did stop the OSD > from crashing, it is probably a symptom of some internal problem, and > may not be "safe" to run like that in the long term. > > The change was something like this: > > Ref: https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.cc > > I changed this:
>
> ObjectContextRef obc = get_object_context(oid, false);
> assert(obc);
> --ctx->delta_stats.num_objects;
> --ctx->delta_stats.num_objects_hit_set_archive;
> ctx->delta_stats.num_bytes -= obc->obs.oi.size;
> ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
>
> to this:
>
> ObjectContextRef obc = 0; // get_object_context(oid, false); assert(obc);
> --ctx->delta_stats.num_objects;
> --ctx->delta_stats.num_objects_hit_set_archive;
> if( obc)
> {
>   ctx->delta_stats.num_bytes -= obc->obs.oi.size;
>   ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
> }
>
> Good luck! > Blade. > > On Sat, Aug 13, 2016 at 5:52 AM, Hein-Pieter van Braam > wrote: > > Hi Blade, > > > > I appear to be stuck in the same situation you were in. Do you > > still > > happen to have a patch to implement this workaround you described? > > > > Thanks, > > > > - HP > > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
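For reference, a sketch of that copy-out step using the rbd tool (pool and image names are placeholders, and this assumes plain RBD volumes rather than, say, CephFS data):

    rbd cp rbd/broken-volume rbd/new-volume     # object-by-object copy into a fresh image
    rbd info rbd/new-volume                     # sanity-check size and features before repointing clients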
Re: [ceph-users] [Scst-devel] Thin Provisioning and Ceph RBD's
On Mon, Aug 8, 2016 at 7:56 AM, Ilya Dryomov wrote: > On Sun, Aug 7, 2016 at 7:57 PM, Alex Gorbachev > wrote: >>> I'm confused. How can a 4M discard not free anything? It's either >>> going to hit an entire object or two adjacent objects, truncating the >>> tail of one and zeroing the head of another. Using rbd diff: >>> >>> $ rbd diff test | grep -A 1 25165824 >>> 25165824 4194304 data >>> 29360128 4194304 data >>> >>> # a 4M discard at 1M into a RADOS object >>> $ blkdiscard -o $((25165824 + (1 << 20))) -l $((4 << 20)) /dev/rbd0 >>> >>> $ rbd diff test | grep -A 1 25165824 >>> 25165824 1048576 data >>> 29360128 4194304 data >> >> I have tested this on a small RBD device with such offsets and indeed, >> the discard works as you describe, Ilya. >> >> Looking more into why ESXi's discard is not working. I found this >> message in kern.log on Ubuntu on creation of the SCST LUN, which shows >> unmap_alignment 0: >> >> Aug 6 22:02:33 e1 kernel: [300378.136765] virt_id 33 (p_iSCSILun_sclun945) >> Aug 6 22:02:33 e1 kernel: [300378.136782] dev_vdisk: Auto enable thin >> provisioning for device /dev/rbd/spin1/unmap1t >> Aug 6 22:02:33 e1 kernel: [300378.136784] unmap_gran 8192, >> unmap_alignment 0, max_unmap_lba 8192, discard_zeroes_data 1 >> Aug 6 22:02:33 e1 kernel: [300378.136786] dev_vdisk: Attached SCSI >> target virtual disk p_iSCSILun_sclun945 >> (file="/dev/rbd/spin1/unmap1t", fs=409600MB, bs=512, >> nblocks=838860800, cyln=409600) >> Aug 6 22:02:33 e1 kernel: [300378.136847] [4682]: >> scst_alloc_add_tgt_dev:5287:Device p_iSCSILun_sclun945 on SCST lun=32 >> Aug 6 22:02:33 e1 kernel: [300378.136853] [4682]: scst: >> scst_alloc_set_UA:12711:Queuing new UA 8810251f3a90 (6:29:0, >> d_sense 0) to tgt_dev 88102583ad00 (dev p_iSCSILun_sclun945, >> initiator copy_manager_sess) >> >> even though: >> >> root@e1:/sys/block/rbd29# cat discard_alignment >> 4194304 >> >> So somehow the discard_alignment is not making it into the LUN. Could >> this be the issue? > > No, if you are not seeing *any* effect, the alignment is pretty much > irrelevant. Can you do the following on a small test image? > > - capture "rbd diff" output > - blktrace -d /dev/rbd0 -o - | blkparse -i - -o rbd0.trace > - issue a few discards with blkdiscard > - issue a few unmaps with ESXi, preferrably with SCST debugging enabled > - capture "rbd diff" output again > > and attach all of the above? (You might need to install a blktrace > package.) > Latest results from VMWare validation tests: Each test creates and deletes a virtual disk, then calls ESXi unmap for what ESXi maps to that volume: Test 1: 10GB reclaim, rbd diff size: 3GB, discards: 4829 Test 2: 100GB reclaim, rbd diff size: 50GB, discards: 197837 Test 3: 175GB reclaim, rbd diff size: 47 GB, discards: 197824 Test 4: 250GB reclaim, rbd diff size: 125GB, discards: 197837 Test 5: 250GB reclaim, rbd diff size: 80GB, discards: 197837 At the end, the compounded used size via rbd diff is 608 GB from 775GB of data. So we release only about 20% via discards in the end. Thank you, Alex ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [Scst-devel] Thin Provisioning and Ceph RBD's
On Sat, Aug 13, 2016 at 12:36 PM, Alex Gorbachev wrote: > On Mon, Aug 8, 2016 at 7:56 AM, Ilya Dryomov wrote: >> On Sun, Aug 7, 2016 at 7:57 PM, Alex Gorbachev >> wrote: I'm confused. How can a 4M discard not free anything? It's either going to hit an entire object or two adjacent objects, truncating the tail of one and zeroing the head of another. Using rbd diff: $ rbd diff test | grep -A 1 25165824 25165824 4194304 data 29360128 4194304 data # a 4M discard at 1M into a RADOS object $ blkdiscard -o $((25165824 + (1 << 20))) -l $((4 << 20)) /dev/rbd0 $ rbd diff test | grep -A 1 25165824 25165824 1048576 data 29360128 4194304 data >>> >>> I have tested this on a small RBD device with such offsets and indeed, >>> the discard works as you describe, Ilya. >>> >>> Looking more into why ESXi's discard is not working. I found this >>> message in kern.log on Ubuntu on creation of the SCST LUN, which shows >>> unmap_alignment 0: >>> >>> Aug 6 22:02:33 e1 kernel: [300378.136765] virt_id 33 (p_iSCSILun_sclun945) >>> Aug 6 22:02:33 e1 kernel: [300378.136782] dev_vdisk: Auto enable thin >>> provisioning for device /dev/rbd/spin1/unmap1t >>> Aug 6 22:02:33 e1 kernel: [300378.136784] unmap_gran 8192, >>> unmap_alignment 0, max_unmap_lba 8192, discard_zeroes_data 1 >>> Aug 6 22:02:33 e1 kernel: [300378.136786] dev_vdisk: Attached SCSI >>> target virtual disk p_iSCSILun_sclun945 >>> (file="/dev/rbd/spin1/unmap1t", fs=409600MB, bs=512, >>> nblocks=838860800, cyln=409600) >>> Aug 6 22:02:33 e1 kernel: [300378.136847] [4682]: >>> scst_alloc_add_tgt_dev:5287:Device p_iSCSILun_sclun945 on SCST lun=32 >>> Aug 6 22:02:33 e1 kernel: [300378.136853] [4682]: scst: >>> scst_alloc_set_UA:12711:Queuing new UA 8810251f3a90 (6:29:0, >>> d_sense 0) to tgt_dev 88102583ad00 (dev p_iSCSILun_sclun945, >>> initiator copy_manager_sess) >>> >>> even though: >>> >>> root@e1:/sys/block/rbd29# cat discard_alignment >>> 4194304 >>> >>> So somehow the discard_alignment is not making it into the LUN. Could >>> this be the issue? >> >> No, if you are not seeing *any* effect, the alignment is pretty much >> irrelevant. Can you do the following on a small test image? >> >> - capture "rbd diff" output >> - blktrace -d /dev/rbd0 -o - | blkparse -i - -o rbd0.trace >> - issue a few discards with blkdiscard >> - issue a few unmaps with ESXi, preferrably with SCST debugging enabled >> - capture "rbd diff" output again >> >> and attach all of the above? (You might need to install a blktrace >> package.) >> > > Latest results from VMWare validation tests: > > Each test creates and deletes a virtual disk, then calls ESXi unmap > for what ESXi maps to that volume: > > Test 1: 10GB reclaim, rbd diff size: 3GB, discards: 4829 > > Test 2: 100GB reclaim, rbd diff size: 50GB, discards: 197837 > > Test 3: 175GB reclaim, rbd diff size: 47 GB, discards: 197824 > > Test 4: 250GB reclaim, rbd diff size: 125GB, discards: 197837 > > Test 5: 250GB reclaim, rbd diff size: 80GB, discards: 197837 > > At the end, the compounded used size via rbd diff is 608 GB from 775GB > of data. So we release only about 20% via discards in the end. Ilya has analyzed the discard pattern, and indeed the problem is that ESXi appears to disregard the discard alignment attribute. Therefore, discards are shifted by 1M, and are not hitting the tail of objects. Discards work much better on the EagerZeroedThick volumes, likely due to contiguous data. I will proceed with the rest of testing, and will post any tips or best practice results as they become available. 
Thank you for everyone's help and advice! Alex ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
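For anyone who wants to reproduce the alignment effect on a throw-away image, a sketch along the lines of Ilya's earlier example (the pool/image names and the /dev/rbd0 device are placeholders):

    rbd create -s 1024 spin1/discard-test && rbd map spin1/discard-test    # maps to e.g. /dev/rbd0
    dd if=/dev/urandom of=/dev/rbd0 bs=4M count=16 oflag=direct            # populate the first 64M
    rbd diff spin1/discard-test                                            # all 16 objects show as data
    cat /sys/block/rbd0/discard_alignment                                  # 4194304: discards should start on 4M boundaries
    blkdiscard -o $((6 * 4194304)) -l $((4 << 20)) /dev/rbd0               # object-aligned: frees a whole object
    blkdiscard -o $((8 * 4194304 + (1 << 20))) -l $((4 << 20)) /dev/rbd0   # shifted by 1M: only the tail of one object is trimmed
    rbd diff spin1/discard-test                                            # compare the two regions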
Re: [ceph-users] Cascading failure on a placement group
Hi HP My 2 cents again. In > http://tracker.ceph.com/issues/9732 There is a comment from Samuel saying "This...is not resolved! The utime_t->hobject_t mapping is timezone dependent. Needs to be not timezone dependent when generating the archive object names." The way I read it is that you will get problems if at a given time your timezone has been different (since it is used for archive object names) even if now everything is now in the same timezone. So I guess it could be worthwhile to check if, around the time of the first failures, your timezone wasn't different even if now is ok. It should be worthwhile to check if timezone is/was different in mind. Cheers From: Hein-Pieter van Braam [h...@tmm.cx] Sent: 13 August 2016 22:42 To: Goncalo Borges; ceph-users Subject: Re: [ceph-users] Cascading failure on a placement group Hi, The timezones on all my systems appear to be the same, I just verified it by running 'date' on all my boxes. - HP On Sat, 2016-08-13 at 12:36 +, Goncalo Borges wrote: > The ticket I mentioned earlier was marked as a duplicate of > > http://tracker.ceph.com/issues/9732 > > Cheers > Goncalo > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Goncalo Borges [goncalo.bor...@sydney.edu.au] > Sent: 13 August 2016 22:23 > To: Hein-Pieter van Braam; ceph-users > Subject: Re: [ceph-users] Cascading failure on a placement group > > Hi HP. > > I am just a site admin so my opinion should be validated by proper > support staff > > Seems really similar to > http://tracker.ceph.com/issues/14399 > > The ticket speaks about timezone difference between osds. Maybe it is > something worthwhile to check? > > Cheers > Goncalo > > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Hein-Pieter van Braam [h...@tmm.cx] > Sent: 13 August 2016 21:48 > To: ceph-users > Subject: [ceph-users] Cascading failure on a placement group > > Hello all, > > My cluster started to lose OSDs without any warning, whenever an OSD > becomes the primary for a particular PG it crashes with the following > stacktrace: > > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) > 1: /usr/bin/ceph-osd() [0xada722] > 2: (()+0xf100) [0x7fc28bca5100] > 3: (gsignal()+0x37) [0x7fc28a6bd5f7] > 4: (abort()+0x148) [0x7fc28a6bece8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] > 6: (()+0x5e946) [0x7fc28afc0946] > 7: (()+0x5e973) [0x7fc28afc0973] > 8: (()+0x5eb93) [0x7fc28afc0b93] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x27a) [0xbddcba] > 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned > int)+0x75f) [0x87e48f] > 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] > 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) > [0x8a0d1a] > 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, > ThreadPool::TPHandle&)+0x68a) [0x83be4a] > 14: (OSD::dequeue_op(boost::intrusive_ptr, > std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) > [0x69a5c5] > 15: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x333) [0x69ab33] > 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned > int)+0x86f) > [0xbcd1cf] > 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] > 18: (()+0x7dc5) [0x7fc28bc9ddc5] > 19: (clone()+0x6d) [0x7fc28a77eced] > NOTE: a copy of the executable, or `objdump -rdS ` is > needed to interpret this. > > Has anyone ever seen this? Is there a way to fix this? My cluster is > in > rather large disarray at the moment. 
I have one of the OSDs now in a > restart loop and that is at least preventing other OSDs from going > down, but obviously not all other PGs can peer now. > > I'm not sure what else to do at the moment. > > Thank you so much, > > - HP > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cascading failure on a placement group
>It should be worthwhile to check if timezone is/was different in mind. What I meant was that it should be worthwhile to check if timezone is/was different in MONS also. Cheers From: Hein-Pieter van Braam [h...@tmm.cx] Sent: 13 August 2016 22:42 To: Goncalo Borges; ceph-users Subject: Re: [ceph-users] Cascading failure on a placement group Hi, The timezones on all my systems appear to be the same, I just verified it by running 'date' on all my boxes. - HP On Sat, 2016-08-13 at 12:36 +, Goncalo Borges wrote: > The ticket I mentioned earlier was marked as a duplicate of > > http://tracker.ceph.com/issues/9732 > > Cheers > Goncalo > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Goncalo Borges [goncalo.bor...@sydney.edu.au] > Sent: 13 August 2016 22:23 > To: Hein-Pieter van Braam; ceph-users > Subject: Re: [ceph-users] Cascading failure on a placement group > > Hi HP. > > I am just a site admin so my opinion should be validated by proper > support staff > > Seems really similar to > http://tracker.ceph.com/issues/14399 > > The ticket speaks about timezone difference between osds. Maybe it is > something worthwhile to check? > > Cheers > Goncalo > > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > Hein-Pieter van Braam [h...@tmm.cx] > Sent: 13 August 2016 21:48 > To: ceph-users > Subject: [ceph-users] Cascading failure on a placement group > > Hello all, > > My cluster started to lose OSDs without any warning, whenever an OSD > becomes the primary for a particular PG it crashes with the following > stacktrace: > > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) > 1: /usr/bin/ceph-osd() [0xada722] > 2: (()+0xf100) [0x7fc28bca5100] > 3: (gsignal()+0x37) [0x7fc28a6bd5f7] > 4: (abort()+0x148) [0x7fc28a6bece8] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc28afc29d5] > 6: (()+0x5e946) [0x7fc28afc0946] > 7: (()+0x5e973) [0x7fc28afc0973] > 8: (()+0x5eb93) [0x7fc28afc0b93] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x27a) [0xbddcba] > 10: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned > int)+0x75f) [0x87e48f] > 11: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab] > 12: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) > [0x8a0d1a] > 13: (ReplicatedPG::do_request(std::tr1::shared_ptr&, > ThreadPool::TPHandle&)+0x68a) [0x83be4a] > 14: (OSD::dequeue_op(boost::intrusive_ptr, > std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) > [0x69a5c5] > 15: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x333) [0x69ab33] > 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned > int)+0x86f) > [0xbcd1cf] > 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300] > 18: (()+0x7dc5) [0x7fc28bc9ddc5] > 19: (clone()+0x6d) [0x7fc28a77eced] > NOTE: a copy of the executable, or `objdump -rdS ` is > needed to interpret this. > > Has anyone ever seen this? Is there a way to fix this? My cluster is > in > rather large disarray at the moment. I have one of the OSDs now in a > restart loop and that is at least preventing other OSDs from going > down, but obviously not all other PGs can peer now. > > I'm not sure what else to do at the moment. 
> > Thank you so much, > > - HP > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
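One more sketch along those lines: checking the timezone the running daemons themselves were started with, not just the current shell (the pgrep pattern assumes the daemons run as ceph-osd/ceph-mon processes):

    for pid in $(pgrep -f 'ceph-osd|ceph-mon'); do
        tz=$(tr '\0' '\n' < /proc/$pid/environ | grep '^TZ=')
        echo "pid $pid: ${tz:-TZ unset (falls back to /etc/localtime)}"
    done
    ls -l /etc/localtime        # compare this across all OSD and MON hosts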
Re: [ceph-users] Fwd: lost power. monitors died. Cephx errors now
So with a patched leveldb to skip errors I now have a store.db that I can extract the pg,mon,and osd map from. That said when I try to start kh10-8 it bombs out:: --- --- root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8# ceph-mon -i $(hostname) -d 2016-08-13 22:30:54.596039 7fa8b9e088c0 0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 708653 starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf 2016-08-13 22:30:54.608150 7fa8b9e088c0 0 starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf 2016-08-13 22:30:54.608395 7fa8b9e088c0 1 mon.kh10-8@-1(probing) e1 preinit fsid e452874b-cb29-4468-ac7f-f8901dfccebf 2016-08-13 22:30:54.608617 7fa8b9e088c0 1 mon.kh10-8@-1(probing).paxosservice(pgmap 0..35606392) refresh upgraded, format 0 -> 1 2016-08-13 22:30:54.608629 7fa8b9e088c0 1 mon.kh10-8@-1(probing).pg v0 on_upgrade discarding in-core PGMap terminate called after throwing an instance of 'ceph::buffer::end_of_buffer' what(): buffer::end_of_buffer *** Caught signal (Aborted) ** in thread 7fa8b9e088c0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 1: ceph-mon() [0x9b25ea] 2: (()+0x10330) [0x7fa8b8f0b330] 3: (gsignal()+0x37) [0x7fa8b73a8c37] 4: (abort()+0x148) [0x7fa8b73ac028] 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535] 6: (()+0x5e6d6) [0x7fa8b7cb16d6] 7: (()+0x5e703) [0x7fa8b7cb1703] 8: (()+0x5e922) [0x7fa8b7cb1922] 9: ceph-mon() [0x853c39] 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167) [0x894227] 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf] 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3] 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8] 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7] 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a] 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb] 17: (Monitor::init_paxos()+0x85) [0x5b2365] 18: (Monitor::preinit()+0x7d7) [0x5b6f87] 19: (main()+0x230c) [0x57853c] 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45] 21: ceph-mon() [0x59a3c7] 2016-08-13 22:30:54.611791 7fa8b9e088c0 -1 *** Caught signal (Aborted) ** in thread 7fa8b9e088c0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 1: ceph-mon() [0x9b25ea] 2: (()+0x10330) [0x7fa8b8f0b330] 3: (gsignal()+0x37) [0x7fa8b73a8c37] 4: (abort()+0x148) [0x7fa8b73ac028] 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535] 6: (()+0x5e6d6) [0x7fa8b7cb16d6] 7: (()+0x5e703) [0x7fa8b7cb1703] 8: (()+0x5e922) [0x7fa8b7cb1922] 9: ceph-mon() [0x853c39] 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167) [0x894227] 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf] 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3] 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8] 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7] 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a] 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb] 17: (Monitor::init_paxos()+0x85) [0x5b2365] 18: (Monitor::preinit()+0x7d7) [0x5b6f87] 19: (main()+0x230c) [0x57853c] 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45] 21: ceph-mon() [0x59a3c7] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. 
--- begin dump of recent events --- -33> 2016-08-13 22:30:54.593450 7fa8b9e088c0 5 asok(0x36a20f0) register_command perfcounters_dump hook 0x365a050 -32> 2016-08-13 22:30:54.593480 7fa8b9e088c0 5 asok(0x36a20f0) register_command 1 hook 0x365a050 -31> 2016-08-13 22:30:54.593486 7fa8b9e088c0 5 asok(0x36a20f0) register_command perf dump hook 0x365a050 -30> 2016-08-13 22:30:54.593496 7fa8b9e088c0 5 asok(0x36a20f0) register_command perfcounters_schema hook 0x365a050 -29> 2016-08-13 22:30:54.593499 7fa8b9e088c0 5 asok(0x36a20f0) register_command 2 hook 0x365a050 -28> 2016-08-13 22:30:54.593501 7fa8b9e088c0 5 asok(0x36a20f0) register_command perf schema hook 0x365a050 -27> 2016-08-13 22:30:54.593503 7fa8b9e088c0 5 asok(0x36a20f0) register_command perf reset hook 0x365a050 -26> 2016-08-13 22:30:54.593505 7fa8b9e088c0 5 asok(0x36a20f0) register_command config show hook 0x365a050 -25> 2016-08-13 22:30:54.593508 7fa8b9e088c0 5 asok(0x36a20f0) register_command config set hook 0x365a050 -24> 2016-08-13 22:30:54.593510 7fa8b9e088c0 5 asok(0x36a20f0) register_command config get hook 0x365a050 -23> 2016-08-13 22:30:54.593512 7fa8b9e088c0 5 asok(0x36a20f0) register_command config diff hook 0x365a050 -22> 2016-08-13 22:30:54.593513 7fa8b9e088c0 5 asok(0x36a20f0) register_command log flush hook 0x365a050 -21> 2016-08-13 22:30:54.593557 7fa8b9e088c0 5 asok(0x36a20f0) register_command log dump h
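Not a fix for the corrupt pgmap entry itself, but if at least one monitor still has an intact store and can form a quorum, one possible path is to discard the damaged store and re-add kh10-8 as if it were a brand-new monitor. A rough sketch of the standard remove/add procedure (paths and ordering are assumptions for a hammer-era install; the broken store is kept aside rather than deleted):

    mv /var/lib/ceph/mon/ceph-kh10-8 /var/lib/ceph/mon/ceph-kh10-8.corrupt      # on kh10-8, with the mon stopped
    ceph mon remove kh10-8                                                      # from a node that still reaches quorum
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph-mon -i kh10-8 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring   # back on kh10-8
    ceph mon add kh10-8 10.64.64.125:6789
    ceph-mon -i kh10-8 --public-addr 10.64.64.125:6789                          # or start it via the init system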
Re: [ceph-users] CephFS quota
Hello guys, my cluster is running on the latest Ceph version. My cluster and my client are running on CentOS 7.2. ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) My client is using CephFS; I'm not using FUSE. My fstab: linsrv001,linsrv002,linsrv003:/ /mnt/cephfs ceph noatime,dirstat,_netdev,name=cephfs,secretfile=/etc/ceph/cephfs.secret 0 0 Regards - Willi On 13.08.16 at 13:58, Goncalo Borges wrote: Hi Willi If you are using ceph-fuse, to enable quota, you need to pass the "--client-quota" option in the mount operation. Cheers Goncalo From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Willi Fehler [willi.feh...@t-online.de] Sent: 13 August 2016 17:23 To: ceph-users Subject: [ceph-users] CephFS quota Hello, I'm trying to use CephFS quotas. On my client I've created a subdirectory in my CephFS mountpoint and used the following command from the documentation. setfattr -n ceph.quota.max_bytes -v 1 /mnt/cephfs/quota But if I create files bigger than my quota, nothing happens. Do I need a mount option to use quotas? Regards - Willi ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
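For completeness, a quick way to see whether a given mount actually enforces the limit (the 100 MB value and the file name are only illustrative; a kernel client of this vintage is not expected to enforce ceph.quota.* at all, so on the mount above the write will most likely just succeed):

    setfattr -n ceph.quota.max_bytes -v 104857600 /mnt/cephfs/quota
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/quota
    dd if=/dev/zero of=/mnt/cephfs/quota/fill bs=1M count=200 conv=fsync
    # a quota-aware client (e.g. ceph-fuse with quota support enabled) should eventually
    # fail with "Disk quota exceeded"; enforcement is approximate, not byte-exact.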