Re: [librbd] Add interface of get the snapshot size?
On 03/24/2014 06:50 AM, Andrey Korolyov wrote:
> On 03/24/2014 05:30 PM, Haomai Wang wrote:
>> Hi all,
>>
>> As we know, a snapshot is a lightweight resource in librbd and we don't
>> have any statistics about it. But this causes some problems for cloud
>> management.
>>
>> We can't measure the size of a snapshot, and different snapshots occupy
>> different amounts of space. So we have no way to estimate a user's
>> resource usage.
>>
>> Maybe we can have a counter that records space usage when a volume is
>> created. When a snapshot is created, the counter is frozen and stored as
>> the size of the snapshot, and a new counter starting at zero is assigned
>> to the volume.
>>
>> Any feedback is appreciated!
>
> I believe that there is a rough estimate available via 'rados df'.
> Per-image statistics would be awesome, though precise stats will either
> be rough as well (# of rbd object clones per volume) or require a new
> counter mechanism. Given discard handling in the filestore, it looks even
> more difficult to calculate a correct estimate, as it is with the XFS
> preallocation feature.

diff_iterate() will let you see what extents exist in the image at a given
snapshot or between snapshots. It's not perfect, but it'll be more accurate
than other methods.

A better long term solution might be to leverage something like a bitmap of
object existence. This could be marked dirty (i.e. unreliable) when opening
an image, maintained in memory, and saved with a snapshot and when closing
an image as clean. With xfs preallocation hints, the usage will be pretty
close to object size * num objects.

Josh
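For illustration, here is a rough sketch of the diff_iterate() approach using
the librbd C API. The pool, image and snapshot names below are placeholders
and error handling is omitted; it only shows how the extents reported by
rbd_diff_iterate() could be summed to approximate a snapshot's footprint, not
a definitive accounting method.

/* Rough sketch: approximate how much data exists in an image as of a
 * given snapshot by summing the extents rbd_diff_iterate() reports.
 * Pool/image/snapshot names are made up for the example. */
#include <rados/librados.h>
#include <rbd/librbd.h>
#include <stdio.h>
#include <stdint.h>

static int count_extent(uint64_t ofs, size_t len, int exists, void *arg)
{
	if (exists)
		*(uint64_t *)arg += len;
	return 0;
}

int main(void)
{
	rados_t cluster;
	rados_ioctx_t ioctx;
	rbd_image_t image;
	uint64_t used = 0, size = 0;

	rados_create(&cluster, NULL);
	rados_conf_read_file(cluster, NULL);	/* default ceph.conf */
	rados_connect(cluster);
	rados_ioctx_create(cluster, "rbd", &ioctx);

	/* open the image at the snapshot we want to measure */
	rbd_open(ioctx, "myimage", &image, "mysnap");
	rbd_get_size(image, &size);

	/* fromsnapname = NULL: walk every extent that exists up to "mysnap" */
	rbd_diff_iterate(image, NULL, 0, size, count_extent, &used);

	printf("approx. bytes used at snapshot: %llu\n",
	       (unsigned long long)used);

	rbd_close(image);
	rados_ioctx_destroy(ioctx);
	rados_shutdown(cluster);
	return 0;
}

Passing an earlier snapshot name as fromsnapname instead of NULL would give
the space added between two snapshots rather than the total at one snapshot.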
Re: GCC -msse2 portability question
On 23/03/2014 23:34, Laurent GUERBY wrote:
> On Sun, 2014-03-23 at 20:50 +0100, Loic Dachary wrote:
>> Hi Laurent,
>>
>> In the context of optimizing erasure code functions implemented by
>> Kevin Greenan (cc'ed) and James Plank at
>> https://bitbucket.org/jimplank/gf-complete/ we ran across a question
>> you may have the answer to: can gcc -msse2 (or -msse* for that matter)
>> have a negative impact on the portability of the compiled binary code?
>>
>> In other words, if code is compiled without -msse* and runs fine on
>> all the intel processors it targets, could adding -msse* to the
>> compilation of the same source code generate a binary that would fail
>> on some processors? This is assuming no sse specific functions were
>> used in the source code.
>>
>> In gf-complete, all sse specific instructions are carefully protected
>> so they are not run on a CPU that does not support them. The runtime
>> detection is done by checking CPU id bits (see
>> https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28
>> )
>>
>> The corresponding thread is at:
>>
>> https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>
>> Cheers
>>
>
> Hi Loic,
>
> The GCC documentation is here, with lists of the architectures supporting
> sse/sse2:
>
> http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
>
> So unless you want to run your code on a very, very old x86 32 bit
> processor, "-msse" shouldn't be an issue. "-msse2" is similar.

This is good to know :) Should I be worried about unintended side effects of
-msse4.2, -mssse3, -msse4.1 or -mpclmul? These are the flags that
gf-complete uses, specifically.

Cheers

> -mtune=xxx with xxx being a recent arch could be interesting for you
> because it keeps compatibility with the generic arch while tuning the
> resulting code for the specific arch (for example a currently fashionable
> arch like corei7).
>
> For a library you can choose the code you execute at load/run time
> for a specific function by using the STT_GNU_IFUNC feature:
>
> http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/02/07
> http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html#index-g_t_0040code_007bifunc_007d-attribute-2529
>
> I believe recent GLIBC uses this feature to tune
> some performance/arch sensitive functions.
>
> Sincerely,
>
> Laurent
>

--
Loïc Dachary, Artisan Logiciel Libre
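As a point of reference, here is a minimal sketch of the kind of runtime
probe being discussed, using GCC's <cpuid.h>. This is not gf-complete's
actual detection code (see the gf_intel.c link above for that); it only
illustrates checking the CPUID feature bits so that SSE code paths are taken
only on CPUs that support them.

/* Minimal sketch (not gf-complete's real probe): check CPUID feature
 * bits at runtime before dispatching to SSE code paths. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 1 carries the basic feature flags in ECX/EDX. */
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
		printf("CPUID leaf 1 not supported\n");
		return 1;
	}

	printf("sse2:   %s\n", (edx & bit_SSE2)   ? "yes" : "no");
	printf("ssse3:  %s\n", (ecx & bit_SSSE3)  ? "yes" : "no");
	printf("sse4.1: %s\n", (ecx & bit_SSE4_1) ? "yes" : "no");
	printf("sse4.2: %s\n", (ecx & bit_SSE4_2) ? "yes" : "no");
	printf("pclmul: %s\n", (ecx & bit_PCLMUL) ? "yes" : "no");
	return 0;
}

Whether the compiler may also auto-generate SSE instructions in the
non-gated code when -msse* is passed is exactly the portability question
raised above.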
Re: Limiting specific to specific directory, client separation
This is not currently a priority in Inktank's roadmap for the MDS. :( But we
discussed client security in more detail than those tickets during the
Dumpling Ceph Developer Summit:
http://wiki.ceph.com/Planning/CDS/Dumpling
(search for "1G: Client Security for CephFS" -- there's a blueprint, an
etherpad, and a video of the discussion)

I think that pretty well covers the work involved in doing what you
describe, and we would love to support any developers working on this or
other features!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Mar 24, 2014 at 3:37 AM, Szymon Szypulski wrote:
> Hi,
>
> We would like to migrate from gluster to ceph in an environment where
> multiple servers can access one directory, while some directories should
> be unavailable to some of them. Basically we need per-directory access
> lists for multiple users.
>
> I've found some overdue tickets on the tracker:
> http://tracker.ceph.com/issues/1401 and
> http://tracker.ceph.com/issues/1237. Is there any chance this will be
> included in the roadmap soon? If not, is there anything we can do to
> speed it up? Hire a C/C++ developer internally and delegate them to Ceph
> development? Maybe something else?
>
> ---
> Szymon Szypulski
Re: [ceph-users] poor data distribution
Hi,

> FWIW the tunable that fixes this was just merged today but won't
> appear in a release for another 3 weeks or so.

Is this the "vary_r" tunable? Can I use it in production?

--
Regards
Dominik

2014-02-12 3:24 GMT+01:00 Sage Weil :
> On Wed, 12 Feb 2014, Dominik Mostowiec wrote:
>> Hi,
>> Does this problem (with stuck active+remapped pgs after
>> reweight-by-utilization) affect all ceph configurations or only
>> specific ones?
>> If specific: what is the reason in my case? Is this caused by crush
>> configuration (cluster architecture, crush tunables, ...), cluster
>> size, architecture design mistakes, or something else?
>
> It seems to just be the particular structure of your map. In your case
> you have a few different racks (or hosts? I forget) in the upper level of
> the hierarchy and then a handful of devices in the leaves that are marked
> out or reweighted down. With that combination CRUSH runs out of placement
> choices at the upper level and keeps trying the same values in the lower
> level. FWIW the tunable that fixes this was just merged today but won't
> appear in a release for another 3 weeks or so.
>
>> Second question.
>> Distribution of PGs across OSDs is better for large clusters (where
>> pg_num is higher). Is it possible (for small clusters) to change the
>> crush distribution algorithm to something more linear? (I realize that
>> it will be less efficient.)
>
> It is really related to the ratio of pg_num to total OSDs, not the
> absolute number. For small clusters it is probably more tolerable to have
> a larger pg_num count though, because many of the costs normally
> associated with that (e.g., more peers) run up against the total host
> count before they start to matter.
>
> Again, I think the right answer here is picking a good pg-to-osd ratio
> and using reweight-by-utilization (which will be fixed soon).
>
> sage
>
>
>>
>> --
>> Regards
>> Dominik
>>
>> 2014-02-06 21:31 GMT+01:00 Dominik Mostowiec :
>> > Great!
>> > Thanks for your help.
>> >
>> > --
>> > Regards
>> > Dominik
>> >
>> > 2014-02-06 21:10 GMT+01:00 Sage Weil :
>> >> On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
>> >>> Hi,
>> >>> Thanks !!
>> >>> Can you suggest any workaround for now?
>> >>
>> >> You can adjust the crush weights on the overfull nodes slightly. You'd
>> >> need to do it by hand, but that will do the trick. For example,
>> >>
>> >> ceph osd crush reweight osd.123 .96
>> >>
>> >> (if the current weight is 1.0).
>> >>
>> >> sage
>> >>
>> >>>
>> >>> --
>> >>> Regards
>> >>> Dominik
>> >>>
>> >>>
>> >>> 2014-02-06 18:39 GMT+01:00 Sage Weil :
>> >>> > Hi,
>> >>> >
>> >>> > Just an update here. Another user saw this and after playing with it I
>> >>> > identified a problem with CRUSH. There is a branch outstanding
>> >>> > (wip-crush) that is pending review, but it's not a quick fix because of
>> >>> > compatibility issues.
>> >>> >
>> >>> > sage
>> >>> >
>> >>> >
>> >>> > On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
>> >>> >
>> >>> >> Hi,
>> >>> >> Maybe this info can help to find what is wrong.
>> >>> >> For one PG (3.1e4a) which is active+remapped:
>> >>> >> { "state": "active+remapped",
>> >>> >>   "epoch": 96050,
>> >>> >>   "up": [
>> >>> >>         119,
>> >>> >>         69],
>> >>> >>   "acting": [
>> >>> >>         119,
>> >>> >>         69,
>> >>> >>         7],
>> >>> >> Logs:
>> >>> >> On osd.7:
>> >>> >> 2014-02-04 09:45:54.966913 7fa618afe700 1 osd.7 pg_epoch: 94460
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
>> >>> >> lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
>> >>> >> state: transitioning to Stray
>> >>> >> 2014-02-04 09:45:55.781278 7fa6172fb700 1 osd.7 pg_epoch: 94461
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
>> >>> >> [119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
>> >>> >> 94459'207003 remapped NOTIFY] state: transitioning to Stray
>> >>> >> 2014-02-04 09:49:01.124510 7fa618afe700 1 osd.7 pg_epoch: 94495
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
>> >>> >> n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
>> >>> >> r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
>> >>> >> state: transitioning to Stray
>> >>> >>
>> >>> >> On osd.119:
>> >>> >> 2014-02-04 09:45:54.981707 7f37f07c5700 1 osd.119 pg_epoch: 94460
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
>> >>> >> lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] state:
>> >>> >> transitioning to Primary
>> >>> >> 2014-02-04 09:45:55.805712 7f37ecfbe700 1 osd.119 pg_epoch: 94461
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
>> >>> >> [119,69]/[119,69,
ceph branch status
-- All Branches --

Alfredo Deza
  2013-09-27 10:33:52 -0400  wip-5900
  2014-01-21 16:29:22 -0500  wip-6465

Dan Mick
  2013-07-16 23:00:06 -0700  wip-5634
  2014-03-18 21:56:03 -0700  wip-7577

Danny Al-Gaaf
  2014-03-06 01:37:05 +0100  wip-da-coverity-20140306
  2014-03-15 00:21:44 +0100  wip-da-SCA-fixes-20140314
  2014-03-19 14:15:45 +0100  wip-6951-dumpling

David Zafman
  2014-03-21 15:13:41 -0700  wip-dz-watch-test
  2014-03-21 19:12:11 -0700  wip-7659

Gary Lowell
  2013-02-05 19:29:11 -0800  wip.cppchecker
  2013-03-01 18:55:35 -0800  wip-da-spec-1
  2013-07-23 17:00:07 -0700  wip-build-test
  2013-09-27 11:46:26 -0700  wip-6020
  2013-10-10 12:54:51 -0700  wip-5900-gl

Greg Farnum
  2013-02-13 14:46:38 -0800  wip-mds-snap-fix
  2013-02-22 19:57:53 -0800  wip-4248-snapid-journaling
  2013-09-19 18:15:00 -0700  wip-4354
  2013-10-09 13:31:38 -0700  cuttlefish-4832
  2013-11-15 14:41:51 -0800  wip-librados-command2
  2013-12-09 16:21:41 -0800  wip-hitset-snapshots
  2014-01-29 08:44:01 -0800  wip-filestore-fast-lookup
  2014-01-29 15:45:02 -0800  wip-striper-warning
  2014-02-19 11:32:08 -0800  wip-messenger-locking
  2014-02-28 11:12:51 -0800  wip-7542-2
  2014-03-13 16:02:17 -0700  wip-4354-mds-optracker
  2014-03-13 16:02:17 -0700  wip-4354-shared_ptr
  2014-03-14 17:40:30 -0700  wip-fast-dispatch

James Page
  2013-02-27 22:50:38 +  wip-debhelper-8

Joao Eduardo Luis
  2013-04-18 00:01:24 +0100  wip-4521-tool
  2013-04-22 15:14:28 +0100  wip-4748
  2013-04-24 16:42:11 +0100  wip-4521
  2013-04-30 18:45:22 +0100  wip-mon-compact-dbg
  2013-05-21 01:46:13 +0100  wip-monstoretool-foo
  2013-05-31 16:26:02 +0100  wip-mon-cache-first-last-committed
  2013-05-31 21:00:28 +0100  wip-mon-trim-b
  2013-07-20 04:30:59 +0100  wip-mon-caps-test
  2013-07-23 16:21:46 +0100  wip-5704-cuttlefish
  2013-07-23 17:35:59 +0100  wip-5704
  2013-08-02 22:54:42 +0100  wip-5648
  2013-08-12 11:21:29 -0700  wip-store-tool.cuttlefish
  2013-09-25 22:08:24 +0100  wip-6378
  2013-10-10 14:06:59 +0100  wip-mon-set-pspool
  2013-12-09 16:39:19 +  wip-mon-mdsmap-trim.dumpling
  2013-12-18 22:17:09 +  wip-monstoretool-genmdsmaps
  2014-01-17 17:11:59 -0800  wip-fix-pipe-comment-for-fhaas
  2014-02-02 14:10:39 +  wip-7277.for-loic
  2014-03-24 14:43:21 +  wip-status-function-names

John Spray
  2014-03-03 13:10:05 +  wip-mds-stop-rank-0

John Spray
  2014-02-20 14:41:23 +  wip-5382
  2014-03-01 17:05:11 +  wip-7572
  2014-03-06 13:01:25 +  wip-mds-debug
  2014-03-13 18:33:01 +  wip-3863
  2014-03-17 15:43:13 +  wip-journal-tool

John Wilkins
  2013-07-31 18:00:50 -0700  wip-doc-rados-python-api

Josh Durgin
  2013-03-01 14:45:23 -0800  wip-rbd-workunit-debug
  2013-07-25 18:44:10 -0700  wip-5488-2
  2013-08-14 15:51:04 -0700  wip-5970
  2013-08-27 12:03:08 -0700  wip-krbd-workunits
  2013-11-22 15:17:08 -0800  wip-zero-copy-bufferlist
  2013-11-25 13:59:29 -0800  wip-init-highlander
  2013-12-17 08:16:59 -0800  wip-rbd-deadlock-lockdep
  2013-12-18 12:28:39 -0800  wip-rbd-deadlock-lockdep-dumpling
  2013-12-26 18:06:39 -0800  emperor-5426
  2013-12-26 18:07:13 -0800  dumpling-5426
  2014-01-16 21:19:47 -0800  wip-objectcacher-flusher-dumpling
  2014-02-06 20:31:43 -0800  wip-librados-obj-ops
  2014-02-06 20:31:47 -0800  wip-librados-op-rvals
  2014-03-03 14:27:39 -0800  wip-object-cacher-memory

Ken Dreyer
  2014-02-19 22:54:44 +  last

Loic Dachary
  2014-03-19 00:28:17 +0100  wip-jerasure
  2014-03-24 15:32:10 +0100  wip-sse-fix

Matt Benjamin
  2013-10-08 16:49:23 -0400  wip-libcephfs-emp-rb

Noah Watkins
  2013-01-05 11:58:38 -0800  wip-localized-read-tests
  2013-10-18 15:42:50 -0700  cls-lua
  2013-11-05 07:30:19 -0800  port/old
  2013-11-06 08:39:57 -0800  wip-6636
  2013-11-26 08:26:24 -0800  wip-boost-uuid
  2013-12-30 09:47:40 -0800  port/visibility
  2014-01-13 12:16:22 -0800  fix-sbin-install
  2014-02-05 14:23:59 -0800  port/main

Roald van Loon
  2012-12-24 22:26:56 +  wip-dout

Sage Weil
  2012-11-30 13:47:27 -0800  wip-osd-readhole
  2012-12-07 14:38:46 -0800  wip-osd-alloc
  2013-01-29 13:46:02 -0800  wip-readdir
  2013-02-11 07:05:15 -0800  wip-sim-journal-clone
  2013-04-18 13:51:36 -0700  argonaut
  2013-06-02 21:21:09 -0700  wip-fuse-bobtail
  2013-06-18 17:00:00 -0700  wip-mon-refs
  2013-06-28 12:54:08 -0700  wip-mds-snap
Re: [librbd] Add interface of get the snapshot size?
On 03/24/2014 05:30 PM, Haomai Wang wrote:
> Hi all,
>
> As we know, a snapshot is a lightweight resource in librbd and we don't
> have any statistics about it. But this causes some problems for cloud
> management.
>
> We can't measure the size of a snapshot, and different snapshots occupy
> different amounts of space. So we have no way to estimate a user's
> resource usage.
>
> Maybe we can have a counter that records space usage when a volume is
> created. When a snapshot is created, the counter is frozen and stored as
> the size of the snapshot, and a new counter starting at zero is assigned
> to the volume.
>
> Any feedback is appreciated!
>

I believe that there is a rough estimate available via 'rados df'. Per-image
statistics would be awesome, though precise stats will either be rough as
well (# of rbd object clones per volume) or require a new counter mechanism.
Given discard handling in the filestore, it looks even more difficult to
calculate a correct estimate, as it is with the XFS preallocation feature.
Re: [librbd] Add interface of get the snapshot size?
On 03/24/2014 02:30 PM, Haomai Wang wrote:
> Hi all,
>
> As we know, a snapshot is a lightweight resource in librbd and we don't
> have any statistics about it. But this causes some problems for cloud
> management.
>
> We can't measure the size of a snapshot, and different snapshots occupy
> different amounts of space. So we have no way to estimate a user's
> resource usage.
>
> Maybe we can have a counter that records space usage when a volume is
> created.

What do you mean by space usage? Cluster-wide or pool usage?

> When a snapshot is created, the counter is frozen and stored as the size
> of the snapshot, and a new counter starting at zero is assigned to the
> volume.
>
> Any feedback is appreciated!

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
[librbd] Add interface of get the snapshot size?
Hi all,

As we know, a snapshot is a lightweight resource in librbd and we don't have
any statistics about it. But this causes some problems for cloud management.

We can't measure the size of a snapshot, and different snapshots occupy
different amounts of space. So we have no way to estimate a user's resource
usage.

Maybe we can have a counter that records space usage when a volume is
created. When a snapshot is created, the counter is frozen and stored as the
size of the snapshot, and a new counter starting at zero is assigned to the
volume.

Any feedback is appreciated!

--
Best Regards,

Wheat
[PATCH] ceph: don't include ceph.{file,dir}.layout vxattr in listxattr()
This avoids 'cp -a' modifying layout of new files/directories.

Signed-off-by: Yan, Zheng
---
 fs/ceph/xattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 28549d5..c9c2b88 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -231,7 +231,7 @@ static struct ceph_vxattr ceph_dir_vxattrs[] = {
 		.name_size = sizeof("ceph.dir.layout"),
 		.getxattr_cb = ceph_vxattrcb_layout,
 		.readonly = false,
-		.hidden = false,
+		.hidden = true,
 		.exists_cb = ceph_vxattrcb_layout_exists,
 	},
 	XATTR_LAYOUT_FIELD(dir, layout, stripe_unit),
@@ -258,7 +258,7 @@ static struct ceph_vxattr ceph_file_vxattrs[] = {
 		.name_size = sizeof("ceph.file.layout"),
 		.getxattr_cb = ceph_vxattrcb_layout,
 		.readonly = false,
-		.hidden = false,
+		.hidden = true,
 		.exists_cb = ceph_vxattrcb_layout_exists,
 	},
 	XATTR_LAYOUT_FIELD(file, layout, stripe_unit),
--
1.8.5.3
Limiting specific to specific directory, client separation
Hi,

We would like to migrate from gluster to ceph in an environment where
multiple servers can access one directory, while some directories should be
unavailable to some of them. Basically we need per-directory access lists
for multiple users.

I've found some overdue tickets on the tracker:
http://tracker.ceph.com/issues/1401 and http://tracker.ceph.com/issues/1237.
Is there any chance this will be included in the roadmap soon? If not, is
there anything we can do to speed it up? Hire a C/C++ developer internally
and delegate them to Ceph development? Maybe something else?

---
Szymon Szypulski