Re: [librbd] Add an interface to get the snapshot size?

2014-03-24 Thread Josh Durgin

On 03/24/2014 06:50 AM, Andrey Korolyov wrote:

On 03/24/2014 05:30 PM, Haomai Wang wrote:

Hi all,

As we know, a snapshot is a lightweight resource in librbd, and we
don't have any statistics about it. But this causes some
problems for cloud management.

We can't measure the size of a snapshot; different snapshots will
occupy different amounts of space, so we have no way to estimate a
user's resource usage.

Maybe we can keep a counter that records space usage from when the volume
is created. When a snapshot is taken, the counter is frozen and stored as
the size of the snapshot, and a new counter starting at zero is assigned
to the volume.

Any feedback is appreciated!



I believe 'rados df' gives a rough estimate. Per-image statistics would
be awesome, though precise stats would either require counting the RBD
object clones per volume or introduce a new counter mechanism. With
discard on the filestore it looks even harder to compute a correct
estimate, as it does with the XFS preallocation feature.


diff_iterate() will let you see what extents exist in the image at a
given snapshot or between snapshots. It's not perfect, but it'll be more
accurate than other methods.
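
For example, a minimal C sketch of that approach (an editorial
illustration only; error handling is minimal and the image/snapshot
names are whatever the caller supplies) sums the allocated extents
reported up to a given snapshot:

#include <stdint.h>
#include <rados/librados.h>
#include <rbd/librbd.h>

/* Called for each extent in the diff; add up the bytes that exist. */
static int count_cb(uint64_t offset, size_t len, int exists, void *arg)
{
    (void)offset;
    if (exists)
        *(uint64_t *)arg += len;
    return 0;
}

/* Approximate space referenced by image@snap_name: walk the diff from the
 * beginning of time (fromsnapname = NULL) up to that snapshot. */
static uint64_t snap_usage(rados_ioctx_t ioctx, const char *image_name,
                           const char *snap_name)
{
    rbd_image_t image;
    rbd_image_info_t info;
    uint64_t used = 0;

    if (rbd_open(ioctx, image_name, &image, snap_name) < 0)
        return 0;
    if (rbd_stat(image, &info, sizeof(info)) == 0)
        rbd_diff_iterate(image, NULL, 0, info.size, count_cb, &used);
    rbd_close(image);
    return used;
}

Note that this counts every extent visible at the snapshot, including
data shared with other snapshots, so it is an upper bound on per-snapshot
usage rather than an exact figure.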

A better long-term solution might be to leverage something like a
bitmap of object existence. This could be marked dirty (i.e.
unreliable) when opening an image, maintained in memory, and saved as
clean with each snapshot and when closing an image.
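
Purely as a sketch of that idea (this is not an existing librbd
structure; the names here are hypothetical), the in-memory state could
be as simple as:

#include <stdint.h>

/* One bit per RADOS object backing the image. Marked dirty when the image
 * is opened, kept current in memory, and persisted as clean when taking a
 * snapshot or closing the image. */
struct object_existence_map {
    uint64_t num_objects;
    uint8_t *bits;      /* bit i set => object i exists */
    int      dirty;     /* 1 while the persisted copy may be stale */
};

static void mark_exists(struct object_existence_map *m, uint64_t obj)
{
    m->bits[obj / 8] |= (uint8_t)(1u << (obj % 8));
}

/* Usage estimate: number of existing objects times the object size. */
static uint64_t estimate_bytes(const struct object_existence_map *m,
                               uint64_t object_size)
{
    uint64_t i, used = 0;

    for (i = 0; i < m->num_objects; i++)
        if (m->bits[i / 8] & (1u << (i % 8)))
            used += object_size;
    return used;
}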

With xfs preallocation hints, the usage will be pretty close to object
size * num objects.

Josh


Re: GCC -msse2 portability question

2014-03-24 Thread Loic Dachary


On 23/03/2014 23:34, Laurent GUERBY wrote:
> On Sun, 2014-03-23 at 20:50 +0100, Loic Dachary wrote:
>> Hi Laurent,
>>
>> In the context of optimizing erasure code functions implemented by
>> Kevin Greenan (cc'ed) and James Plank at
>> https://bitbucket.org/jimplank/gf-complete/ we ran across a question
>> you may have the answer to: can gcc -msse2 (or -msse* for that
>> matter) have a negative impact on the portability of the compiled
>> binary code?
>>
>> In other words, if code is compiled without -msse* and runs fine on
>> all the Intel processors it targets, could adding -msse* to the
>> compilation of the same source code generate a binary that fails
>> on some processors? This is assuming no SSE-specific functions are
>> used in the source code.
>>
>> In gf-complete, all SSE-specific instructions are carefully protected
>> so they are not run on a CPU that does not support them. The runtime
>> detection is done by checking CPUID feature bits (see
>> https://bitbucket.org/jimplank/gf-complete/pull-request/7/probe-intel-sse-features-at-runtime/diff#Lsrc/gf_intel.cT28
>> )
>>
>> The corresponding thread is at:
>>
>> https://bitbucket.org/jimplank/gf-complete/pull-request/4/defer-the-decision-to-use-a-given-sse/diff#comment-1479296
>>
>> Cheers
>>
> 
> Hi Loic,
> 
> The GCC documentation is here, with the list of architectures supporting
> sse/sse2:
> 
> http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/i386-and-x86-64-Options.html#i386-and-x86-64-Options
> 
> So unless you want to run your code on a very, very old 32-bit x86
> processor, "-msse" shouldn't be an issue. "-msse2" is similar.

This is good to know :) Should I be worried about unintended side effects of
-msse4.2, -mssse3, -msse4.1, or -mpclmul? These are the flags that
gf-complete uses, specifically.

Cheers
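
For reference, here is a minimal C sketch of this kind of runtime CPUID
probe for exactly those features (an illustration only, not the actual
gf_intel.c code):

#include <cpuid.h>
#include <stdio.h>

/* CPUID leaf 1 feature bits: ECX carries PCLMUL/SSSE3/SSE4.x, EDX carries
 * SSE2. The masks are spelled out here instead of relying on cpuid.h names. */
#define FEAT_ECX_PCLMUL  (1u << 1)
#define FEAT_ECX_SSSE3   (1u << 9)
#define FEAT_ECX_SSE4_1  (1u << 19)
#define FEAT_ECX_SSE4_2  (1u << 20)
#define FEAT_EDX_SSE2    (1u << 26)

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not available");
        return 1;
    }
    printf("sse2:   %d\n", !!(edx & FEAT_EDX_SSE2));
    printf("ssse3:  %d\n", !!(ecx & FEAT_ECX_SSSE3));
    printf("sse4.1: %d\n", !!(ecx & FEAT_ECX_SSE4_1));
    printf("sse4.2: %d\n", !!(ecx & FEAT_ECX_SSE4_2));
    printf("pclmul: %d\n", !!(ecx & FEAT_ECX_PCLMUL));
    return 0;
}

As long as the SSE code paths are only reached behind a check like this,
only the translation units containing the intrinsics need the -msse*
flags, and the rest of the binary stays portable.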

> 
> -mtune=xxx with xxx being a recent arch could be interesting for you
> because it keeps compatibility with the generic arch while tuning the
> resulting code for that specific arch (for example, a currently
> fashionable arch like corei7).
> 
> For a library you can choose the code you execute at load/run time
> for a specific function by using the STT_GNU_IFUNC feature:
> 
> http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2010/02/07
> http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html#index-g_t_0040code_007bifunc_007d-attribute-2529
> 
> I believe recent GLIBC uses this feature to tune
> some performance/arch-sensitive functions.
> 
> Sincerely,
> 
> Laurent
> 
> 
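
To illustrate the STT_GNU_IFUNC suggestion, a minimal GCC/ELF sketch
(the gf_mult names are hypothetical, not gf-complete's API): the
resolver runs once at dynamic-link time and binds the symbol to the best
implementation for the host CPU.

#include <cpuid.h>

/* Portable fallback, always safe to run. */
static int gf_mult_generic(int a, int b) { return a * b; /* placeholder */ }

/* SSE2 path; in real code its translation unit would be built with -msse2. */
static int gf_mult_sse2(int a, int b)    { return a * b; /* placeholder */ }

/* Resolver: picks an implementation by checking the CPUID SSE2 bit. */
static int (*resolve_gf_mult(void))(int, int)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 26)))
        return gf_mult_sse2;
    return gf_mult_generic;
}

/* Callers simply call gf_mult(); the dynamic linker fills in the target. */
int gf_mult(int a, int b) __attribute__((ifunc("resolve_gf_mult")));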

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Limiting specific clients to a specific directory, client separation

2014-03-24 Thread Gregory Farnum
This is not currently a priority in Inktank's roadmap for the MDS. :(
But we discussed client security in more detail than those tickets
during the Dumpling Ceph Developer Summit:
http://wiki.ceph.com/Planning/CDS/Dumpling (search for "1G: Client
Security for CephFS" -- there's a blueprint, an etherpad, and a video
of the discussion)

I think that pretty well covers the work involved in doing what you
describe, and we would love to support any developers working on this
or other features!
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Mar 24, 2014 at 3:37 AM, Szymon Szypulski wrote:
> Hi,
>
> We would like to migrate from Gluster to Ceph in an environment where
> multiple servers can access one directory while it should be unavailable
> to others. Basically we need per-directory access lists for multiple
> users.
>
> I've found some overdue tickets on the tracker:
> http://tracker.ceph.com/issues/1401 and
> http://tracker.ceph.com/issues/1237. Is there any chance this will be
> included in the roadmap soon? If not, is there anything we can do to
> speed it up? Hire a C/C++ developer internally and delegate them to
> Ceph development? Maybe something else?
>
> ---
> Szymon Szypulski


Re: [ceph-users] poor data distribution

2014-03-24 Thread Dominik Mostowiec
Hi,
> FWIW the tunable that fixes this was just merged today but won't
> appear in a release for another 3 weeks or so.
This is "vary_r tunable" ?

Can I use this in production?

--
Regards
Dominik


2014-02-12 3:24 GMT+01:00 Sage Weil :
> On Wed, 12 Feb 2014, Dominik Mostowiec wrote:
>> Hi,
>> Does this problem (with stuck active+remapped PGs after
>> reweight-by-utilization) affect all Ceph configurations or only
>> specific ones?
>> If specific: what is the reason in my case? Is this caused by the crush
>> configuration (cluster architecture, crush tunables, ...), cluster
>> size, architecture design mistakes, or something else?
>
> It seems to just be the particular structure of your map.  In your case
> you have a few different racks (or hosts? I forget) in the upper level of
> the hierarchy and then a handful of devices in the leaves that are marked
> out or reweighted down.  With that combination CRUSH runs out of placement
> choices at the upper level and keeps trying the same values in the lower
> level.  FWIW the tunable that fixes this was just merged today but won't
> appear in a release for another 3 weeks or so.
>
>> Second question.
>> Distribution of PGs across OSDs is better for large clusters (where
>> pg_num is higher). Is it possible (for small clusters) to change the
>> CRUSH distribution algorithm to something more linear? (I realize that
>> it would be less efficient.)
>
> It's really related to the ratio of pg_num to total OSDs, not the absolute
> number.  For small clusters it is probably more tolerable to have a larger
> pg_num count, though, because many of the costs normally associated with
> that (e.g., more peers) run up against the total host count before they
> start to matter.
>
> Again, I think the right answer here is picking a good pg to osd ratio and
> using reweight-by-utilization (which will be fixed soon).
>
> sage
>
>
>>
>> --
>> Regards
>> Dominik
>>
>> 2014-02-06 21:31 GMT+01:00 Dominik Mostowiec :
>> > Great!
>> > Thanks for Your help.
>> >
>> > --
>> > Regards
>> > Dominik
>> >
>> > 2014-02-06 21:10 GMT+01:00 Sage Weil :
>> >> On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
>> >>> Hi,
>> >>> Thanks !!
>> >>> Can You suggest any workaround for now?
>> >>
>> >> You can adjust the crush weights on the overfull nodes slightly.  You'd
>> >> need to do it by hand, but that will do the trick.  For example,
>> >>
>> >>   ceph osd crush reweight osd.123 .96
>> >>
>> >> (if the current weight is 1.0).
>> >>
>> >> sage
>> >>
>> >>>
>> >>> --
>> >>> Regards
>> >>> Dominik
>> >>>
>> >>>
>> >>> 2014-02-06 18:39 GMT+01:00 Sage Weil :
>> >>> > Hi,
>> >>> >
>> >>> > Just an update here.  Another user saw this and after playing with it I
>> >>> > identified a problem with CRUSH.  There is a branch outstanding
>> >>> > (wip-crush) that is pending review, but it's not a quick fix because of
>> >>> > compatibility issues.
>> >>> >
>> >>> > sage
>> >>> >
>> >>> >
>> >>> > On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
>> >>> >
>> >>> >> Hi,
>> >>> >> Maybe this info can help find what is wrong.
>> >>> >> For one PG (3.1e4a) which is active+remapped:
>> >>> >> { "state": "active+remapped",
>> >>> >>   "epoch": 96050,
>> >>> >>   "up": [
>> >>> >> 119,
>> >>> >> 69],
>> >>> >>   "acting": [
>> >>> >> 119,
>> >>> >> 69,
>> >>> >> 7],
>> >>> >> Logs:
>> >>> >> On osd.7:
>> >>> >> 2014-02-04 09:45:54.966913 7fa618afe700  1 osd.7 pg_epoch: 94460
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
>> >>> >> lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
>> >>> >> state: transitioning to Stray
>> >>> >> 2014-02-04 09:45:55.781278 7fa6172fb700  1 osd.7 pg_epoch: 94461
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
>> >>> >> [119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
>> >>> >> 94459'207003 remapped NOTIFY] state: transitioning to Stray
>> >>> >> 2014-02-04 09:49:01.124510 7fa618afe700  1 osd.7 pg_epoch: 94495
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
>> >>> >> n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
>> >>> >> r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
>> >>> >> state: transitioning to Stray
>> >>> >>
>> >>> >> On osd.119:
>> >>> >> 2014-02-04 09:45:54.981707 7f37f07c5700  1 osd.119 pg_epoch: 94460
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
>> >>> >> lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] state:
>> >>> >> transitioning to Primary
>> >>> >> 2014-02-04 09:45:55.805712 7f37ecfbe700  1 osd.119 pg_epoch: 94461
>> >>> >> pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
>> >>> >> n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
>> >>> >> [119,69]/[119,69,

ceph branch status

2014-03-24 Thread ceph branch robot
-- All Branches --

Alfredo Deza 
2013-09-27 10:33:52 -0400   wip-5900
2014-01-21 16:29:22 -0500   wip-6465

Dan Mick 
2013-07-16 23:00:06 -0700   wip-5634
2014-03-18 21:56:03 -0700   wip-7577

Danny Al-Gaaf 
2014-03-06 01:37:05 +0100   wip-da-coverity-20140306
2014-03-15 00:21:44 +0100   wip-da-SCA-fixes-20140314
2014-03-19 14:15:45 +0100   wip-6951-dumpling

David Zafman 
2014-03-21 15:13:41 -0700   wip-dz-watch-test
2014-03-21 19:12:11 -0700   wip-7659

Gary Lowell 
2013-02-05 19:29:11 -0800   wip.cppchecker
2013-03-01 18:55:35 -0800   wip-da-spec-1
2013-07-23 17:00:07 -0700   wip-build-test
2013-09-27 11:46:26 -0700   wip-6020
2013-10-10 12:54:51 -0700   wip-5900-gl

Greg Farnum 
2013-02-13 14:46:38 -0800   wip-mds-snap-fix
2013-02-22 19:57:53 -0800   wip-4248-snapid-journaling
2013-09-19 18:15:00 -0700   wip-4354
2013-10-09 13:31:38 -0700   cuttlefish-4832
2013-11-15 14:41:51 -0800   wip-librados-command2
2013-12-09 16:21:41 -0800   wip-hitset-snapshots
2014-01-29 08:44:01 -0800   wip-filestore-fast-lookup
2014-01-29 15:45:02 -0800   wip-striper-warning
2014-02-19 11:32:08 -0800   wip-messenger-locking
2014-02-28 11:12:51 -0800   wip-7542-2
2014-03-13 16:02:17 -0700   wip-4354-mds-optracker
2014-03-13 16:02:17 -0700   wip-4354-shared_ptr
2014-03-14 17:40:30 -0700   wip-fast-dispatch

James Page 
2013-02-27 22:50:38 +   wip-debhelper-8

Joao Eduardo Luis 
2013-04-18 00:01:24 +0100   wip-4521-tool
2013-04-22 15:14:28 +0100   wip-4748
2013-04-24 16:42:11 +0100   wip-4521
2013-04-30 18:45:22 +0100   wip-mon-compact-dbg
2013-05-21 01:46:13 +0100   wip-monstoretool-foo
2013-05-31 16:26:02 +0100   wip-mon-cache-first-last-committed
2013-05-31 21:00:28 +0100   wip-mon-trim-b
2013-07-20 04:30:59 +0100   wip-mon-caps-test
2013-07-23 16:21:46 +0100   wip-5704-cuttlefish
2013-07-23 17:35:59 +0100   wip-5704
2013-08-02 22:54:42 +0100   wip-5648
2013-08-12 11:21:29 -0700   wip-store-tool.cuttlefish
2013-09-25 22:08:24 +0100   wip-6378
2013-10-10 14:06:59 +0100   wip-mon-set-pspool
2013-12-09 16:39:19 +   wip-mon-mdsmap-trim.dumpling
2013-12-18 22:17:09 +   wip-monstoretool-genmdsmaps
2014-01-17 17:11:59 -0800   wip-fix-pipe-comment-for-fhaas
2014-02-02 14:10:39 +   wip-7277.for-loic
2014-03-24 14:43:21 +   wip-status-function-names

John Spray 
2014-03-03 13:10:05 +   wip-mds-stop-rank-0

John Spray 
2014-02-20 14:41:23 +   wip-5382
2014-03-01 17:05:11 +   wip-7572
2014-03-06 13:01:25 +   wip-mds-debug
2014-03-13 18:33:01 +   wip-3863
2014-03-17 15:43:13 +   wip-journal-tool

John Wilkins 
2013-07-31 18:00:50 -0700   wip-doc-rados-python-api

Josh Durgin 
2013-03-01 14:45:23 -0800   wip-rbd-workunit-debug
2013-07-25 18:44:10 -0700   wip-5488-2
2013-08-14 15:51:04 -0700   wip-5970
2013-08-27 12:03:08 -0700   wip-krbd-workunits
2013-11-22 15:17:08 -0800   wip-zero-copy-bufferlist
2013-11-25 13:59:29 -0800   wip-init-highlander
2013-12-17 08:16:59 -0800   wip-rbd-deadlock-lockdep
2013-12-18 12:28:39 -0800   wip-rbd-deadlock-lockdep-dumpling
2013-12-26 18:06:39 -0800   emperor-5426
2013-12-26 18:07:13 -0800   dumpling-5426
2014-01-16 21:19:47 -0800   wip-objectcacher-flusher-dumpling
2014-02-06 20:31:43 -0800   wip-librados-obj-ops
2014-02-06 20:31:47 -0800   wip-librados-op-rvals
2014-03-03 14:27:39 -0800   wip-object-cacher-memory

Ken Dreyer 
2014-02-19 22:54:44 +   last

Loic Dachary 
2014-03-19 00:28:17 +0100   wip-jerasure
2014-03-24 15:32:10 +0100   wip-sse-fix

Matt Benjamin 
2013-10-08 16:49:23 -0400   wip-libcephfs-emp-rb

Noah Watkins 
2013-01-05 11:58:38 -0800   wip-localized-read-tests
2013-10-18 15:42:50 -0700   cls-lua
2013-11-05 07:30:19 -0800   port/old
2013-11-06 08:39:57 -0800   wip-6636
2013-11-26 08:26:24 -0800   wip-boost-uuid
2013-12-30 09:47:40 -0800   port/visibility
2014-01-13 12:16:22 -0800   fix-sbin-install
2014-02-05 14:23:59 -0800   port/main

Roald van Loon 
2012-12-24 22:26:56 +   wip-dout

Sage Weil 
2012-11-30 13:47:27 -0800   wip-osd-readhole
2012-12-07 14:38:46 -0800   wip-osd-alloc
2013-01-29 13:46:02 -0800   wip-readdir
2013-02-11 07:05:15 -0800   wip-sim-journal-clone
2013-04-18 13:51:36 -0700   argonaut
2013-06-02 21:21:09 -0700   wip-fuse-bobtail
2013-06-18 17:00:00 -0700   wip-mon-refs
2013-06-28 12:54:08 -0700   wip-mds-snap
 

Re: [librbd] Add an interface to get the snapshot size?

2014-03-24 Thread Andrey Korolyov
On 03/24/2014 05:30 PM, Haomai Wang wrote:
> Hi all,
> 
> As we know, a snapshot is a lightweight resource in librbd, and we
> don't have any statistics about it. But this causes some
> problems for cloud management.
> 
> We can't measure the size of a snapshot; different snapshots will
> occupy different amounts of space, so we have no way to estimate a
> user's resource usage.
> 
> Maybe we can keep a counter that records space usage from when the volume
> is created. When a snapshot is taken, the counter is frozen and stored as
> the size of the snapshot, and a new counter starting at zero is assigned
> to the volume.
> 
> Any feedback is appreciated!
> 

I believe 'rados df' gives a rough estimate. Per-image statistics would
be awesome, though precise stats would either require counting the RBD
object clones per volume or introduce a new counter mechanism. With
discard on the filestore it looks even harder to compute a correct
estimate, as it does with the XFS preallocation feature.


Re: [librbd] Add an interface to get the snapshot size?

2014-03-24 Thread Wido den Hollander

On 03/24/2014 02:30 PM, Haomai Wang wrote:

Hi all,

As we know, a snapshot is a lightweight resource in librbd, and we
don't have any statistics about it. But this causes some
problems for cloud management.

We can't measure the size of a snapshot; different snapshots will
occupy different amounts of space, so we have no way to estimate a
user's resource usage.

Maybe we can keep a counter that records space usage from when the volume
is created.


What do you mean by space usage? Cluster-wide or per-pool usage?


When a snapshot is taken, the counter is frozen and stored as the size
of the snapshot, and a new counter starting at zero is assigned to the
volume.

Any feedback is appreciated!




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


[librbd] Add an interface to get the snapshot size?

2014-03-24 Thread Haomai Wang
Hi all,

As we know, a snapshot is a lightweight resource in librbd, and we
don't have any statistics about it. But this causes some
problems for cloud management.

We can't measure the size of a snapshot; different snapshots will
occupy different amounts of space, so we have no way to estimate a
user's resource usage.

Maybe we can keep a counter that records space usage from when the volume
is created. When a snapshot is taken, the counter is frozen and stored as
the size of the snapshot, and a new counter starting at zero is assigned
to the volume.

Any feedback is appreciated!
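
To make the idea concrete, a minimal sketch of the proposed accounting
(the names and layout here are hypothetical, not librbd code):

#include <stdint.h>
#include <stdio.h>

/* Per-image running counter of newly allocated bytes. */
struct image_usage {
    uint64_t live_bytes;        /* bytes written since the last snapshot */
};

/* Frozen value recorded at snapshot creation time. */
struct snap_usage_record {
    char     snap_name[64];
    uint64_t frozen_bytes;      /* the snapshot's "size" */
};

static struct snap_usage_record take_snapshot(struct image_usage *img,
                                              const char *name)
{
    struct snap_usage_record rec;

    snprintf(rec.snap_name, sizeof(rec.snap_name), "%s", name);
    rec.frozen_bytes = img->live_bytes;   /* freeze the counter ...   */
    img->live_bytes = 0;                  /* ... and start a new one. */
    return rec;
}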

-- 

Best Regards,

Wheat


[PATCH] ceph: don't include ceph.{file,dir}.layout vxattr in listxattr()

2014-03-24 Thread Yan, Zheng
This avoids 'cp -a' modifying the layout of new files/directories.

Signed-off-by: Yan, Zheng 
---
 fs/ceph/xattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 28549d5..c9c2b88 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -231,7 +231,7 @@ static struct ceph_vxattr ceph_dir_vxattrs[] = {
.name_size = sizeof("ceph.dir.layout"),
.getxattr_cb = ceph_vxattrcb_layout,
.readonly = false,
-   .hidden = false,
+   .hidden = true,
.exists_cb = ceph_vxattrcb_layout_exists,
},
XATTR_LAYOUT_FIELD(dir, layout, stripe_unit),
@@ -258,7 +258,7 @@ static struct ceph_vxattr ceph_file_vxattrs[] = {
.name_size = sizeof("ceph.file.layout"),
.getxattr_cb = ceph_vxattrcb_layout,
.readonly = false,
-   .hidden = false,
+   .hidden = true,
.exists_cb = ceph_vxattrcb_layout_exists,
},
XATTR_LAYOUT_FIELD(file, layout, stripe_unit),
-- 
1.8.5.3



Limiting specific clients to a specific directory, client separation

2014-03-24 Thread Szymon Szypulski
Hi,

We would like to migrate from Gluster to Ceph in an environment where
multiple servers can access one directory while it should be unavailable
to others. Basically we need per-directory access lists for multiple
users.

I've found some overdue tickets on the tracker:
http://tracker.ceph.com/issues/1401 and
http://tracker.ceph.com/issues/1237. Is there any chance this will be
included in the roadmap soon? If not, is there anything we can do to
speed it up? Hire a C/C++ developer internally and delegate them to
Ceph development? Maybe something else?

---
Szymon Szypulski