Re: [PATCH 7/8] rbd: do not read in parent info before snap context

2014-07-25 Thread Alex Elder
On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
 Currently rbd_dev_v2_header_info() reads in parent info before the snap
 context is read in.  This is wrong, because we may need to look at the
 parent_overlap value of the snapshot instead of that of the base

I had some trouble understanding this.

The parent image, snapshot id and overlap, are together a property
of a particular image (and associated with its header object).
Nothing about that is dependent on the child image snapshots
or snapshot id.

However...

 image, for example when mapping a snapshot - see next commit.  (When
 mapping a snapshot, all we got is its name and we need the snap context
 to translate that name into an id to know which parent info to look
 for).

I finally figured out what path through the code you
were talking about.  Here's what I see.

On the initial probe (previously), we have:
  rbd_add()
do_rbd_add()
  rbd_dev_image_probe()
   |rbd_dev_header_info()
   | |rbd_dev_v2_header_info()
   | | |rbd_dev_v2_parent_info()
   | | | -- expects rbd_dev->spec->snap_id to be valid
   | | |rbd_dev_v2_snap_context()
   | | | -- this fills in the snapshot context
   |rbd_spec_fill_snap_id()
   | | -- fills rbd_dev->spec->snap_id
   |rbd_dev_probe_parent()

So clearly, at least when mapping a snapshot, we would not
get the desired result.  We'll be unable to look up the id
for the named snapshot, so would get ENOENT.

Now you've pulled getting the parent info back out
into rbd_dev_image_probe():
  rbd_add()
do_rbd_add()
  rbd_dev_image_probe()
   |rbd_dev_header_info()
   | |rbd_dev_v2_header_info()
   | | |rbd_dev_v2_snap_context()
   | | | -- this fills in the snapshot context
   |rbd_spec_fill_snap_id()
   | | -- fills rbd_dev-spec-snap_id
   |rbd_dev_v2_parent_info()
   | | -- rbd_dev->spec->snap_id will be valid
   |rbd_dev_probe_parent()

In the refresh path it's similar.  You move the
rbd_dev_v2_parent_info() call into rbd_dev_refresh()
instead of it happening in rbd_dev_v2_header_info().
Missing the ordering problem here might have caused
even more subtle problems (due to using an apparently
valid but possibly out-of-date snapshot context).

Given this understanding I'd say your change looks
good.

 The approach taken here is to make sure rbd_dev_v2_parent_info() is
 called after the snap context has been read in.  The other approach
 would be to add a parent_overlap field to struct rbd_mapping and
 maintain it the same way rbd_mapping::size is maintained.  The reason
 I chose the first approach is that the value of keeping around both
 base image values and the actual mapping values is unclear to me.

I'll think about this and respond to your followup e-mail.

Reviewed-by: Alex Elder el...@linaro.org

 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
 ---
  drivers/block/rbd.c |   64 
 ---
  1 file changed, 30 insertions(+), 34 deletions(-)
 
 diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
 index 16f388f2960b..c4606987e9d1 100644
 --- a/drivers/block/rbd.c
 +++ b/drivers/block/rbd.c
 @@ -515,6 +515,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
  static int rbd_dev_refresh(struct rbd_device *rbd_dev);
  static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
  static int rbd_dev_header_info(struct rbd_device *rbd_dev);
 +static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev);
  static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
   u64 snap_id);
  static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
 @@ -3513,6 +3514,18 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
   mapping_size = rbd_dev->mapping.size;
  
   ret = rbd_dev_header_info(rbd_dev);

This looked odd.  But I guess I didn't notice you
didn't check the return value when you first added
this line a few patches ago...

 + if (ret)
 + return ret;
 +
 + /*
 +  * If there is a parent, see if it has disappeared due to the
 +  * mapped image getting flattened.
 +  */
 + if (rbd_dev->parent) {
 + ret = rbd_dev_v2_parent_info(rbd_dev);
 + if (ret)
 + return ret;
 + }
  
   if (rbd_dev->spec->snap_id == CEPH_NOSNAP) {
   if (rbd_dev->mapping.size != rbd_dev->header.image_size)
 @@ -3524,11 +3537,10 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
  
   up_write(&rbd_dev->header_rwsem);
  
 - if (mapping_size != rbd_dev->mapping.size) {
 + if (mapping_size != rbd_dev->mapping.size)
   rbd_dev_update_size(rbd_dev);
 - }
  
 - return ret;
 + return 0;
  }
  
  static int rbd_init_disk(struct rbd_device *rbd_dev)
 @@ -4477,33 +4489,6 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
   return ret;
   }
  
 - /*
 -  * If the 

Re: [PATCH 7/8] rbd: do not read in parent info before snap context

2014-07-25 Thread Ilya Dryomov
On Fri, Jul 25, 2014 at 12:14 PM, Alex Elder el...@ieee.org wrote:
 On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
 Currently rbd_dev_v2_header_info() reads in parent info before the snap
 context is read in.  This is wrong, because we may need to look at the
 parent_overlap value of the snapshot instead of that of the base

 I had some trouble understanding this.

 The parent image, snapshot id and overlap, are together a property
 of a particular image (and associated with its header object).
 Nothing about that is dependent on the child image snapshots
 or snapshot id.

 However...

 image, for example when mapping a snapshot - see next commit.  (When
 mapping a snapshot, all we got is its name and we need the snap context
 to translate that name into an id to know which parent info to look
 for).

 I finally figured out what path through the code you
 were talking about.  Here's what I see.

 On the initial probe (previously), we have:
   rbd_add()
 do_rbd_add()
   rbd_dev_image_probe()
|rbd_dev_header_info()
| |rbd_dev_v2_header_info()
| | |rbd_dev_v2_parent_info()
| | | -- expects rbd_dev->spec->snap_id to be valid
| | |rbd_dev_v2_snap_context()
| | | -- this fills in the snapshot context
|rbd_spec_fill_snap_id()
| | -- fills rbd_dev->spec->snap_id
|rbd_dev_probe_parent()

 So clearly, at least when mapping a snapshot, we would not
 get the desired result.  We'll be unable to look up the id
 for the named snapshot, so would get ENOENT.

 Now you've pulled getting the parent info back out
 into rbd_dev_image_probe():
   rbd_add()
 do_rbd_add()
   rbd_dev_image_probe()
|rbd_dev_header_info()
| |rbd_dev_v2_header_info()
| | |rbd_dev_v2_snap_context()
| | | -- this fills in the snapshot context
|rbd_spec_fill_snap_id()
| | -- fills rbd_dev-spec-snap_id
|rbd_dev_v2_parent_info()
| | -- rbd_dev->spec->snap_id will be valid
|rbd_dev_probe_parent()

 In the refresh path it's similar.  You move the
 rbd_dev_v2_parent_info() call into rbd_dev_refresh()
 instead of it happening in rbd_dev_v2_header_info().
 Missing the ordering problem here might have caused
 even more subtle problems (due to using an apparently
 valid but possibly out-of-date snapshot context).

 Given this understanding I'd say your change looks
 good.

Yeah, basically any time we read in snapshot metadata (both when
mapping a snapshot or following a chain of parent images) we need to
look at the parent_overlap of the snapshot (i.e. the parent_overlap of
the base image at the time the snapshot was taken).  That requires
having snap_id at hand when the call to rbd_dev_v2_parent_info() is
made.
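
To make the required ordering concrete, here is a rough C sketch of the
sequence (illustrative only - not the actual rbd.c code, just the call
names used elsewhere in this thread):

static int probe_order_sketch(struct rbd_device *rbd_dev)
{
        int ret;

        ret = rbd_dev_header_info(rbd_dev);     /* reads in the snap context */
        if (ret)
                return ret;

        ret = rbd_spec_fill_snap_id(rbd_dev);   /* snap name -> snap_id */
        if (ret)
                return ret;

        /* rbd_dev->spec->snap_id is valid here, so the parent_overlap
         * looked up is the snapshot's own, not the base image's */
        return rbd_dev_v2_parent_info(rbd_dev);
}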


 The approach taken here is to make sure rbd_dev_v2_parent_info() is
 called after the snap context has been read in.  The other approach
 would be to add a parent_overlap field to struct rbd_mapping and
 maintain it the same way rbd_mapping::size is maintained.  The reason
 I chose the first approach is that the value of keeping around both
 base image values and the actual mapping values is unclear to me.

 I'll think about this and respond to your followup e-mail.

 Reviewed-by: Alex Elder el...@linaro.org

 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
 ---
  drivers/block/rbd.c |   64 
 ---
  1 file changed, 30 insertions(+), 34 deletions(-)

 diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
 index 16f388f2960b..c4606987e9d1 100644
 --- a/drivers/block/rbd.c
 +++ b/drivers/block/rbd.c
 @@ -515,6 +515,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
  static int rbd_dev_refresh(struct rbd_device *rbd_dev);
  static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
  static int rbd_dev_header_info(struct rbd_device *rbd_dev);
 +static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev);
  static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
   u64 snap_id);
  static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
 @@ -3513,6 +3514,18 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
   mapping_size = rbd_dev->mapping.size;

   ret = rbd_dev_header_info(rbd_dev);

 This looked odd.  But I guess I didn't notice you
 didn't check the return value when you first added
 this line a few patches ago...

This is what rbd_dev_refresh() did before any of my changes.  It would
go ahead with trying to update mapping size and validating snapshot
existence even if rbd_dev_v{1,2}_header_info() had failed.


 + if (ret)
 + return ret;

I'll move this bit to harden rbd_dev_refresh() commit.

Thanks,

Ilya


[PATCH] ceph: remove redundant memset(0)

2014-07-25 Thread Ilya Dryomov
xattrs array of pointers is allocated with kcalloc() - no need to
memset() it to 0 right after that.
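
For context, a userspace-flavoured illustration of the point (a sketch
only, not the kernel code): kcalloc(), like calloc() below, already
returns zero-filled memory, so a memset() immediately after it changes
nothing.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        size_t n = 8;
        int **ptrs = calloc(n, sizeof(*ptrs));  /* zero-filled allocation */

        if (!ptrs)
                return 1;

        /* a memset(ptrs, 0, n * sizeof(*ptrs)) here would be redundant */
        for (size_t i = 0; i < n; i++)
                printf("%p\n", (void *)ptrs[i]); /* prints nil/0x0 */

        free(ptrs);
        return 0;
}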

Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
 fs/ceph/xattr.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index c9c2b887381e..f89698cdbc41 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -597,7 +597,7 @@ start:
err = -ENOMEM;
if (!xattrs)
goto bad_lock;
-   memset(xattrs, 0, numattr*sizeof(struct ceph_xattr *));
+
for (i = 0; i < numattr; i++) {
xattrs[i] = kmalloc(sizeof(struct ceph_inode_xattr),
GFP_NOFS);
-- 
1.7.10.4



[PATCH] ceph: fix sizeof(struct tYpO *) typo

2014-07-25 Thread Ilya Dryomov
struct ceph_xattr -> struct ceph_inode_xattr

Reported-by: Toralf Förster toralf.foers...@gmx.de
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
 fs/ceph/xattr.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index f89698cdbc41..12f58d22e017 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -592,7 +592,7 @@ start:
xattr_version = ci->i_xattrs.version;
spin_unlock(&ci->i_ceph_lock);
 
-   xattrs = kcalloc(numattr, sizeof(struct ceph_xattr *),
+   xattrs = kcalloc(numattr, sizeof(struct ceph_inode_xattr *),
 GFP_NOFS);
err = -ENOMEM;
if (!xattrs)
-- 
1.7.10.4



Re: sizeof (struct tYpO *) : it is just a typo or rather a bug ?

2014-07-25 Thread Henrique de Moraes Holschuh
On Thu, 24 Jul 2014, Toralf Förster wrote:
 Inspired by this typo fix 
 http://article.gmane.org/gmane.linux.kernel/1754640
 I grep'ed the current git tree of linus for similar issues.

I wonder if we couldn't use Coccinelle to do that?  I would say it would not be
as cool as deep grep magick, but Coccinelle is cool by definition and
therefore immune from any such comparisons :-)

 Nevertheless I was told that gcc doesn't complain about such things due
 to eventually evaluating it to sizeof(null). I'm curious, however, whether at
 least a warning should be emitted in such a case.

Well, it cannot become a real bug because the moment the code changes to
actually access/dereference such a typo, it will cause the compiler to
abort with an error.  If gcc would have to waste a measurable amount of time
to issue such a warning, it is not worth it.

OTOH, such typos could confuse someone reading the code into thinking
they're dealing with a different structure or something, and it _is_
incorrect code no matter how harmless, so it makes sense to fix all such
typos eventually.
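
A minimal illustration of why the typo slips through (sketch, standard C):
sizeof only needs the size of the pointer type, so the struct tag never
has to be defined, while any dereference would force a compile error.

#include <stdio.h>

int main(void)
{
        /* struct tYpO is never declared anywhere in the program */
        printf("%zu\n", sizeof(struct tYpO *)); /* just a pointer's size */
        return 0;
}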

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Re: Forcing Ceph into mapping all objects to a single PG

2014-07-25 Thread Daniel Hofmann
The main issue however is not the hash's strength, but the fact that
once pre-computed, I'm able to use preimages on **every Ceph cluster out
there**. (As the hash function's output is a deterministic function of
the object's name only)

I agree in that the general issue is inherent in hash-placement systems.

But what I don't agree with is the following:

Why do I have to be able to calculate my object's placement for **every
Ceph cluster** out there?

Why does it not suffice for me to be able to calculate the placement
only for the cluster I'm currently accessing?

From a logical standpoint it seems reasonable. Why then, are we not able
to constrain the placement calculation in that regard?


If the placement is bound to a specific cluster it should suffice to
derive e.g. a key for SipHash based on cluster specifics.

Is this doable from an implementation point of view?


Note: I only did this as a proof-of-concept for the object store.
Think about the implications, if you're able to do this e.g. for every
RadosGW out there and services using RadosGW.


Re: [PATCH 7/8] rbd: do not read in parent info before snap context

2014-07-25 Thread Alex Elder
On 07/25/2014 03:36 AM, Ilya Dryomov wrote:
 
  + if (ret)
  + return ret;
 I'll move this bit to harden rbd_dev_refresh() commit.

Sounds good.  -Alex


Re: [PATCH] ceph: fix sizeof(struct tYpO *) typo

2014-07-25 Thread Alex Elder
On 07/25/2014 04:26 AM, Ilya Dryomov wrote:
 struct ceph_xattr -> struct ceph_inode_xattr

Looks good.  I can't find the definition of ceph_xattr.

Reviewed-by: Alex Elder el...@linaro.org

 Reported-by: Toralf Förster toralf.foers...@gmx.de
 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
 ---
  fs/ceph/xattr.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
 index f89698cdbc41..12f58d22e017 100644
 --- a/fs/ceph/xattr.c
 +++ b/fs/ceph/xattr.c
 @@ -592,7 +592,7 @@ start:
   xattr_version = ci->i_xattrs.version;
   spin_unlock(&ci->i_ceph_lock);
  
 - xattrs = kcalloc(numattr, sizeof(struct ceph_xattr *),
 + xattrs = kcalloc(numattr, sizeof(struct ceph_inode_xattr *),
GFP_NOFS);
   err = -ENOMEM;
   if (!xattrs)
 



Re: [PATCH] ceph: fix sizeof(struct tYpO *) typo

2014-07-25 Thread Ilya Dryomov
On Fri, Jul 25, 2014 at 4:52 PM, Alex Elder el...@ieee.org wrote:
 On 07/25/2014 04:26 AM, Ilya Dryomov wrote:
 struct ceph_xattr -> struct ceph_inode_xattr

 Looks good.  I can't find the definition of ceph_xattr.

That's the point of the patch ;)

Thanks,

Ilya


Re: [PATCH 6/8] rbd: update mapping size only on refresh

2014-07-25 Thread Alex Elder
On 07/24/2014 10:10 AM, Ilya Dryomov wrote:
 On Thu, Jul 24, 2014 at 5:46 PM, Ilya Dryomov ilya.dryo...@inktank.com 
 wrote:
 On Thu, Jul 24, 2014 at 5:25 PM, Alex Elder el...@ieee.org wrote:
 On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
 There is no sense in trying to update the mapping size before it's even
 been set.

 It took me a bit to follow this.  But basically there is
 no mapping unless it's mapped.  So previously this was
 updating the mapping information even for unmapped
 parent (or other) images.  There's no need to update
 the mapping size for a snapshot--it'll never change.

 Is that right?  If not, please advise; otherwise:

 No.  Previously it was updating the mapping size both on the initial map
 and on refresh (of the base image).  Doing that on the initial map
 doesn't make sense: not only are struct rbd_mapping fields not properly
 initialized at that point - rbd_dev_mapping_set() is called much later
 in the map sequence - snap_id isn't initialized either (at least in the
 format 2 case, I haven't looked too closely at the format 1 case).  And
 just in general, trying to update something before it had a chance to
 change is bogus.
 
 BTW, while we are on the subject of struct rbd_mapping, is there any
 particular reason to keep around both the base image values and actual
 mapping values?  I am tempted to change the mapping sequence so that 1)
 snap context is read in immediately after watch setup, before anything
 else is done, 2) supplied snap name is resolved into an id and 3) the
 actual (i.e. based on snap_id) mapping size, features, parent_overlap,
 etc. are read directly into struct rbd_image_header.  That would
 allow us to rip out struct rbd_mapping entirely and make the code clearer.

The rbd_mapping structure started out well-intentioned but
over time it seems clear it hasn't added the value it was
intended to add.  Here's where it got created:
f84344f rbd: separate mapping info in rbd_dev
The only original field that survives is read_only.  There's
no harm at all in just moving the fields in that structure
out into the rbd_device structure.
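
A hedged sketch of what that move would look like (assuming the struct
currently carries roughly size, features and read_only - illustrative
only, not a patch):

#include <linux/types.h>

/* today: a separate container for the mapped view of the image */
struct rbd_mapping {
        u64     size;
        u64     features;
        bool    read_only;
};

/* suggestion: fold those fields straight into rbd_device */
struct rbd_device {
        /* ... existing fields ... */
        u64     mapping_size;
        u64     mapping_features;
        bool    mapping_read_only;
};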

Preserving the base image size in the header structure is
an artifact of the version 1 code, where it held the last
version of the on-disk header data.  The version 2 code
maintains it for consistency.

You could eliminate that I suppose.  I think the result
would require rbd_header_from_disk() to know about the
mapping rather than doing a simple conversion of data
from one form to another.

I say try it, and if you like the result I probably will too...

-Alex


 Thanks,
 
 Ilya
 



Fwd: S3 API Compatibility support

2014-07-25 Thread M Ranga Swami Reddy
Hi Team: As per the Ceph documentation, a few S3 API features are listed as not supported.

Link: http://ceph.com/docs/master/radosgw/s3/

Is there a plan to support the unsupported items in the above table?
Or is anyone working on this?

Thanks
Swami


Re: Forcing Ceph into mapping all objects to a single PG

2014-07-25 Thread Sage Weil
On Fri, 25 Jul 2014, Daniel Hofmann wrote:
 The main issue however is not the hash's strength, but the fact that
 once pre-computed, I'm able to use preimages on **every Ceph cluster out
 there**. (As the hash function's output is a deterministic function of
 the object's name only)
 
 I agree in that the general issue is inherent in hash-placement systems.
 
 But what I don't agree with is the following:
 
 Why do I have to be able to calculate my object's placement for **every
 Ceph cluster** out there?
 
 Why does it not suffice for me to be able to calculate the placement
 only for the cluster I'm currently accessing?
 
 From a logical standpoint it seems reasonable. Why then, are we not able
 to constrain the placement calculation in that regard?
 
 
 If the placement is bound to a specific cluster it should suffice to
 derive e.g. a key for SipHash based on cluster specifics.
 
 Is this doable from an implementation point of view?
 
 
 Note: I only did this as a proof-of-concept for the object store.
 Think about the implications, if you're able to do this e.g. for every
 RadosGW out there and services using RadosGW.

It would be really easy to add a random salt to the pg_pool_t and feed
that into the object -> pg hash mapping.  Note, by the way, that for new
clusters the pool id is already fed in here, so there is a *tiny* bit of
variation depending on what order the pools were created in (although
probably not enough to meaningfully improve security).

We could also add a new hash type that is more secure.  Rjenkins is used 
by default but the choice of hash is already parameterized.
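
A hedged sketch of the idea - not Ceph's actual placement code, with a
toy seeded hash standing in for rjenkins - just to show where a per-pool
salt would enter the object -> pg mapping:

#include <stdint.h>
#include <string.h>

/* toy seeded FNV-1a, a stand-in for the real parameterized hash */
static uint32_t seeded_hash(const void *data, size_t len, uint32_t seed)
{
        const unsigned char *p = data;
        uint32_t h = 2166136261u ^ seed;

        while (len--) {
                h ^= *p++;
                h *= 16777619u;
        }
        return h;
}

/* salt the object -> pg mapping with per-pool state so a precomputed
 * preimage does not transfer to other clusters or pools */
static uint32_t object_to_pg(const char *oid, uint64_t pool_id,
                             uint32_t pool_salt, uint32_t pg_num_mask)
{
        uint32_t h = seeded_hash(oid, strlen(oid), pool_salt);

        h = seeded_hash(&pool_id, sizeof(pool_id), h);
        return h & pg_num_mask;
}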

sage


Re: First attempt at rocksdb monitor store stress testing

2014-07-25 Thread Mark Nelson

On 07/24/2014 08:28 PM, Shu, Xinxin wrote:

Hi mark,

I am looking forward to your results on SSDs .


Me too!


rocksdb generates a crc of the data to be written; this cannot be switched off
(but it can be substituted with xxhash).  There are two options,
Options::verify_checksums_in_compaction and ReadOptions::verify_checksums.  If we
disable these two options, I think CPU usage will go down.  If we use
universal compaction, it is not friendly to read operations.


I'm wondering if it might not be so bad for us given the kind of work 
the mon does.  We write out a lot of maps and incrementals, but I don't 
think the mon goes back and updates objects very often.  Assuming I 
understand how universal vs level compaction works (I might not!) this 
should help contain the number of SST files that objects get spread 
across which causes all of the extra read seeks with universal compaction.
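
For reference, a minimal sketch of where these knobs live when driving
RocksDB directly through its C API - Ceph goes through its own KeyValueDB
wrapper and config_opts, so treat the calls below as an assumption about
rocksdb/c.h rather than the Ceph code path:

#include <rocksdb/c.h>
#include <stdio.h>

int main(void)
{
        char *err = NULL;
        rocksdb_options_t *opts = rocksdb_options_create();
        rocksdb_readoptions_t *ropts = rocksdb_readoptions_create();

        rocksdb_options_set_create_if_missing(opts, 1);
        /* universal (size-tiered) instead of levelled compaction */
        rocksdb_options_set_compaction_style(opts, rocksdb_universal_compaction);
        /* skip per-read checksum verification to cut the CRC32 CPU cost;
         * ropts would then be passed to rocksdb_get() and friends */
        rocksdb_readoptions_set_verify_checksums(ropts, 0);

        rocksdb_t *db = rocksdb_open(opts, "/tmp/mon-store-test", &err);
        if (err) {
                fprintf(stderr, "open failed: %s\n", err);
                rocksdb_free(err);
                return 1;
        }

        rocksdb_close(db);
        rocksdb_readoptions_destroy(ropts);
        rocksdb_options_destroy(opts);
        return 0;
}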




Btw , can you list your rocksdb configuration?


Sure, right now it's all the stock defaults in config_opts except the 
new option I added for universal compaction.  I am hoping I can run some 
more tests today and this weekend with tuned ones.




Cheers,
xinxin

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Friday, July 25, 2014 7:45 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Earlier today I modified the rocksdb options so I could enable universal 
compaction.  Overall performance is lower but I don't see the hang/stall in
the middle of the test either.  Instead the disk is basically pegged with 100% 
writes.  I suspect average latency is higher than leveldb, but the highest 
latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled 
(heh) compaction.

I haven't done much tuning either way yet.  It may be that if we keep level 0 
and level 1 roughly the same size we can reduce stalls in the levelled setups.  
It will also be interesting to see what happens in these tests on SSDs.

Mark

On 07/24/2014 06:13 AM, Mark Nelson wrote:

Hi Xinxin,

Thanks! I wonder as well if it might be interesting to expose the
options related to universal compaction?  It looks like rocksdb
provides a lot of interesting knobs you can adjust!

Mark

On 07/24/2014 12:08 AM, Shu, Xinxin wrote:

Hi mark,

I think this may be related to the 'verify_checksums' config option.  When
ReadOptions is initialized, this option defaults to true, so all data
read from underlying storage will be verified against the corresponding
checksums.  However, this option cannot be configured in the wip-rocksdb
branch.  I will modify the code to make this option configurable.

Cheers,
xinxin

-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Thursday, July 24, 2014 7:14 AM
To: ceph-devel@vger.kernel.org
Subject: First attempt at rocksdb monitor store stress testing

Hi Guys,

So I've been interested lately in 99th percentile latency
(and the amount of write amplification we are seeing) with leveldb.
Joao mentioned he has written a tool called mon-store-stress in
wip-leveldb-misc to try to provide a means to roughly guess at what's
happening on the mons under heavy load.  I cherry-picked it over to
wip-rocksdb and after a couple of hacks was able to get everything
built and running with some basic tests.  There was little tuning
done and I don't know how realistic this workload is, so don't assume
this means anything yet, but some initial results are here:

http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf

Command that was used to run the tests:

./ceph-test-mon-store-stress --mon-keyvaluedb leveldb|rocksdb
--write-min-size 50K --write-max-size 2M --percent-write 70
--percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000
foo

The most interesting bit right now is that rocksdb seems to be
hanging in the middle of the test (left it running for several
hours).  CPU usage on one core was quite high during the hang.
Profiling using perf with dwarf symbols I see:

-  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
int
rocksdb::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32>(unsigned
int, char const*, unsigned long)
  - unsigned int
rocksdb::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32>(unsigned
int, char const*, unsigned long)
   51.70%
rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
rocksdb::Footer const&, rocksdb::ReadOptions const&,
rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
bool)
   48.30%
rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&,
rocksdb::CompressionType, rocksdb::BlockHandle*)

Not sure what's going on yet, may need to try to enable
logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)

Mark

Re: Fwd: S3 API Compatibility support

2014-07-25 Thread Sage Weil
On Fri, 25 Jul 2014, M Ranga Swami Reddy wrote:
 Hi Team: As per the Ceph documentation, a few S3 API features are listed as not supported.
 
 Link: http://ceph.com/docs/master/radosgw/s3/
 
 Is there a plan to support the unsupported items in the above table?
 Or is anyone working on this?

Yes.  Unfortunately this table isn't particularly detailed or accurate or 
up to date.   The main gap, I think, is versioned objects.

Are there specific parts of the S3 API that are missing that you need?
That sort of info is very helpful for prioritizing effort...

sage


Re: [PATCH] ceph: fix sizeof(struct tYpO *) typo

2014-07-25 Thread Alex Elder
On 07/25/2014 08:27 AM, Ilya Dryomov wrote:
 On Fri, Jul 25, 2014 at 4:52 PM, Alex Elder el...@ieee.org wrote:
 On 07/25/2014 04:26 AM, Ilya Dryomov wrote:
 struct ceph_xattr -> struct ceph_inode_xattr

 Looks good.  I can't find the definition of ceph_xattr.
 
 That's the point of the patch ;)

Well, I thought the point was that the type was
wrong; I was surprised that the type was non-existent.

-Alex

 Thanks,
 
 Ilya
 



Re: First attempt at rocksdb monitor store stress testing

2014-07-25 Thread Mark Nelson

Hi Xinxin,

I'm trying to enable the rocksdb log file as described in config_opts using:

rocksdb_log = <path to log file>

The file gets created but is empty.  Any ideas?

Mark

On 07/24/2014 08:28 PM, Shu, Xinxin wrote:

Hi mark,

I am looking forward to your results on SSDs .
rocksdb generates a crc of the data to be written; this cannot be switched off
(but it can be substituted with xxhash).  There are two options,
Options::verify_checksums_in_compaction and ReadOptions::verify_checksums.  If we
disable these two options, I think CPU usage will go down.  If we use
universal compaction, it is not friendly to read operations.

Btw , can you list your rocksdb configuration?

Cheers,
xinxin

-Original Message-
From: Mark Nelson [mailto:mark.nel...@inktank.com]
Sent: Friday, July 25, 2014 7:45 AM
To: Shu, Xinxin; ceph-devel@vger.kernel.org
Subject: Re: First attempt at rocksdb monitor store stress testing

Earlier today I modified the rocksdb options so I could enable universal 
compaction.  Overall performance is lower but I don't see the hang/stall in
the middle of the test either.  Instead the disk is basically pegged with 100% 
writes.  I suspect average latency is higher than leveldb, but the highest 
latency is about 5-6s while we were seeing 30s spikes for leveldb with levelled 
(heh) compaction.

I haven't done much tuning either way yet.  It may be that if we keep level 0 
and level 1 roughly the same size we can reduce stalls in the levelled setups.  
It will also be interesting to see what happens in these tests on SSDs.

Mark

On 07/24/2014 06:13 AM, Mark Nelson wrote:

Hi Xinxin,

Thanks! I wonder as well if it might be interesting to expose the
options related to universal compaction?  It looks like rocksdb
provides a lot of interesting knobs you can adjust!

Mark

On 07/24/2014 12:08 AM, Shu, Xinxin wrote:

Hi mark,

I think this may be related to the 'verify_checksums' config option.  When
ReadOptions is initialized, this option defaults to true, so all data
read from underlying storage will be verified against the corresponding
checksums.  However, this option cannot be configured in the wip-rocksdb
branch.  I will modify the code to make this option configurable.

Cheers,
xinxin

-Original Message-
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Thursday, July 24, 2014 7:14 AM
To: ceph-devel@vger.kernel.org
Subject: First attempt at rocksdb monitor store stress testing

Hi Guys,

So I've been interested lately in 99th percentile latency
(and the amount of write amplification we are seeing) with leveldb.
Joao mentioned he has written a tool called mon-store-stress in
wip-leveldb-misc to try to provide a means to roughly guess at what's
happening on the mons under heavy load.  I cherry-picked it over to
wip-rocksdb and after a couple of hacks was able to get everything
built and running with some basic tests.  There was little tuning
done and I don't know how realistic this workload is, so don't assume
this means anything yet, but some initial results are here:

http://nhm.ceph.com/mon-store-stress/First%20Attempt.pdf

Command that was used to run the tests:

./ceph-test-mon-store-stress --mon-keyvaluedb leveldb|rocksdb
--write-min-size 50K --write-max-size 2M --percent-write 70
--percent-read 30 --keep-state --test-seed 1406137270 --stop-at 5000
foo

The most interesting bit right now is that rocksdb seems to be
hanging in the middle of the test (left it running for several
hours).  CPU usage on one core was quite high during the hang.
Profiling using perf with dwarf symbols I see:

-  49.14%  ceph-test-mon-s  ceph-test-mon-store-stress  [.] unsigned
int
rocksdb::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32>(unsigned
int, char const*, unsigned long)
  - unsigned int
rocksdb::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32>(unsigned
int, char const*, unsigned long)
   51.70%
rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*,
rocksdb::Footer const&, rocksdb::ReadOptions const&,
rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*,
bool)
   48.30%
rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&,
rocksdb::CompressionType, rocksdb::BlockHandle*)

Not sure what's going on yet, may need to try to enable
logging/debugging in rocksdb.  Thoughts/Suggestions welcome. :)

Mark









Re: Fwd: S3 API Compatibility support

2014-07-25 Thread M Ranga Swami Reddy
Thanks for the quick reply.
Yes, versioned objects are missing in Ceph ATM.
I am looking for: bucket lifecycle (get/put/delete), bucket location,
put object notification and object restore (i.e. versioned objects) S3
API support.
Please let me know if any of the above work is in progress or someone
has planned to work on it.

Thanks
Swami


On Fri, Jul 25, 2014 at 9:19 PM, Sage Weil sw...@redhat.com wrote:
 On Fri, 25 Jul 2014, M Ranga Swami Reddy wrote:
 Hi Team: As per the Ceph documentation, a few S3 API features are listed as not supported.

 Link: http://ceph.com/docs/master/radosgw/s3/

 Is there a plan to support the unsupported items in the above table?
 Or is anyone working on this?

 Yes.  Unfortunately this table isn't particularly detailed or accurate or
 up to date.   The main gap, I think, is versioned objects.

 Are there specific parts of the S3 API that are missing that you need?
 That sort of info is very helpful for prioritizing effort...

 sage


Re: Fwd: S3 API Compatibility support

2014-07-25 Thread Yehuda Sadeh
On Fri, Jul 25, 2014 at 10:14 AM, M Ranga Swami Reddy
swamire...@gmail.com wrote:
 Thanks for the quick reply.
 Yes, versioned objects are missing in Ceph ATM.
 I am looking for: bucket lifecycle (get/put/delete), bucket location,
 put object notification and object restore (i.e. versioned objects) S3
 API support.
 Please let me know if any of the above work is in progress or someone
 has planned to work on it.


I opened an issue for bucket lifecycle (we already had an issue open
for object expiration though). We do have bucket location already
(part of the multi-region feature). Object versioning is definitely on
our backlog and one that we'll hopefully implement sooner rather than
later.
With regard to object notification, it'll require having a
notification service, which is a bit out of scope. Integrating the
gateway with such a service wouldn't be hard, but we'll need to have
that first.

Yehuda


 Thanks
 Swami


 On Fri, Jul 25, 2014 at 9:19 PM, Sage Weil sw...@redhat.com wrote:
 On Fri, 25 Jul 2014, M Ranga Swami Reddy wrote:
 Hi Team: As per the Ceph documentation, a few S3 API features are listed as not supported.

 Link: http://ceph.com/docs/master/radosgw/s3/

 Is there a plan to support the unsupported items in the above table?
 Or is anyone working on this?

 Yes.  Unfortunately this table isn't particularly detailed or accurate or
 up to date.   The main gap, I think, is versioned objects.

 Are there specific parts of the S3 API that are missing that you need?
 That sort of info is very helpful for prioritizing effort...

 sage


Re: Forcing Ceph into mapping all objects to a single PG

2014-07-25 Thread Alex Elsayed
Sage Weil wrote:

 On Fri, 25 Jul 2014, Daniel Hofmann wrote:
 The main issue however is not the hash's strength, but the fact that
 once pre-computed, I'm able to use preimages on **every Ceph cluster out
 there**. (As the hash function's output is a deterministic function of
 the object's name only)
 
 I agree in that the general issue is inherent in hash-placement systems.
 
 But what I don't agree with is the following:
 
 Why do I have to be able to calculate my object's placement for **every
 Ceph cluster** out there?
 
 Why does it not suffice for me to be able to calculate the placement
 only for the cluster I'm currently accessing?
 
 From a logical standpoint it seems reasonable. Why then, are we not able
 to constrain the placement calculation in that regard?
 
 
 If the placement is bound to a specific cluster it should suffice to
 derive e.g. a key for SipHash based on cluster specifics.
 
 Is this doable from an implementation point of view?
 
 
 Note: I only did this as a proof-of-concept for the object store.
 Think about the implications, if you're able to do this e.g. for every
 RadosGW out there and services using RadosGW.
 
 It would be really easy to add a random salt to the pg_pool_t and feed
 that into the object -> pg hash mapping.  Note, by the way, that for new
 clusters the pool id is already fed in here, so there is a *tiny* bit of
 variation depending on what order the pools were created in (although
 probably not enough to meaningfully improve security).

I just had a thought - is the FSID a parameter? If it is, then there's 
actually no need for a further salt: FSID is per cluster, preventing cross-
cluster precomputation, and if the pool ID is fed in then each pool needs 
collisions calculated separately as well. In other words, the (fsid, poolid) 
tuple is almost certainly sufficient salt in and of itself.

 We could also add a new hash type that is more secure.  Rjenkins is used
 by default but the choice of hash is already parameterized.
 
 sage





LTTng unfriendly with mixed static/dynamic linking

2014-07-25 Thread Adam Crume
LTTng requires tracepoints to be linked into a program only once.  If
tracepoints are linked in multiple times, the program crashes at
startup with: LTTng-UST: Error (-17) while registering tracepoint
probe. Duplicate registration of tracepoint probes having the same
name is not allowed.

This is problematic when mixing static and dynamic linking.  If the
tracepoints are in a static library, that library can end up in an
executable multiple times by being linked in directly, as well as
being statically linked into a dynamic library.  Even if the
tracepoints are not linked directly into the executable, they can be
statically linked into multiple dynamic libraries that the executable
loads.
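
For readers unfamiliar with LTTng-UST, the probe definitions live in one
translation unit along these lines (a sketch with a hypothetical provider
header name), and the constraint above is that this unit must reach the
final process image exactly once:

/* tp.c - LTTng-UST probe definition unit (illustrative).  If the object
 * built from this file is linked into the process twice, e.g. statically
 * into both the executable and a shared library it loads, the same
 * tracepoint names get registered twice and LTTng-UST aborts with the
 * duplicate-registration error quoted above. */
#define TRACEPOINT_CREATE_PROBES
#define TRACEPOINT_DEFINE
#include "tracepoint_provider.h"  /* hypothetical header with the TRACEPOINT_EVENT() definitions */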

For us, this problem shows up with libcommon, and could show up with
others such as libosd.  (In general, I think anything added to
noinst_LTLIBRARIES is static, and anything added to lib_LTLIBRARIES is
dynamic.)

There are a few ways of solving the issue:
1. Change every library that has tracepoints, like libcommon, from
static to dynamic.  This could be a big change, as at the very least
we'd have to rename the library to something like libceph_common to
avoid conflicts, since now it would be an installed file.  This has
the advantage of keeping code and its tracepoints in the same library.
2. Keep tracepoints in a separate static library.  For example,
libcommon and libcommon_tp.  Unfortunately, every executable (but not
library!) that links in libcommon (directly or indirectly) would have
to manually link in libcommon_tp, and I don't think Automake could
automate that.
3. Keep tracepoints in a separate dynamic library.  In this case, I
think libcommon could depend on libcommon_tp, so executables would not
have to manually link in libcommon_tp.  (I'm not an Automake expert,
so let me know if I'm wrong on that.)  Again, libcommon_tp would be an
installed file, so we'd want to rename it to something like
libceph_common_tp.

I attached a minimal test case of the problem.

Thoughts or suggestions?

Thanks,
Adam Crume


lttng-duplicate-tracepoints.tar.gz
Description: GNU Zip compressed data


Re: LTTng unfriendly with mixed static/dynamic linking

2014-07-25 Thread Adam Crume
I tried all solutions, and it looks like only #1 works.  #2 gives the
error /usr/bin/ld: main: hidden symbol `tracepoint_dlopen' in
common_tp.a(common_tp.o) is referenced by DSO when linking.  #3 gives
the error ./liblibrary.so: undefined reference to
`tracepoint_dlopen' when linking.  (Linking is complicated by the
fact that LTTng uses special symbol attributes, and tracepoint_dlopen
happens to be weak and hidden.)  Unless I'm mistaken (and I very well
may be), we will have to ensure that all traced code is either 1)
placed in a shared library and never statically linked elsewhere, or
2) never linked into any shared library.

Thoughts?

Adam

On Fri, Jul 25, 2014 at 11:48 AM, Adam Crume adamcr...@gmail.com wrote:
 LTTng requires tracepoints to be linked into a program only once.  If
 tracepoints are linked in multiple times, the program crashes at
 startup with: LTTng-UST: Error (-17) while registering tracepoint
 probe. Duplicate registration of tracepoint probes having the same
 name is not allowed.

 This is problematic when mixing static and dynamic linking.  If the
 tracepoints are in a static library, that library can end up in an
 executable multiple times by being linked in directly, as well as
 being statically linked into a dynamic library.  Even if the
 tracepoints are not linked directly into the executable, they can be
 statically linked into multiple dynamic libraries that the executable
 loads.

 For us, this problem shows up with libcommon, and could show up with
 others such as libosd.  (In general, I think anything added to
 noinst_LTLIBRARIES is static, and anything added to lib_LTLIBRARIES is
 dynamic.)

 There are a few ways of solving the issue:
 1. Change every library that has tracepoints, like libcommon, from
 static to dynamic.  This could be a big change, as at the very least
 we'd have to rename the library to something like libceph_common to
 avoid conflicts, since now it would be an installed file.  This has
 the advantage of keeping code and its tracepoints in the same library.
 2. Keep tracepoints in a separate static library.  For example,
 libcommon and libcommon_tp.  Unfortunately, every executable (but not
 library!) that links in libcommon (directly or indirectly) would have
 to manually link in libcommon_tp, and I don't think Automake could
 automate that.
 3. Keep tracepoints in a separate dynamic library.  In this case, I
 think libcommon could depend on libcommon_tp, so executables would not
 have to manually link in libcommon_tp.  (I'm not an Automake expert,
 so let me know if I'm wrong on that.)  Again, libcommon_tp would be an
 installed file, so we'd want to rename it to something like
 libceph_common_tp.

 I attached a minimal test case of the problem.

 Thoughts or suggestions?

 Thanks,
 Adam Crume


lttng-duplicate-tracepoints-sol1.tar.gz
Description: GNU Zip compressed data


lttng-duplicate-tracepoints-sol2.tar.gz
Description: GNU Zip compressed data


lttng-duplicate-tracepoints-sol3.tar.gz
Description: GNU Zip compressed data


Re: LTTng unfriendly with mixed static/dynamic linking

2014-07-25 Thread Sage Weil
On Fri, 25 Jul 2014, Adam Crume wrote:
 I tried all solutions, and it looks like only #1 works.  #2 gives the
 error /usr/bin/ld: main: hidden symbol `tracepoint_dlopen' in
 common_tp.a(common_tp.o) is referenced by DSO when linking.  #3 gives
 the error ./liblibrary.so: undefined reference to
 `tracepoint_dlopen' when linking.  (Linking is complicated by the
 fact that LTTng uses special symbol attributes, and tracepoint_dlopen
 happens to be weak and hidden.) 

I think #1 is good for other reasons, too.  We already have issues (I 
think!) with binaries that use librados and also link libcommon 
statically.  Specifically, I think we've seen that having mismatched 
versions of librados and the binary installed leads to confusion about the
contents/structure of md_config_t (g_conf).  This is one of the reasons
why the libcommon and rgw packages require an identical version of 
librados or librbd or whatever--to avoid this inconsistency.

 Unless I'm mistaken (and I very well
 may be), we will have to ensure that all traced code is either 1)
 placed in a shared library and never statically linked elsewhere, or
 2) never linked into any shared library.

That sounds doable and sane to me:

 - librados, librbd, libceph_common, etc. would have the tracepoints in 
the same .so
 - ceph-osd could have its own tracepoints, as long as they are always 
static.  (For example, libos.la, which is linked statically by ceph-mon 
and ceph-osd but never dynamically.)

One pain point in all of this, though, is that the libceph_common.so (or 
whatever) will need to go into a separate package that is required by 
librados.so and librbd and ceph-common and everything else.  'ceph-common' 
is what this ought to be called, but we've co-opted it to mean 'ceph
clients'.  I'm not sure if it is worthwhile to go through the hijinks to
rename ceph-common to ceph-clients and repurpose ceph-common for this.

sage



 
 Thoughts?
 
 Adam
 
 On Fri, Jul 25, 2014 at 11:48 AM, Adam Crume adamcr...@gmail.com wrote:
  LTTng requires tracepoints to be linked into a program only once.  If
  tracepoints are linked in multiple times, the program crashes at
  startup with: LTTng-UST: Error (-17) while registering tracepoint
  probe. Duplicate registration of tracepoint probes having the same
  name is not allowed.
 
  This is problematic when mixing static and dynamic linking.  If the
  tracepoints are in a static library, that library can end up in an
  executable multiple times by being linked in directly, as well as
  being statically linked into a dynamic library.  Even if the
  tracepoints are not linked directly into the executable, they can be
  statically linked into multiple dynamic libraries that the executable
  loads.
 
  For us, this problem shows up with libcommon, and could show up with
  others such as libosd.  (In general, I think anything added to
  noinst_LTLIBRARIES is static, and anything added to lib_LTLIBRARIES is
  dynamic.)
 
  There are a few ways of solving the issue:
  1. Change every library that has tracepoints, like libcommon, from
  static to dynamic.  This could be a big change, as at the very least
  we'd have to rename the library to something like libceph_common to
  avoid conflicts, since now it would be an installed file.  This has
  the advantage of keeping code and its tracepoints in the same library.
  2. Keep tracepoints in a separate static library.  For example,
  libcommon and libcommon_tp.  Unfortunately, every executable (but not
  library!) that links in libcommon (directly or indirectly) would have
  to manually link in libcommon_tp, and I don't think Automake could
  automate that.
  3. Keep tracepoints in a separate dynamic library.  In this case, I
  think libcommon could depend on libcommon_tp, so executables would not
  have to manually link in libcommon_tp.  (I'm not an Automake expert,
  so let me know if I'm wrong on that.)  Again, libcommon_tp would be an
  installed file, so we'd want to rename it to something like
  libceph_common_tp.
 
  I attached a minimal test case of the problem.
 
  Thoughts or suggestions?
 
  Thanks,
  Adam Crume
 


issue #8937

2014-07-25 Thread Yehuda Sadeh
I pushed a fix (which is not heavily tested) to wip-8937. The problem
seems to have been triggered by the fix to #8928. One thing that I
noticed was that previously max_chunk_size at that point was actually
used as the minimum write size (how appropriate). It's used elsewhere
as the max size for data drained off the front end.
Basically the fix for #8928 now makes max_chunk_size limit the
rados writes to no more than that. Since up until now we weren't doing
that, I'm not too sure that this change won't have significant
performance implications.
The options that we have at this point are:
 - keep it as it is now, max_chunk_size limits rados writes
 - revert it to the old behavior, but add some code to handle the pool
alignment when needed

In any case, I don't think this is ready to go into firefly.

Yehuda


wip-objecter

2014-07-25 Thread Sage Weil
Hey John,

I fixed up the mds osdmap wait stuff and squashed it into wip-objecter.  
That rebased out from underneath your branch; sorry.  Will try to look at 
the other patches shortly!

sage