RE: Question to RWLock & reverse DNS ip=hostname

2014-09-10 Thread Andreas Joachim Peters
We did a lot of testing of RW locks in another multi-threaded project, and we
successfully use PTHREAD_RWLOCK_PREFER_WRITER_NP, which works perfectly fine
if you avoid recursive read locks. It is probably good practice to avoid any
recursive mutex in general.

If one stays with the default RW mutex without setting attributes, we have seen
that you starve writers for long periods if there is contention on the read
mutex (n-reader >> n-writer). I agree this is a very 'sensitive' topic but
could well be an origin of threading scalability problems - it depends on how
and where you use the RW locks in CEPH.

Cheers Andreas.


From: Gregory Farnum [g...@inktank.com]
Sent: 09 September 2014 21:56
To: Sage Weil
Cc: Andreas Joachim Peters; ceph-devel@vger.kernel.org
Subject: Re: Question to RWLock & reverse DNS ip=hostname

On Tue, Sep 9, 2014 at 10:50 AM, Sage Weil sw...@redhat.com wrote:
 On Tue, 9 Sep 2014, Andreas Joachim Peters wrote:
 Hi,
 by chance I had a look at the RWLock class. To my best knowledge, the way
 you create RW locks defaults to writer-starvation, i.e. all readers
 will always jump ahead of a pending writer. I cannot imagine that you
 never have the opposite requirement in the CEPH multithreaded code, but I
 didn't review where it is used. In case you are aware, you can just
 ignore this comment; otherwise one could add the option to create a
 writer-preferring RWLock at construction time using e.g.
 pthread_rwlockattr_setkind_np(&attr, PTHREAD_RWLOCK_PREFER_WRITER_NP)

 Hmm, I was not aware.  Perhaps we should make the reader vs writer
 preference explicit during construction?

Yeah, this is interesting as I thought I'd learned somewhere that the
default behavior was to prefer writers. Poking around the internet
makes me wonder if we actually want to make any changes — best I can
tell PTHREAD_RWLOCK_PREFER_WRITER_NP does not actually behave the way
one wants (as part of the spec!) because pthreads requires recursive
locking, so we'd have to use
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP. But we'd probably want
to profile the performance changes (and, uh, run tests to make sure we
don't deadlock somehow) before putting any such code upstream.
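
For reference, a minimal sketch of constructing such a writer-preferring lock
(this is not Ceph code; PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP is a
glibc-specific extension, and recursive read locks must be avoided once it is
set):

    #include <pthread.h>

    int make_writer_preferring_rwlock(pthread_rwlock_t *lock)
    {
            pthread_rwlockattr_t attr;
            int ret = pthread_rwlockattr_init(&attr);
            if (ret)
                    return ret;
            /* favor queued writers over incoming readers; a thread that
             * already holds a read lock must not rdlock again, or it can
             * deadlock behind a waiting writer */
            ret = pthread_rwlockattr_setkind_np(&attr,
                            PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
            if (!ret)
                    ret = pthread_rwlock_init(lock, &attr);
            pthread_rwlockattr_destroy(&attr);
            return ret;
    }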

 Another question: I couldn't find a base implementation to translate
 IPV4/6 IPs to host names? Is there some configuration table/method
 allowing to get back from an OSD address retrieved via
 OSDMap::get_addr() the original host name avoiding the use of DNS
 lookups?

 I don't think we are doing reverse DNS lookups anywhere in the ceph
 code ...

In particular I think we've looked at doing this before and been
unable to reliably map to the *proper* hostname in cases where servers
have multiple NICs.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


[PATCH 1/3] libceph: gracefully handle large reply messages from the mon

2014-09-10 Thread Ilya Dryomov
From: Sage Weil s...@redhat.com

We preallocate a few of the message types we get back from the mon.  If we
get a larger message than we are expecting, fall back to trying to allocate
a new one instead of blindly using the one we have.

CC: sta...@vger.kernel.org
Signed-off-by: Sage Weil s...@redhat.com
---
 net/ceph/mon_client.c |8 
 1 file changed, 8 insertions(+)

diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c
index 067d3af2eaf6..61fcfc304f68 100644
--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -1181,7 +1181,15 @@ static struct ceph_msg *mon_alloc_msg(struct ceph_connection *con,
 	if (!m) {
 		pr_info("alloc_msg unknown type %d\n", type);
 		*skip = 1;
+	} else if (front_len > m->front_alloc_len) {
+		pr_warning("mon_alloc_msg front %d > prealloc %d (%u#%llu)\n",
+			   front_len, m->front_alloc_len,
+			   (unsigned int)con->peer_name.type,
+			   le64_to_cpu(con->peer_name.num));
+		ceph_msg_put(m);
+		m = ceph_msg_new(type, front_len, GFP_NOFS, false);
 	}
+
 	return m;
 }
 
-- 
1.7.10.4



[PATCH 2/3] libceph: add process_one_ticket() helper

2014-09-10 Thread Ilya Dryomov
Add a helper for processing individual cephx auth tickets.  Needed for
the next commit, which deals with allocating ticket buffers.  (Most of
the diff here is whitespace - view with "git diff -b".)

Cc: sta...@vger.kernel.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
 net/ceph/auth_x.c |  228 +
 1 file changed, 124 insertions(+), 104 deletions(-)

diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
index 96238ba95f2b..0eb146dce1aa 100644
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -129,17 +129,131 @@ static void remove_ticket_handler(struct ceph_auth_client *ac,
 	kfree(th);
 }
 
+static int process_one_ticket(struct ceph_auth_client *ac,
+			      struct ceph_crypto_key *secret,
+			      void **p, void *end,
+			      void *dbuf, void *ticket_buf)
+{
+	struct ceph_x_info *xi = ac->private;
+	int type;
+	u8 tkt_struct_v, blob_struct_v;
+	struct ceph_x_ticket_handler *th;
+	void *dp, *dend;
+	int dlen;
+	char is_enc;
+	struct timespec validity;
+	struct ceph_crypto_key old_key;
+	void *tp, *tpend;
+	struct ceph_timespec new_validity;
+	struct ceph_crypto_key new_session_key;
+	struct ceph_buffer *new_ticket_blob;
+	unsigned long new_expires, new_renew_after;
+	u64 new_secret_id;
+	int ret;
+
+	ceph_decode_need(p, end, sizeof(u32) + 1, bad);
+
+	type = ceph_decode_32(p);
+	dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));
+
+	tkt_struct_v = ceph_decode_8(p);
+	if (tkt_struct_v != 1)
+		goto bad;
+
+	th = get_ticket_handler(ac, type);
+	if (IS_ERR(th)) {
+		ret = PTR_ERR(th);
+		goto out;
+	}
+
+	/* blob for me */
+	dlen = ceph_x_decrypt(secret, p, end, dbuf,
+			      TEMP_TICKET_BUF_LEN);
+	if (dlen <= 0) {
+		ret = dlen;
+		goto out;
+	}
+	dout(" decrypted %d bytes\n", dlen);
+	dp = dbuf;
+	dend = dp + dlen;
+
+	tkt_struct_v = ceph_decode_8(&dp);
+	if (tkt_struct_v != 1)
+		goto bad;
+
+	memcpy(&old_key, &th->session_key, sizeof(old_key));
+	ret = ceph_crypto_key_decode(&new_session_key, &dp, dend);
+	if (ret)
+		goto out;
+
+	ceph_decode_copy(&dp, &new_validity, sizeof(new_validity));
+	ceph_decode_timespec(&validity, &new_validity);
+	new_expires = get_seconds() + validity.tv_sec;
+	new_renew_after = new_expires - (validity.tv_sec / 4);
+	dout(" expires=%lu renew_after=%lu\n", new_expires,
+	     new_renew_after);
+
+	/* ticket blob for service */
+	ceph_decode_8_safe(p, end, is_enc, bad);
+	tp = ticket_buf;
+	if (is_enc) {
+		/* encrypted */
+		dout(" encrypted ticket\n");
+		dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
+				      TEMP_TICKET_BUF_LEN);
+		if (dlen < 0) {
+			ret = dlen;
+			goto out;
+		}
+		dlen = ceph_decode_32(&tp);
+	} else {
+		/* unencrypted */
+		ceph_decode_32_safe(p, end, dlen, bad);
+		ceph_decode_need(p, end, dlen, bad);
+		ceph_decode_copy(p, ticket_buf, dlen);
+	}
+	tpend = tp + dlen;
+	dout(" ticket blob is %d bytes\n", dlen);
+	ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
+	blob_struct_v = ceph_decode_8(&tp);
+	new_secret_id = ceph_decode_64(&tp);
+	ret = ceph_decode_buffer(&new_ticket_blob, &tp, tpend);
+	if (ret)
+		goto out;
+
+	/* all is well, update our ticket */
+	ceph_crypto_key_destroy(&th->session_key);
+	if (th->ticket_blob)
+		ceph_buffer_put(th->ticket_blob);
+	th->session_key = new_session_key;
+	th->ticket_blob = new_ticket_blob;
+	th->validity = new_validity;
+	th->secret_id = new_secret_id;
+	th->expires = new_expires;
+	th->renew_after = new_renew_after;
+	dout(" got ticket service %d (%s) secret_id %lld len %d\n",
+	     type, ceph_entity_type_name(type), th->secret_id,
+	     (int)th->ticket_blob->vec.iov_len);
+	xi->have_keys |= th->service;
+
+out:
+	return ret;
+
+bad:
+	ret = -EINVAL;
+	goto out;
+}
+
 static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
 				    struct ceph_crypto_key *secret,
 				    void *buf, void *end)
 {
-	struct ceph_x_info *xi = ac->private;
-	int num;
 	void *p = buf;
-	int ret;
 	char *dbuf;
 	char *ticket_buf;
 	u8 reply_struct_v;
+	u32 num;
+	int ret;
 
 	dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
 	if (!dbuf)
@@ -150,112 

[PATCH 3/3] libceph: do not hard code max auth ticket len

2014-09-10 Thread Ilya Dryomov
We hard code cephx auth ticket buffer size to 256 bytes.  This isn't
enough for any moderate setups and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper).  Since the
buffer is allocated dynamically anyway, allocate it a bit later, at
the point where we know how much is going to be needed.

Fixes: http://tracker.ceph.com/issues/8979

Cc: sta...@vger.kernel.org
Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
---
 net/ceph/auth_x.c |   64 -
 1 file changed, 29 insertions(+), 35 deletions(-)

diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
index 0eb146dce1aa..de6662b14e1f 100644
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -13,8 +13,6 @@
 #include "auth_x.h"
 #include "auth_x_protocol.h"
 
-#define TEMP_TICKET_BUF_LEN	256
-
 static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);
 
 static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
@@ -64,7 +62,7 @@ static int ceph_x_encrypt(struct ceph_crypto_key *secret,
 }
 
 static int ceph_x_decrypt(struct ceph_crypto_key *secret,
-			  void **p, void *end, void *obuf, size_t olen)
+			  void **p, void *end, void **obuf, size_t olen)
 {
 	struct ceph_x_encrypt_header head;
 	size_t head_len = sizeof(head);
@@ -75,8 +73,14 @@ static int ceph_x_decrypt(struct ceph_crypto_key *secret,
 		return -EINVAL;
 
 	dout("ceph_x_decrypt len %d\n", len);
-	ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
-			    *p, len);
+	if (*obuf == NULL) {
+		*obuf = kmalloc(len, GFP_NOFS);
+		if (!*obuf)
+			return -ENOMEM;
+		olen = len;
+	}
+
+	ret = ceph_decrypt2(secret, &head, &head_len, *obuf, &olen, *p, len);
 	if (ret)
 		return ret;
 	if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
@@ -131,18 +135,19 @@ static void remove_ticket_handler(struct ceph_auth_client *ac,
 
 static int process_one_ticket(struct ceph_auth_client *ac,
 			      struct ceph_crypto_key *secret,
-			      void **p, void *end,
-			      void *dbuf, void *ticket_buf)
+			      void **p, void *end)
 {
 	struct ceph_x_info *xi = ac->private;
 	int type;
 	u8 tkt_struct_v, blob_struct_v;
 	struct ceph_x_ticket_handler *th;
+	void *dbuf = NULL;
 	void *dp, *dend;
 	int dlen;
 	char is_enc;
 	struct timespec validity;
 	struct ceph_crypto_key old_key;
+	void *ticket_buf = NULL;
 	void *tp, *tpend;
 	struct ceph_timespec new_validity;
 	struct ceph_crypto_key new_session_key;
@@ -167,8 +172,7 @@ static int process_one_ticket(struct ceph_auth_client *ac,
 	}
 
 	/* blob for me */
-	dlen = ceph_x_decrypt(secret, p, end, dbuf,
-			      TEMP_TICKET_BUF_LEN);
+	dlen = ceph_x_decrypt(secret, p, end, &dbuf, 0);
 	if (dlen <= 0) {
 		ret = dlen;
 		goto out;
@@ -195,20 +199,25 @@ static int process_one_ticket(struct ceph_auth_client *ac,
 
 	/* ticket blob for service */
 	ceph_decode_8_safe(p, end, is_enc, bad);
-	tp = ticket_buf;
 	if (is_enc) {
 		/* encrypted */
 		dout(" encrypted ticket\n");
-		dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
-				      TEMP_TICKET_BUF_LEN);
+		dlen = ceph_x_decrypt(&old_key, p, end, &ticket_buf, 0);
 		if (dlen < 0) {
 			ret = dlen;
 			goto out;
 		}
+		tp = ticket_buf;
 		dlen = ceph_decode_32(&tp);
 	} else {
 		/* unencrypted */
 		ceph_decode_32_safe(p, end, dlen, bad);
+		ticket_buf = kmalloc(dlen, GFP_NOFS);
+		if (!ticket_buf) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		tp = ticket_buf;
 		ceph_decode_need(p, end, dlen, bad);
 		ceph_decode_copy(p, ticket_buf, dlen);
 	}
@@ -237,6 +246,8 @@ static int process_one_ticket(struct ceph_auth_client *ac,
 	xi->have_keys |= th->service;
 
 out:
+	kfree(ticket_buf);
+	kfree(dbuf);
 	return ret;
 
 bad:
@@ -249,21 +260,10 @@ static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
 				    void *buf, void *end)
 {
 	void *p = buf;
-	char *dbuf;
-	char *ticket_buf;
 	u8 reply_struct_v;
 	u32 num;
 	int ret;
 
-	dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
-	if (!dbuf)
-		return -ENOMEM;
-
-	ret = -ENOMEM;
-

Re: [PATCH] libceph: fix a use after free issue in osdmap_set_max_osd

2014-09-10 Thread Ilya Dryomov
On Sun, Sep 7, 2014 at 6:27 PM, Ilya Dryomov ilya.dryo...@inktank.com wrote:
 On Sun, Sep 7, 2014 at 2:10 PM,  roy.qing...@gmail.com wrote:
 From: Li RongQing roy.qing...@gmail.com

 If the state variable is krealloc'ed successfully, map->osd_state will be
 freed; if either of the two following reallocations fails and the function
 exits without resetting map->osd_state, map->osd_state becomes a wild
 pointer.

 Fix it by resetting the pointers right after each successful krealloc.

 Signed-off-by: Li RongQing roy.qing...@gmail.com
 ---
  net/ceph/osdmap.c |   20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

 diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
 index c547e46..81e9c66 100644
 --- a/net/ceph/osdmap.c
 +++ b/net/ceph/osdmap.c
 @@ -671,15 +671,19 @@ static int osdmap_set_max_osd(struct ceph_osdmap *map, int max)
 	int i;
 
 	state = krealloc(map->osd_state, max*sizeof(*state), GFP_NOFS);
 +	if (!state)
 +		return -ENOMEM;
 +	map->osd_state = state;
 +
 	weight = krealloc(map->osd_weight, max*sizeof(*weight), GFP_NOFS);
 -	addr = krealloc(map->osd_addr, max*sizeof(*addr), GFP_NOFS);
 -	if (!state || !weight || !addr) {
 -		kfree(state);
 -		kfree(weight);
 -		kfree(addr);
 +	if (!weight)
 +		return -ENOMEM;
 +	map->osd_weight = weight;
 
 +	addr = krealloc(map->osd_addr, max*sizeof(*addr), GFP_NOFS);
 +	if (!addr)
 		return -ENOMEM;
 -	}
 +	map->osd_addr = addr;
 
 	for (i = map->max_osd; i < max; i++) {
 		state[i] = 0;
 @@ -687,10 +691,6 @@ static int osdmap_set_max_osd(struct ceph_osdmap *map, int max)
 		memset(addr + i, 0, sizeof(*addr));
 	}
 
 -	map->osd_state = state;
 -	map->osd_weight = weight;
 -	map->osd_addr = addr;
 -
 	if (map->osd_primary_affinity) {
 		u32 *affinity;

 --
 1.7.10.4

 Looks good.  I'll apply it tomorrow.

Pushed a branch with your patch.  Minor modifications: use map->*
instead of local variables for initialization, and change the primary
affinity case so it fits in.  You can take a look at

https://github.com/ceph/ceph-client/commit/790415a024871a2100388ce4b3d485756fb90865

Thanks,

Ilya


[PATCH] ceph: trim unused inodes before reconnecting to recovering MDS

2014-09-10 Thread Yan, Zheng
So the recovering MDS does not need to fetch these unused inodes during
cache rejoin.  This may reduce MDS recovery time.

Signed-off-by: Yan, Zheng z...@redhat.com
---
 fs/ceph/mds_client.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index bad07c0..f751fea 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2695,16 +2695,6 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 	session->s_state = CEPH_MDS_SESSION_RECONNECTING;
 	session->s_seq = 0;
 
-	ceph_con_close(session->s_con);
-	ceph_con_open(session->s_con,
-		      CEPH_ENTITY_TYPE_MDS, mds,
-		      ceph_mdsmap_get_addr(mdsc->mdsmap, mds));
-
-	/* replay unsafe requests */
-	replay_unsafe_requests(mdsc, session);
-
-	down_read(&mdsc->snap_rwsem);
-
 	dout("session %p state %s\n", session,
 	     session_state_name(session->s_state));
 
@@ -2723,6 +2713,19 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 	discard_cap_releases(mdsc, session);
 	spin_unlock(&session->s_cap_lock);
 
+	/* trim unused caps to reduce MDS's cache rejoin time */
+	shrink_dcache_parent(mdsc->fsc->sb->s_root);
+
+	ceph_con_close(session->s_con);
+	ceph_con_open(session->s_con,
+		      CEPH_ENTITY_TYPE_MDS, mds,
+		      ceph_mdsmap_get_addr(mdsc->mdsmap, mds));
+
+	/* replay unsafe requests */
+	replay_unsafe_requests(mdsc, session);
+
+	down_read(&mdsc->snap_rwsem);
+
 	/* traverse this session's caps */
 	s_nr_caps = session->s_nr_caps;
 	err = ceph_pagelist_encode_32(pagelist, s_nr_caps);
-- 
1.9.3



Re: [ceph-users] question about RGW

2014-09-10 Thread Sage Weil
[Moving this to ceph-devel, where you're more likely to get a response 
from a developer!]

On Wed, 10 Sep 2014, baijia...@126.com wrote:

 When I read the RGW code, I can't understand master_ver inside struct
 rgw_bucket_dir_header.
 Who can explain this struct, especially master_ver and stats? Thanks.

 baijia...@126.com

Re: [PATCH net-next 2/5] ceph: Convert pr_warning to pr_warn

2014-09-10 Thread Ilya Dryomov
On Wed, Sep 10, 2014 at 8:17 AM, Joe Perches j...@perches.com wrote:
 Use the more common pr_warn.

 Other miscellanea:

 o Coalesce formats
 o Realign arguments

 Signed-off-by: Joe Perches j...@perches.com
 ---
  net/ceph/ceph_common.c | 15 +--
  net/ceph/messenger.c   | 19 ++-
  net/ceph/osd_client.c  | 15 +++
  net/ceph/osdmap.c  | 20 ++--
  4 files changed, 36 insertions(+), 33 deletions(-)

 diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
 index 1675021..58fbfe1 100644
 --- a/net/ceph/ceph_common.c
 +++ b/net/ceph/ceph_common.c
 @@ -293,17 +293,20 @@ static int get_secret(struct ceph_crypto_key *dst, const char *name) {
 		key_err = PTR_ERR(ukey);
 		switch (key_err) {
 		case -ENOKEY:
 -			pr_warning("ceph: Mount failed due to key not found: %s\n", name);
 +			pr_warn("ceph: Mount failed due to key not found: %s\n",
 +				name);
 			break;
 		case -EKEYEXPIRED:
 -			pr_warning("ceph: Mount failed due to expired key: %s\n", name);
 +			pr_warn("ceph: Mount failed due to expired key: %s\n",
 +				name);
 			break;
 		case -EKEYREVOKED:
 -			pr_warning("ceph: Mount failed due to revoked key: %s\n", name);
 +			pr_warn("ceph: Mount failed due to revoked key: %s\n",
 +				name);
 			break;
 		default:
 -			pr_warning("ceph: Mount failed due to unknown key error %d: %s\n", key_err, name);
 +			pr_warn("ceph: Mount failed due to unknown key error %d: %s\n",
 +				key_err, name);
 		}
 		err = -EPERM;
 		goto out;
 @@ -433,7 +436,7 @@ ceph_parse_options(char *options, const char *dev_name,
 
 	/* misc */
 	case Opt_osdtimeout:
 -		pr_warning("ignoring deprecated osdtimeout option\n");
 +		pr_warn("ignoring deprecated osdtimeout option\n");
 		break;
 	case Opt_osdkeepalivetimeout:
 		opt->osd_keepalive_timeout = intval;
 diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
 index b2f571d..a7203be 100644
 --- a/net/ceph/messenger.c
 +++ b/net/ceph/messenger.c
 @@ -1937,11 +1937,11 @@ static int process_banner(struct ceph_connection *con)
 		   sizeof(con->peer_addr)) != 0 &&
 	    !(addr_is_blank(&con->actual_peer_addr.in_addr) &&
 	      con->actual_peer_addr.nonce == con->peer_addr.nonce)) {
 -		pr_warning("wrong peer, want %s/%d, got %s/%d\n",
 -			   ceph_pr_addr(&con->peer_addr.in_addr),
 -			   (int)le32_to_cpu(con->peer_addr.nonce),
 -			   ceph_pr_addr(&con->actual_peer_addr.in_addr),
 -			   (int)le32_to_cpu(con->actual_peer_addr.nonce));
 +		pr_warn("wrong peer, want %s/%d, got %s/%d\n",
 +			ceph_pr_addr(&con->peer_addr.in_addr),
 +			(int)le32_to_cpu(con->peer_addr.nonce),
 +			ceph_pr_addr(&con->actual_peer_addr.in_addr),
 +			(int)le32_to_cpu(con->actual_peer_addr.nonce));
 		con->error_msg = "wrong peer at address";
 		return -1;
 	}
 @@ -2302,7 +2302,7 @@ static int read_partial_message(struct ceph_connection *con)
 
 	BUG_ON(!con->in_msg ^ skip);
 	if (con->in_msg && data_len > con->in_msg->data_length) {
 -		pr_warning("%s skipping long message (%u > %zd)\n",
 +		pr_warn("%s skipping long message (%u > %zd)\n",
 			__func__, data_len, con->in_msg->data_length);
 		ceph_msg_put(con->in_msg);
 		con->in_msg = NULL;
 @@ -2712,7 +2712,7 @@ static bool con_sock_closed(struct ceph_connection *con)
 	CASE(OPEN);
 	CASE(STANDBY);
 	default:
 -		pr_warning("%s con %p unrecognized state %lu\n",
 +		pr_warn("%s con %p unrecognized state %lu\n",
 			__func__, con, con->state);
 		con->error_msg = "unrecognized con state";
 		BUG();
 @@ -2828,8 +2828,9 @@ static void con_work(struct work_struct *work)
   */
  static void con_fault(struct ceph_connection *con)
  {
 -	pr_warning("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
 -		   ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
 +	pr_warn("%s%lld %s %s\n",
 +		ENTITY_NAME(con->peer_name),
 +		ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);

Just wondering, why did you move the ENTITY_NAME argument to the next
line?  I see you are doing this pretty consistently.  Is this a style
guideline or something?

RE: Question to RWLock & reverse DNS ip=hostname

2014-09-10 Thread Sage Weil
On Wed, 10 Sep 2014, Andreas Joachim Peters wrote:
 We did a lot of testing of RW locks in another multi-threaded project,
 and we successfully use PTHREAD_RWLOCK_PREFER_WRITER_NP, which works
 perfectly fine if you avoid recursive read locks. It is probably good
 practice to avoid any recursive mutex in general.

The good news is that there are (I believe) no recursive uses of the 
mutexes or rwlocks in the code base.  Our lockdep dependency checking 
system throws errors if it detects any recursion.  There might be a few 
exceptions that are explicitly marked recursive via the constructor.  
 
 If one stays with the default RW mutex without setting attributes, we
 have seen that you starve writers for long periods if there is
 contention on the read mutex (n-reader >> n-writer). I agree this is a
 very 'sensitive' topic but could well be an origin of threading
 scalability problems - it depends on how and where you use the RW locks in
 CEPH.

Yeah!

I suspect the way to approach this would be to make reader vs writer 
favoritism explicit in the constructor arguments, with no change in 
behavior, and then a series of patches that switch to writer-favoritism...

sage
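
For illustration, a rough sketch (hypothetical, not the actual Ceph RWLock
class) of what an explicit preference in the constructor could look like;
defaulting to the current behavior would keep existing call sites unchanged:

    #include <pthread.h>

    class RWLock {
            pthread_rwlock_t l;
    public:
            explicit RWLock(bool prefer_writers = false) {
                    pthread_rwlockattr_t attr;
                    pthread_rwlockattr_init(&attr);
                    if (prefer_writers)
                            /* glibc-specific; safe only when read locks
                             * are never taken recursively */
                            pthread_rwlockattr_setkind_np(&attr,
                                PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
                    pthread_rwlock_init(&l, &attr);
                    pthread_rwlockattr_destroy(&attr);
            }
            ~RWLock() { pthread_rwlock_destroy(&l); }
            void get_read()  { pthread_rwlock_rdlock(&l); }
            void get_write() { pthread_rwlock_wrlock(&l); }
            void unlock()    { pthread_rwlock_unlock(&l); }
    };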

 
 Cheers Andreas.
 
 
 From: Gregory Farnum [g...@inktank.com]
 Sent: 09 September 2014 21:56
 To: Sage Weil
 Cc: Andreas Joachim Peters; ceph-devel@vger.kernel.org
 Subject: Re: Question to RWLock & reverse DNS ip=hostname
 
 On Tue, Sep 9, 2014 at 10:50 AM, Sage Weil sw...@redhat.com wrote:
  On Tue, 9 Sep 2014, Andreas Joachim Peters wrote:
  Hi,
  by chance I had a look at the RWLock class. To my best knowledge, the way
  you create RW locks defaults to writer-starvation, i.e. all readers
  will always jump ahead of a pending writer. I cannot imagine that you
  never have the opposite requirement in the CEPH multithreaded code, but I
  didn't review where it is used. In case you are aware, you can just
  ignore this comment; otherwise one could add the option to create a
  writer-preferring RWLock at construction time using e.g.
  pthread_rwlockattr_setkind_np(&attr, PTHREAD_RWLOCK_PREFER_WRITER_NP)
 
  Hmm, I was not aware.  Perhaps we should make the reader vs writer
  preference explicit during construction?
 
 Yeah, this is interesting as I thought I'd learned somewhere that the
 default behavior was to prefer writers. Poking around the internet
 makes me wonder if we actually want to make any changes — best I can
 tell PTHREAD_RWLOCK_PREFER_WRITER_NP does not actually behave the way
 one wants (as part of the spec!) because pthreads requires recursive
 locking, so we'd have to use
 PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP. But we'd probably want
 to profile the performance changes (and, uh, run tests to make sure we
 don't deadlock somehow) before putting any such code upstream.
 
  Another question: I couldn't find a base implementation to translate
  IPV4/6 IPs to host names? Is there some configuration table/method
  allowing to get back from an OSD address retrieved via
  OSDMap::get_addr() the original host name avoiding the use of DNS
  lookups?
 
  I don't think we are doing reverse DNS lookups anywhere in the ceph
  code ...
 
 In particular I think we've looked at doing this before and been
 unable to reliably map to the *proper* hostname in cases where servers
 have multiple NICs.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [PATCH 1/3] libceph: gracefully handle large reply messages from the mon

2014-09-10 Thread Sage Weil
Reviewed-by: Sage Weil s...@redhat.com

On Wed, 10 Sep 2014, Ilya Dryomov wrote:

 From: Sage Weil s...@redhat.com
 
 We preallocate a few of the message types we get back from the mon.  If we
 get a larger message than we are expecting, fall back to trying to allocate
 a new one instead of blindly using the one we have.
 
 CC: sta...@vger.kernel.org
 Signed-off-by: Sage Weil s...@redhat.com
 ---
  net/ceph/mon_client.c |8 
  1 file changed, 8 insertions(+)
 
 diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c
 index 067d3af2eaf6..61fcfc304f68 100644
 --- a/net/ceph/mon_client.c
 +++ b/net/ceph/mon_client.c
 @@ -1181,7 +1181,15 @@ static struct ceph_msg *mon_alloc_msg(struct ceph_connection *con,
  	if (!m) {
  		pr_info("alloc_msg unknown type %d\n", type);
  		*skip = 1;
 +	} else if (front_len > m->front_alloc_len) {
 +		pr_warning("mon_alloc_msg front %d > prealloc %d (%u#%llu)\n",
 +			   front_len, m->front_alloc_len,
 +			   (unsigned int)con->peer_name.type,
 +			   le64_to_cpu(con->peer_name.num));
 +		ceph_msg_put(m);
 +		m = ceph_msg_new(type, front_len, GFP_NOFS, false);
  	}
 +
  	return m;
  }
  
 -- 
 1.7.10.4
 


Re: [PATCH 2/3] libceph: add process_one_ticket() helper

2014-09-10 Thread Sage Weil
Reviewed-by: Sage Weil s...@redhat.com

On Wed, 10 Sep 2014, Ilya Dryomov wrote:

 Add a helper for processing individual cephx auth tickets.  Needed for
 the next commit, which deals with allocating ticket buffers.  (Most of
 the diff here is whitespace - view with "git diff -b".)
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com
 ---
  net/ceph/auth_x.c |  228 
 +
  1 file changed, 124 insertions(+), 104 deletions(-)
 
 diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
 index 96238ba95f2b..0eb146dce1aa 100644
 --- a/net/ceph/auth_x.c
 +++ b/net/ceph/auth_x.c
 @@ -129,17 +129,131 @@ static void remove_ticket_handler(struct ceph_auth_client *ac,
  	kfree(th);
  }
  
 +static int process_one_ticket(struct ceph_auth_client *ac,
 +			      struct ceph_crypto_key *secret,
 +			      void **p, void *end,
 +			      void *dbuf, void *ticket_buf)
 +{
 +	struct ceph_x_info *xi = ac->private;
 +	int type;
 +	u8 tkt_struct_v, blob_struct_v;
 +	struct ceph_x_ticket_handler *th;
 +	void *dp, *dend;
 +	int dlen;
 +	char is_enc;
 +	struct timespec validity;
 +	struct ceph_crypto_key old_key;
 +	void *tp, *tpend;
 +	struct ceph_timespec new_validity;
 +	struct ceph_crypto_key new_session_key;
 +	struct ceph_buffer *new_ticket_blob;
 +	unsigned long new_expires, new_renew_after;
 +	u64 new_secret_id;
 +	int ret;
 +
 +	ceph_decode_need(p, end, sizeof(u32) + 1, bad);
 +
 +	type = ceph_decode_32(p);
 +	dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));
 +
 +	tkt_struct_v = ceph_decode_8(p);
 +	if (tkt_struct_v != 1)
 +		goto bad;
 +
 +	th = get_ticket_handler(ac, type);
 +	if (IS_ERR(th)) {
 +		ret = PTR_ERR(th);
 +		goto out;
 +	}
 +
 +	/* blob for me */
 +	dlen = ceph_x_decrypt(secret, p, end, dbuf,
 +			      TEMP_TICKET_BUF_LEN);
 +	if (dlen <= 0) {
 +		ret = dlen;
 +		goto out;
 +	}
 +	dout(" decrypted %d bytes\n", dlen);
 +	dp = dbuf;
 +	dend = dp + dlen;
 +
 +	tkt_struct_v = ceph_decode_8(&dp);
 +	if (tkt_struct_v != 1)
 +		goto bad;
 +
 +	memcpy(&old_key, &th->session_key, sizeof(old_key));
 +	ret = ceph_crypto_key_decode(&new_session_key, &dp, dend);
 +	if (ret)
 +		goto out;
 +
 +	ceph_decode_copy(&dp, &new_validity, sizeof(new_validity));
 +	ceph_decode_timespec(&validity, &new_validity);
 +	new_expires = get_seconds() + validity.tv_sec;
 +	new_renew_after = new_expires - (validity.tv_sec / 4);
 +	dout(" expires=%lu renew_after=%lu\n", new_expires,
 +	     new_renew_after);
 +
 +	/* ticket blob for service */
 +	ceph_decode_8_safe(p, end, is_enc, bad);
 +	tp = ticket_buf;
 +	if (is_enc) {
 +		/* encrypted */
 +		dout(" encrypted ticket\n");
 +		dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
 +				      TEMP_TICKET_BUF_LEN);
 +		if (dlen < 0) {
 +			ret = dlen;
 +			goto out;
 +		}
 +		dlen = ceph_decode_32(&tp);
 +	} else {
 +		/* unencrypted */
 +		ceph_decode_32_safe(p, end, dlen, bad);
 +		ceph_decode_need(p, end, dlen, bad);
 +		ceph_decode_copy(p, ticket_buf, dlen);
 +	}
 +	tpend = tp + dlen;
 +	dout(" ticket blob is %d bytes\n", dlen);
 +	ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
 +	blob_struct_v = ceph_decode_8(&tp);
 +	new_secret_id = ceph_decode_64(&tp);
 +	ret = ceph_decode_buffer(&new_ticket_blob, &tp, tpend);
 +	if (ret)
 +		goto out;
 +
 +	/* all is well, update our ticket */
 +	ceph_crypto_key_destroy(&th->session_key);
 +	if (th->ticket_blob)
 +		ceph_buffer_put(th->ticket_blob);
 +	th->session_key = new_session_key;
 +	th->ticket_blob = new_ticket_blob;
 +	th->validity = new_validity;
 +	th->secret_id = new_secret_id;
 +	th->expires = new_expires;
 +	th->renew_after = new_renew_after;
 +	dout(" got ticket service %d (%s) secret_id %lld len %d\n",
 +	     type, ceph_entity_type_name(type), th->secret_id,
 +	     (int)th->ticket_blob->vec.iov_len);
 +	xi->have_keys |= th->service;
 +
 +out:
 +	return ret;
 +
 +bad:
 +	ret = -EINVAL;
 +	goto out;
 +}
 +
  static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
  				    struct ceph_crypto_key *secret,
  				    void *buf, void *end)
  {
 -	struct ceph_x_info *xi = ac->private;
 -	int num;
  	void *p = buf;
 -	int ret;
  	char *dbuf;
  	char *ticket_buf;
  	u8 reply_struct_v;
 +	u32 num;
 +	int ret;
  
  	dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
  	if (!dbuf)
 @@ -150,112 +264,18 @@ 

Re: [PATCH 3/3] libceph: do not hard code max auth ticket len

2014-09-10 Thread Sage Weil
On Wed, 10 Sep 2014, Ilya Dryomov wrote:
 We hard code cephx auth ticket buffer size to 256 bytes.  This isn't
 enough for any moderate setups and, in case tickets themselves are not
 encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
 ceph_decode_copy() doesn't - it's just a memcpy() wrapper).  Since the
 buffer is allocated dynamically anyway, allocate it a bit later, at
 the point where we know how much is going to be needed.
 
 Fixes: http://tracker.ceph.com/issues/8979
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Ilya Dryomov ilya.dryo...@inktank.com

Reviewed-by: Sage Weil s...@redhat.com

 ---
  net/ceph/auth_x.c |   64 
 -
  1 file changed, 29 insertions(+), 35 deletions(-)
 
 diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
 index 0eb146dce1aa..de6662b14e1f 100644
 --- a/net/ceph/auth_x.c
 +++ b/net/ceph/auth_x.c
 @@ -13,8 +13,6 @@
  #include "auth_x.h"
  #include "auth_x_protocol.h"
  
 -#define TEMP_TICKET_BUF_LEN	256
 -
  static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);
  
  static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
 @@ -64,7 +62,7 @@ static int ceph_x_encrypt(struct ceph_crypto_key *secret,
  }
  
  static int ceph_x_decrypt(struct ceph_crypto_key *secret,
 -			  void **p, void *end, void *obuf, size_t olen)
 +			  void **p, void *end, void **obuf, size_t olen)
  {
  	struct ceph_x_encrypt_header head;
  	size_t head_len = sizeof(head);
 @@ -75,8 +73,14 @@ static int ceph_x_decrypt(struct ceph_crypto_key *secret,
  		return -EINVAL;
  
  	dout("ceph_x_decrypt len %d\n", len);
 -	ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
 -			    *p, len);
 +	if (*obuf == NULL) {
 +		*obuf = kmalloc(len, GFP_NOFS);
 +		if (!*obuf)
 +			return -ENOMEM;
 +		olen = len;
 +	}
 +
 +	ret = ceph_decrypt2(secret, &head, &head_len, *obuf, &olen, *p, len);
  	if (ret)
  		return ret;
  	if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
 @@ -131,18 +135,19 @@ static void remove_ticket_handler(struct ceph_auth_client *ac,
  
  static int process_one_ticket(struct ceph_auth_client *ac,
  			      struct ceph_crypto_key *secret,
 -			      void **p, void *end,
 -			      void *dbuf, void *ticket_buf)
 +			      void **p, void *end)
  {
  	struct ceph_x_info *xi = ac->private;
  	int type;
  	u8 tkt_struct_v, blob_struct_v;
  	struct ceph_x_ticket_handler *th;
 +	void *dbuf = NULL;
  	void *dp, *dend;
  	int dlen;
  	char is_enc;
  	struct timespec validity;
  	struct ceph_crypto_key old_key;
 +	void *ticket_buf = NULL;
  	void *tp, *tpend;
  	struct ceph_timespec new_validity;
  	struct ceph_crypto_key new_session_key;
 @@ -167,8 +172,7 @@ static int process_one_ticket(struct ceph_auth_client *ac,
  	}
  
  	/* blob for me */
 -	dlen = ceph_x_decrypt(secret, p, end, dbuf,
 -			      TEMP_TICKET_BUF_LEN);
 +	dlen = ceph_x_decrypt(secret, p, end, &dbuf, 0);
  	if (dlen <= 0) {
  		ret = dlen;
  		goto out;
 @@ -195,20 +199,25 @@ static int process_one_ticket(struct ceph_auth_client *ac,
  
  	/* ticket blob for service */
  	ceph_decode_8_safe(p, end, is_enc, bad);
 -	tp = ticket_buf;
  	if (is_enc) {
  		/* encrypted */
  		dout(" encrypted ticket\n");
 -		dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
 -				      TEMP_TICKET_BUF_LEN);
 +		dlen = ceph_x_decrypt(&old_key, p, end, &ticket_buf, 0);
  		if (dlen < 0) {
  			ret = dlen;
  			goto out;
  		}
 +		tp = ticket_buf;
  		dlen = ceph_decode_32(&tp);
  	} else {
  		/* unencrypted */
  		ceph_decode_32_safe(p, end, dlen, bad);
 +		ticket_buf = kmalloc(dlen, GFP_NOFS);
 +		if (!ticket_buf) {
 +			ret = -ENOMEM;
 +			goto out;
 +		}
 +		tp = ticket_buf;
  		ceph_decode_need(p, end, dlen, bad);
  		ceph_decode_copy(p, ticket_buf, dlen);
  	}
 @@ -237,6 +246,8 @@ static int process_one_ticket(struct ceph_auth_client *ac,
  	xi->have_keys |= th->service;
  
  out:
 +	kfree(ticket_buf);
 +	kfree(dbuf);
  	return ret;
  
  bad:
 @@ -249,21 +260,10 @@ static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
  				    void *buf, void *end)
  {
  	void *p = buf;
 -	char *dbuf;
 -	char *ticket_buf;
  	u8 reply_struct_v;
  	u32 num;
  	int ret;
  
 -	dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
 -	if (!dbuf)
 -   

[ANN] ceph-deploy 1.5.14 released

2014-09-10 Thread Alfredo Deza
Hi All,

There is a new bug-fix release of ceph-deploy that helps prevent the
environment variable problems that can occur when relying on them on
remote hosts.

It is also now possible to specify public and cluster networks when
creating a new ceph.conf file.
Make sure you update!


-Alfredo


Re: [PATCH net-next 2/5] ceph: Convert pr_warning to pr_warn

2014-09-10 Thread Joe Perches
On Wed, 2014-09-10 at 16:38 +0400, Ilya Dryomov wrote:
 On Wed, Sep 10, 2014 at 8:17 AM, Joe Perches j...@perches.com wrote:
  Use the more common pr_warn.
 
  Other miscellanea:
 
  o Coalesce formats
  o Realign arguments
[]
  diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
[]
  @@ -2828,8 +2828,9 @@ static void con_work(struct work_struct *work)
*/
   static void con_fault(struct ceph_connection *con)
   {
  -	pr_warning("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
  -		   ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
  +	pr_warn("%s%lld %s %s\n",
  +		ENTITY_NAME(con->peer_name),
  +		ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
 
 Just wondering, why did you move the ENTITY_NAME argument to the next
 line?  I see you are doing this pretty consistently.  Is this a style
 guideline or something?

Hello Ilya.

It's just a personal preference to have the
format on one line and its arguments on
another when the format and its arguments
don't fit 80 columns.

Maximal 80 column fill isn't always easy for me
to read.

I find it easier to match formats and arguments
when scanning because my eye doesn't have to
scan to the end of the format line.

If you don't like it, it's your code.  Change it.

cheers, Joe



Re: [PATCH net-next 2/5] ceph: Convert pr_warning to pr_warn

2014-09-10 Thread Ilya Dryomov
On Wed, Sep 10, 2014 at 4:30 PM, Ilya Dryomov ilya.dryo...@inktank.com wrote:
 On Wed, Sep 10, 2014 at 8:17 AM, Joe Perches j...@perches.com wrote:
 Use the more common pr_warn.

 Other miscellanea:

 o Coalesce formats
 o Realign arguments

 Signed-off-by: Joe Perches j...@perches.com
 ---
  net/ceph/ceph_common.c | 15 +--
  net/ceph/messenger.c   | 19 ++-
  net/ceph/osd_client.c  | 15 +++
  net/ceph/osdmap.c  | 20 ++--
  4 files changed, 36 insertions(+), 33 deletions(-)

 diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
 index 1675021..58fbfe1 100644
 --- a/net/ceph/ceph_common.c
 +++ b/net/ceph/ceph_common.c
 @@ -293,17 +293,20 @@ static int get_secret(struct ceph_crypto_key *dst, const char *name) {
 		key_err = PTR_ERR(ukey);
 		switch (key_err) {
 		case -ENOKEY:
 -			pr_warning("ceph: Mount failed due to key not found: %s\n", name);
 +			pr_warn("ceph: Mount failed due to key not found: %s\n",
 +				name);
 			break;
 		case -EKEYEXPIRED:
 -			pr_warning("ceph: Mount failed due to expired key: %s\n", name);
 +			pr_warn("ceph: Mount failed due to expired key: %s\n",
 +				name);
 			break;
 		case -EKEYREVOKED:
 -			pr_warning("ceph: Mount failed due to revoked key: %s\n", name);
 +			pr_warn("ceph: Mount failed due to revoked key: %s\n",
 +				name);
 			break;
 		default:
 -			pr_warning("ceph: Mount failed due to unknown key error %d: %s\n", key_err, name);
 +			pr_warn("ceph: Mount failed due to unknown key error %d: %s\n",
 +				key_err, name);
 		}
 		err = -EPERM;
 		goto out;
 @@ -433,7 +436,7 @@ ceph_parse_options(char *options, const char *dev_name,
 
 	/* misc */
 	case Opt_osdtimeout:
 -		pr_warning("ignoring deprecated osdtimeout option\n");
 +		pr_warn("ignoring deprecated osdtimeout option\n");
 		break;
 	case Opt_osdkeepalivetimeout:
 		opt->osd_keepalive_timeout = intval;
 diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
 index b2f571d..a7203be 100644
 --- a/net/ceph/messenger.c
 +++ b/net/ceph/messenger.c
 @@ -1937,11 +1937,11 @@ static int process_banner(struct ceph_connection *con)
 		   sizeof(con->peer_addr)) != 0 &&
 	    !(addr_is_blank(&con->actual_peer_addr.in_addr) &&
 	      con->actual_peer_addr.nonce == con->peer_addr.nonce)) {
 -		pr_warning("wrong peer, want %s/%d, got %s/%d\n",
 -			   ceph_pr_addr(&con->peer_addr.in_addr),
 -			   (int)le32_to_cpu(con->peer_addr.nonce),
 -			   ceph_pr_addr(&con->actual_peer_addr.in_addr),
 -			   (int)le32_to_cpu(con->actual_peer_addr.nonce));
 +		pr_warn("wrong peer, want %s/%d, got %s/%d\n",
 +			ceph_pr_addr(&con->peer_addr.in_addr),
 +			(int)le32_to_cpu(con->peer_addr.nonce),
 +			ceph_pr_addr(&con->actual_peer_addr.in_addr),
 +			(int)le32_to_cpu(con->actual_peer_addr.nonce));
 		con->error_msg = "wrong peer at address";
 		return -1;
 	}
 @@ -2302,7 +2302,7 @@ static int read_partial_message(struct ceph_connection *con)
 
 	BUG_ON(!con->in_msg ^ skip);
 	if (con->in_msg && data_len > con->in_msg->data_length) {
 -		pr_warning("%s skipping long message (%u > %zd)\n",
 +		pr_warn("%s skipping long message (%u > %zd)\n",
 			__func__, data_len, con->in_msg->data_length);
 		ceph_msg_put(con->in_msg);
 		con->in_msg = NULL;
 @@ -2712,7 +2712,7 @@ static bool con_sock_closed(struct ceph_connection *con)
 	CASE(OPEN);
 	CASE(STANDBY);
 	default:
 -		pr_warning("%s con %p unrecognized state %lu\n",
 +		pr_warn("%s con %p unrecognized state %lu\n",
 			__func__, con, con->state);
 		con->error_msg = "unrecognized con state";
 		BUG();
 @@ -2828,8 +2828,9 @@ static void con_work(struct work_struct *work)
   */
  static void con_fault(struct ceph_connection *con)
  {
 -	pr_warning("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
 -		   ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
 +	pr_warn("%s%lld %s %s\n",
 +		ENTITY_NAME(con->peer_name),
 +   

Re: [ceph-users] [ANN] ceph-deploy 1.5.14 released

2014-09-10 Thread Scottix
Suggestion:
Can you link to a changelog of new features or major bug fixes
when you do new releases?

Thanks,
Scottix

On Wed, Sep 10, 2014 at 6:45 AM, Alfredo Deza alfredo.d...@inktank.com wrote:
 Hi All,

 There is a new bug-fix release of ceph-deploy that helps prevent the
 environment variable problems that can occur when relying on them on
 remote hosts.

 It is also now possible to specify public and cluster networks when
 creating a new ceph.conf file.

 Make sure you update!


 -Alfredo
 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Follow Me: @Scottix
http://about.me/scottix
scot...@gmail.com


Re: OpTracker optimization

2014-09-10 Thread Samuel Just
Added a comment about the approach.
-Sam

On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code degrades
 performance severely. For example, in my setup a single OSD node with
 10 clients reaches ~103K read iops with io served from memory while
 optracking is disabled, but with optracker enabled it is reduced to ~39K iops.
 Probably, running an OSD without enabling OpTracker is not an option for
 many Ceph users.

 Now, by sharding the OpTracker::ops_in_flight_lock (and thus the xlist
 ops_in_flight) and removing some other bottlenecks, I am able to match
 the performance of an OpTracking-enabled OSD with OpTracking disabled,
 but at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker.



 http://tracker.ceph.com/issues/9384



 and probably http://tracker.ceph.com/issues/8885 too.



 I have created following pull request for the same. Please review it.



 https://github.com/ceph/ceph/pull/2440



 Thanks & Regards

 Somnath
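
A rough sketch of the sharding idea Somnath describes above (hypothetical,
not the code in the pull request): instead of one global ops_in_flight_lock,
ops are spread over N independently locked lists, so registering an op only
contends on 1/N of the traffic:

    #include <cstdint>
    #include <list>
    #include <mutex>

    struct TrackedOp;                           // stand-in for Ceph's TrackedOp

    class ShardedOpTracker {
            static const unsigned NUM_SHARDS = 32;  // assumed shard count
            struct Shard {
                    std::mutex lock;                 // per-shard lock replaces
                    std::list<TrackedOp*> ops_in_flight;  // the single global one
            };
            Shard shards[NUM_SHARDS];
    public:
            void register_inflight_op(TrackedOp *op, uint64_t seq) {
                    Shard &s = shards[seq % NUM_SHARDS];  // pick shard by op seq
                    std::lock_guard<std::mutex> g(s.lock);
                    s.ops_in_flight.push_back(op);
            }
    };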




 



Re: [PATCH net-next 0/5] net: Convert pr_warning to pr_warn

2014-09-10 Thread David Miller
From: Joe Perches j...@perches.com
Date: Tue,  9 Sep 2014 21:17:27 -0700

 Remove remaining uses of pr_warning in net/
 
 Joe Perches (5):
   atm: Convert pr_warning to pr_warn
   ceph: Convert pr_warning to pr_warn
   pktgen: Convert pr_warning to pr_warn
   iucv: Convert pr_warning to pr_warn
   netfilter: Convert pr_warning to pr_warn

All except the ceph patch, which is going via another tree.


RE: OpTracker optimization

2014-09-10 Thread Somnath Roy
Thanks Sam..I responded back :-)

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
Sent: Wednesday, September 10, 2014 11:17 AM
To: Somnath Roy
Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
ceph-us...@lists.ceph.com
Subject: Re: OpTracker optimization

Added a comment about the approach.
-Sam

On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code degrades
 performance severely. For example, in my setup a single OSD node with
 10 clients reaches ~103K read iops with io served from memory while
 optracking is disabled, but with optracker enabled it is reduced to ~39K iops.
 Probably, running an OSD without enabling OpTracker is not an option for
 many Ceph users.

 Now, by sharding the OpTracker::ops_in_flight_lock (and thus the xlist
 ops_in_flight) and removing some other bottlenecks, I am able to match
 the performance of an OpTracking-enabled OSD with OpTracking disabled,
 but at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker.



 http://tracker.ceph.com/issues/9384



 and probably http://tracker.ceph.com/issues/8885 too.



 I have created following pull request for the same. Please review it.



 https://github.com/ceph/ceph/pull/2440



 Thanks & Regards

 Somnath




 




Re: OpTracker optimization

2014-09-10 Thread Samuel Just
Responded with cosmetic nonsense.  Once you've got that and the other
comments addressed, I can put it in wip-sam-testing.
-Sam

On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam..I responded back :-)

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
 Sent: Wednesday, September 10, 2014 11:17 AM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Added a comment about the approach.
 -Sam

 On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code degrades
 performance severely. For example, in my setup a single OSD node with
 10 clients reaches ~103K read iops with io served from memory while
 optracking is disabled, but with optracker enabled it is reduced to ~39K iops.
 Probably, running an OSD without enabling OpTracker is not an option for
 many Ceph users.

 Now, by sharding the OpTracker::ops_in_flight_lock (and thus the xlist
 ops_in_flight) and removing some other bottlenecks, I am able to match
 the performance of an OpTracking-enabled OSD with OpTracking disabled,
 but at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker.



 http://tracker.ceph.com/issues/9384



 and probably http://tracker.ceph.com/issues/8885 too.



 I have created following pull request for the same. Please review it.



 https://github.com/ceph/ceph/pull/2440



 Thanks & Regards

 Somnath




 



RE: OpTracker optimization

2014-09-10 Thread Somnath Roy
Thanks Sam.
So, you want me to go with optracker/shardedopWq, right?

Regards
Somnath

-Original Message-
From: Samuel Just [mailto:sam.j...@inktank.com] 
Sent: Wednesday, September 10, 2014 2:36 PM
To: Somnath Roy
Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
ceph-us...@lists.ceph.com
Subject: Re: OpTracker optimization

Responded with cosmetic nonsense.  Once you've got that and the other comments 
addressed, I can put it in wip-sam-testing.
-Sam

On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam..I responded back :-)

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
 Sent: Wednesday, September 10, 2014 11:17 AM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Added a comment about the approach.
 -Sam

 On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code degrades
 performance severely. For example, in my setup a single OSD node with
 10 clients reaches ~103K read iops with io served from memory while
 optracking is disabled, but with optracker enabled it is reduced to ~39K iops.
 Probably, running an OSD without enabling OpTracker is not an option for
 many Ceph users.

 Now, by sharding the OpTracker::ops_in_flight_lock (and thus the xlist
 ops_in_flight) and removing some other bottlenecks, I am able to match
 the performance of an OpTracking-enabled OSD with OpTracking disabled,
 but at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker.



 http://tracker.ceph.com/issues/9384



 and probably http://tracker.ceph.com/issues/8885 too.



 I have created following pull request for the same. Please review it.



 https://github.com/ceph/ceph/pull/2440



 Thanks & Regards

 Somnath




 




Re: OpTracker optimization

2014-09-10 Thread Samuel Just
I don't quite understand.
-Sam

On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam.
 So, you want me to go with optracker/shardedOpWq, right?

 Regards
 Somnath

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Wednesday, September 10, 2014 2:36 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Responded with cosmetic nonsense.  Once you've got that and the other 
 comments addressed, I can put it in wip-sam-testing.
 -Sam

 On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam..I responded back :-)

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
 Sent: Wednesday, September 10, 2014 11:17 AM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org;
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Added a comment about the approach.
 -Sam

 On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code 
 degrades performance severely. For example, in my setup a single OSD 
 node with 10 clients reaches ~103K read iops (io served from memory) 
 with optracking disabled, but with optracker enabled it drops to ~39K 
 iops. Running an OSD without OpTracker enabled is probably not an 
 option for many Ceph users.

 Now, by sharding OpTracker::ops_in_flight_lock (and thus the xlist 
 ops_in_flight) and removing some other bottlenecks, I am able to match 
 the performance of an OpTracking-enabled OSD with OpTracking disabled, 
 at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker:

 http://tracker.ceph.com/issues/9384

 and probably http://tracker.ceph.com/issues/8885 too.

 I have created the following pull request for the same. Please review it.

 https://github.com/ceph/ceph/pull/2440

 Thanks & Regards

 Somnath


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: OpTracker optimization

2014-09-10 Thread Somnath Roy
As I understand it, you want me to implement the following.

1. Keep this implementation (one sharded optracker) for the ios going 
through the ms_dispatch path.

2. Additionally, for ios going through ms_fast_dispatch, you want me to 
implement an optracker (without internal sharding) per opwq shard.

Am I right?

Thanks & Regards
Somnath
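
For readers following the thread, a minimal sketch of the two-path layout
described above, with hypothetical names and wiring (not the actual Ceph
code):

#include <cstddef>
#include <list>
#include <mutex>
#include <vector>

struct TrackedOp {};

// A plain, unsharded tracker: one lock, one in-flight list.
struct OpTracker {
    std::mutex lock;
    std::list<TrackedOp*> ops_in_flight;
};

struct OsdTracking {
    // Path 1 (ms_dispatch): one logical tracker, sharded internally;
    // shown here simply as N plain trackers selected by a hash.
    std::vector<OpTracker> ms_dispatch_shards;

    // Path 2 (ms_fast_dispatch): one plain tracker per opwq shard. A
    // worker thread only touches its own shard's tracker, so no
    // internal sharding is needed on this path.
    std::vector<OpTracker> per_opwq_tracker;

    OsdTracking(std::size_t ndispatch, std::size_t nopwq)
        : ms_dispatch_shards(ndispatch), per_opwq_tracker(nopwq) {}

    void track_dispatch(TrackedOp* op, std::size_t hash) {
        OpTracker& t = ms_dispatch_shards[hash % ms_dispatch_shards.size()];
        std::lock_guard<std::mutex> g(t.lock);
        t.ops_in_flight.push_back(op);
    }

    void track_fast_dispatch(TrackedOp* op, std::size_t opwq_shard) {
        OpTracker& t = per_opwq_tracker[opwq_shard];  // caller's own shard
        std::lock_guard<std::mutex> g(t.lock);
        t.ops_in_flight.push_back(op);
    }
};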

-Original Message-
From: Samuel Just [mailto:sam.j...@inktank.com] 
Sent: Wednesday, September 10, 2014 3:08 PM
To: Somnath Roy
Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
ceph-us...@lists.ceph.com
Subject: Re: OpTracker optimization

I don't quite understand.
-Sam

On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam.
 So, you want me to go with optracker/shardedOpWq, right?

 Regards
 Somnath

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Wednesday, September 10, 2014 2:36 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Responded with cosmetic nonsense.  Once you've got that and the other 
 comments addressed, I can put it in wip-sam-testing.
 -Sam

 On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam..I responded back :-)

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
 Sent: Wednesday, September 10, 2014 11:17 AM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Added a comment about the approach.
 -Sam

 On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code 
 degrades performance severely. For example, in my setup a single OSD 
 node with 10 clients reaches ~103K read iops (io served from memory) 
 with optracking disabled, but with optracker enabled it drops to ~39K 
 iops. Running an OSD without OpTracker enabled is probably not an 
 option for many Ceph users.

 Now, by sharding OpTracker::ops_in_flight_lock (and thus the xlist 
 ops_in_flight) and removing some other bottlenecks, I am able to match 
 the performance of an OpTracking-enabled OSD with OpTracking disabled, 
 at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker:

 http://tracker.ceph.com/issues/9384

 and probably http://tracker.ceph.com/issues/8885 too.

 I have created the following pull request for the same. Please review it.

 https://github.com/ceph/ceph/pull/2440

 Thanks & Regards

 Somnath




Re: OpTracker optimization

2014-09-10 Thread Samuel Just
Oh, I changed my mind, your approach is fine.  I was unclear.
Currently, I just need you to address the other comments.
-Sam

On Wed, Sep 10, 2014 at 3:13 PM, Somnath Roy somnath@sandisk.com wrote:
 As I understand it, you want me to implement the following.

 1. Keep this implementation (one sharded optracker) for the ios going 
 through the ms_dispatch path.

 2. Additionally, for ios going through ms_fast_dispatch, you want me to 
 implement an optracker (without internal sharding) per opwq shard.

 Am I right?

 Thanks & Regards
 Somnath

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Wednesday, September 10, 2014 3:08 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 I don't quite understand.
 -Sam

 On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam.
  So, you want me to go with optracker/shardedOpWq, right?

 Regards
 Somnath

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Wednesday, September 10, 2014 2:36 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org;
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Responded with cosmetic nonsense.  Once you've got that and the other 
 comments addressed, I can put it in wip-sam-testing.
 -Sam

 On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam..I responded back :-)

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
 Sent: Wednesday, September 10, 2014 11:17 AM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org;
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Added a comment about the approach.
 -Sam

 On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code 
 degrades performance severely. For example, in my setup a single OSD 
 node with 10 clients reaches ~103K read iops (io served from memory) 
 with optracking disabled, but with optracker enabled it drops to ~39K 
 iops. Running an OSD without OpTracker enabled is probably not an 
 option for many Ceph users.

 Now, by sharding OpTracker::ops_in_flight_lock (and thus the xlist 
 ops_in_flight) and removing some other bottlenecks, I am able to match 
 the performance of an OpTracking-enabled OSD with OpTracking disabled, 
 at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker:

 http://tracker.ceph.com/issues/9384

 and probably http://tracker.ceph.com/issues/8885 too.

 I have created the following pull request for the same. Please review it.

 https://github.com/ceph/ceph/pull/2440

 Thanks & Regards

 Somnath


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RBD readahead strategies

2014-09-10 Thread Adam Crume
I've been testing a few strategies for RBD readahead and wanted to
share my results as well as ask for input.

I have four sample workloads that I replayed at maximum speed with
rbd-replay.  boot-ide and boot-virtio are captured from booting a VM
with the image on the IDE and virtio buses, respectively.  Likewise,
grep-ide and grep-virtio are captured from a large grep run.  (I'm not
entirely sure why the IDE and virtio workloads are different, but part
of it is the number of pending requests allowed.)

The readahead strategies are:
- none: No readahead.
- plain: My initial implementation.  The readahead window doubles for
each readahead request, up to a limit, and resets when a random
request is detected.
- aligned: Same as above, but readahead requests are aligned with
object boundaries, when possible.
- eager: When activated, read to the end of the object.

For all of these, 10 sequential requests trigger readahead, the
maximum readahead size is 4 MB, and "rbd readahead disable after
bytes" is disabled (meaning that readahead is enabled for the entire
workload).  The object size is the default 4 MB, and data is striped
over a single object.  (Alignment with stripes or object sets is
ignored for now.)
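
To make the 'plain' and 'aligned' strategies concrete, here is a small
illustrative C++ sketch of the detector described above. Two assumptions
that are not spelled out in the message: the window is seeded with the
request size, and alignment simply trims the readahead back to the last
object boundary. This is not the actual rbd code.

#include <cstdint>

// Illustrative model: kTriggerReads sequential reads arm readahead, the
// window doubles per readahead up to kMaxWindow, and any random read
// resets the state. Already-read-ahead bytes are not tracked here.
struct ReadaheadState {
    static constexpr uint64_t kTriggerReads = 10;
    static constexpr uint64_t kMaxWindow = 4ull << 20;   // 4 MB cap
    static constexpr uint64_t kObjectSize = 4ull << 20;  // default object size

    uint64_t next_offset = 0;  // where a sequential successor would start
    uint64_t seq_reads = 0;    // consecutive sequential reads seen
    uint64_t window = 0;       // current readahead window

    // Returns how many bytes to read ahead after this read (0 = none).
    uint64_t on_read(uint64_t offset, uint64_t length, bool align) {
        if (offset != next_offset) {  // random read: reset the detector
            seq_reads = 0;
            window = 0;
            next_offset = offset + length;
            return 0;
        }
        next_offset = offset + length;
        if (++seq_reads < kTriggerReads)
            return 0;
        window = window ? 2 * window : length;  // double (seeded with length)
        if (window > kMaxWindow)
            window = kMaxWindow;
        uint64_t ra_end = next_offset + window;
        if (align) {
            // 'aligned' variant: pull the end of the readahead back to
            // the last object boundary when that still leaves work to do.
            uint64_t boundary = ra_end / kObjectSize * kObjectSize;
            if (boundary > next_offset)
                ra_end = boundary;
        }
        return ra_end - next_offset;
    }
};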

Here's the data:

workload     strategy   time (seconds)     RA ops   RA MB   read ops   read MB
boot-ide     none        46.22 +/- 0.41         0       0      57516       407
boot-ide     plain       11.42 +/- 0.25       281     203      57516       407
boot-ide     aligned     11.46 +/- 0.13       276     201      57516       407
boot-ide     eager       12.48 +/- 0.61       111     303      57516       407
boot-virtio  none         9.05 +/- 0.25         0       0      11851       393
boot-virtio  plain        8.05 +/- 0.38       451     221      11851       393
boot-virtio  aligned      7.86 +/- 0.27       452     213      11851       393
boot-virtio  eager        9.17 +/- 0.34       249     600      11851       393
grep-ide     none       138.55 +/- 1.67         0       0     130104      3044
grep-ide     plain      136.07 +/- 1.57       397     867     130104      3044
grep-ide     aligned    137.30 +/- 1.77       379     844     130104      3044
grep-ide     eager      138.77 +/- 1.52       346     993     130104      3044
grep-virtio  none       120.73 +/- 1.33         0       0     130061      2820
grep-virtio  plain      121.29 +/- 1.28      1186    1485     130061      2820
grep-virtio  aligned    123.32 +/- 1.29      1139    1409     130061      2820
grep-virtio  eager      127.75 +/- 1.32       842    2218     130061      2820

(The time is the mean wall-clock time +/- the margin of error with
99.7% confidence.  RA=readahead.)
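
(The 99.7% figure presumably comes from the normal three-sigma rule: for n
independent runs with sample standard deviation s, the quoted margin would be
margin = 3 * s / sqrt(n). The number of runs n is not stated here.)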

Right off the bat, readahead is a huge improvement for the boot-ide
workload, which is no surprise because it issues 50,000 sequential,
single-sector reads.  (Why the early boot process is so inefficient is
open for speculation, but that's a real, natural workload.)
boot-virtio also sees an improvement, although not nearly so dramatic.
The grep workloads show no statistically significant improvement.

One conclusion I draw is that 'eager' is, well, too eager.  'aligned'
shows no statistically significant difference from 'plain', and
'plain' is no worse than 'none' (at statistically significant levels)
and sometimes better.

Should the readahead strategy be configurable, or should we just stick
with whichever seems the best one?  Is there anything big I'm missing?

Adam
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] libceph: fix a memory leak in handle_watch_notify

2014-09-10 Thread roy.qing.li
From: Li RongQing roy.qing...@gmail.com 

event_work should be freed when adding it to the queue fails

Signed-off-by: Li RongQing roy.qing...@gmail.com
---
 net/ceph/osd_client.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 30f6faf..1e1b4f1 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2323,6 +2323,7 @@ static void handle_watch_notify(struct ceph_osd_client *osdc,
 		event_work->opcode = opcode;
 		if (!queue_work(osdc->notify_wq, &event_work->work)) {
 			dout("WARNING: failed to queue notify event work\n");
+			kfree(event_work);
 			goto done_err;
 		}
 	}
-- 
1.7.10.4
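
The rule the patch enforces is the usual ownership convention around
queue_work(): if the queue refuses the item, it never took ownership, so the
submitter must free it. A small userspace C++ analogue of the fixed error
path (hypothetical queue type, not the kernel API):

#include <cstdio>
#include <cstdlib>
#include <queue>

struct Event { int opcode; };

// Hypothetical bounded queue standing in for queue_work(): returns false
// when it cannot accept the item, in which case it takes no ownership.
class NotifyQueue {
    std::queue<Event*> q;
    const std::size_t capacity = 4;
public:
    bool enqueue(Event* e) {
        if (q.size() >= capacity)
            return false;  // caller still owns e
        q.push(e);         // queue now owns e
        return true;
    }
};

// Mirrors the fixed code: free the allocation the queue rejected.
void handle_notify(NotifyQueue& wq, int opcode) {
    Event* ev = static_cast<Event*>(std::malloc(sizeof(Event)));
    if (!ev)
        return;
    ev->opcode = opcode;
    if (!wq.enqueue(ev)) {
        std::fprintf(stderr, "failed to queue notify event\n");
        std::free(ev);  // the fix: no leak on the failure path
        return;
    }
}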

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] libceph: fix a memory leak in handle_watch_notify

2014-09-10 Thread Alex Elder

On 09/10/2014 07:20 PM, roy.qing...@gmail.com wrote:

From: Li RongQing roy.qing...@gmail.com

event_work should be freed when adding it to the queue fails

Signed-off-by: Li RongQing roy.qing...@gmail.com


Looks good.

Reviewed-by: Alex Elder el...@linaro.org


---
  net/ceph/osd_client.c |    1 +
  1 file changed, 1 insertion(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 30f6faf..1e1b4f1 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2323,6 +2323,7 @@ static void handle_watch_notify(struct ceph_osd_client *osdc,
 		event_work->opcode = opcode;
 		if (!queue_work(osdc->notify_wq, &event_work->work)) {
 			dout("WARNING: failed to queue notify event work\n");
+			kfree(event_work);
 			goto done_err;
 		}
 	}



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: OpTracker optimization

2014-09-10 Thread Somnath Roy
Sam/Sage,
I have incorporated all of your comments. Please have a look at the same pull 
request.

https://github.com/ceph/ceph/pull/2440

Thanks & Regards
Somnath

-Original Message-
From: Samuel Just [mailto:sam.j...@inktank.com] 
Sent: Wednesday, September 10, 2014 3:25 PM
To: Somnath Roy
Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
ceph-us...@lists.ceph.com
Subject: Re: OpTracker optimization

Oh, I changed my mind, your approach is fine.  I was unclear.
Currently, I just need you to address the other comments.
-Sam

On Wed, Sep 10, 2014 at 3:13 PM, Somnath Roy somnath@sandisk.com wrote:
 As I understand it, you want me to implement the following.

 1. Keep this implementation (one sharded optracker) for the ios going 
 through the ms_dispatch path.

 2. Additionally, for ios going through ms_fast_dispatch, you want me to 
 implement an optracker (without internal sharding) per opwq shard.

 Am I right?

 Thanks & Regards
 Somnath

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Wednesday, September 10, 2014 3:08 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 I don't quite understand.
 -Sam

 On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam.
 So, you want me to go with optracker/shardedOpWq, right?

 Regards
 Somnath

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Wednesday, September 10, 2014 2:36 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Responded with cosmetic nonsense.  Once you've got that and the other 
 comments addressed, I can put it in wip-sam-testing.
 -Sam

 On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com wrote:
 Thanks Sam..I responded back :-)

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
 Sent: Wednesday, September 10, 2014 11:17 AM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization

 Added a comment about the approach.
 -Sam

 On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote:
 Hi Sam/Sage,

 As we discussed earlier, enabling the present OpTracker code 
 degrades performance severely. For example, in my setup a single OSD 
 node with 10 clients reaches ~103K read iops (io served from memory) 
 with optracking disabled, but with optracker enabled it drops to ~39K 
 iops. Running an OSD without OpTracker enabled is probably not an 
 option for many Ceph users.

 Now, by sharding OpTracker::ops_in_flight_lock (and thus the xlist 
 ops_in_flight) and removing some other bottlenecks, I am able to match 
 the performance of an OpTracking-enabled OSD with OpTracking disabled, 
 at the expense of ~1 extra cpu core.

 In this process I have also fixed the following tracker:

 http://tracker.ceph.com/issues/9384

 and probably http://tracker.ceph.com/issues/8885 too.

 I have created the following pull request for the same. Please review it.

 https://github.com/ceph/ceph/pull/2440

 Thanks & Regards

 Somnath




RE: OpTracker optimization

2014-09-10 Thread Sage Weil
I had two substantive comments on the first patch and then some trivial 
whitespace nits.  Otherwise looks good!

thanks-
sage

On Thu, 11 Sep 2014, Somnath Roy wrote:

 Sam/Sage,
 I have incorporated all of your comments. Please have a look at the same pull 
 request.
 
 https://github.com/ceph/ceph/pull/2440
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com] 
 Sent: Wednesday, September 10, 2014 3:25 PM
 To: Somnath Roy
 Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
 ceph-us...@lists.ceph.com
 Subject: Re: OpTracker optimization
 
 Oh, I changed my mind, your approach is fine.  I was unclear.
 Currently, I just need you to address the other comments.
 -Sam
 
 On Wed, Sep 10, 2014 at 3:13 PM, Somnath Roy somnath@sandisk.com wrote:
  As I understand it, you want me to implement the following.
 
  1. Keep this implementation (one sharded optracker) for the ios going 
  through the ms_dispatch path.
 
  2. Additionally, for ios going through ms_fast_dispatch, you want me to 
  implement an optracker (without internal sharding) per opwq shard.
 
  Am I right?
 
  Thanks & Regards
  Somnath
 
  -Original Message-
  From: Samuel Just [mailto:sam.j...@inktank.com]
  Sent: Wednesday, September 10, 2014 3:08 PM
  To: Somnath Roy
  Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
  ceph-us...@lists.ceph.com
  Subject: Re: OpTracker optimization
 
  I don't quite understand.
  -Sam
 
  On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy somnath@sandisk.com 
  wrote:
  Thanks Sam.
  So, you want me to go with optracker/shardedOpWq, right?
 
  Regards
  Somnath
 
  -Original Message-
  From: Samuel Just [mailto:sam.j...@inktank.com]
  Sent: Wednesday, September 10, 2014 2:36 PM
  To: Somnath Roy
  Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
  ceph-us...@lists.ceph.com
  Subject: Re: OpTracker optimization
 
  Responded with cosmetic nonsense.  Once you've got that and the other 
  comments addressed, I can put it in wip-sam-testing.
  -Sam
 
  On Wed, Sep 10, 2014 at 1:30 PM, Somnath Roy somnath@sandisk.com 
  wrote:
  Thanks Sam..I responded back :-)
 
  -Original Message-
  From: ceph-devel-ow...@vger.kernel.org 
  [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Samuel Just
  Sent: Wednesday, September 10, 2014 11:17 AM
  To: Somnath Roy
  Cc: Sage Weil (sw...@redhat.com); ceph-devel@vger.kernel.org; 
  ceph-us...@lists.ceph.com
  Subject: Re: OpTracker optimization
 
  Added a comment about the approach.
  -Sam
 
  On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com 
  wrote:
  Hi Sam/Sage,
 
  As we discussed earlier, enabling the present OpTracker code 
  degrades performance severely. For example, in my setup a single OSD 
  node with 10 clients reaches ~103K read iops (io served from memory) 
  with optracking disabled, but with optracker enabled it drops to ~39K 
  iops. Running an OSD without OpTracker enabled is probably not an 
  option for many Ceph users.
 
  Now, by sharding OpTracker::ops_in_flight_lock (and thus the xlist 
  ops_in_flight) and removing some other bottlenecks, I am able to match 
  the performance of an OpTracking-enabled OSD with OpTracking disabled, 
  at the expense of ~1 extra cpu core.
 
  In this process I have also fixed the following tracker:
 
  http://tracker.ceph.com/issues/9384
 
  and probably http://tracker.ceph.com/issues/8885 too.
 
  I have created the following pull request for the same. Please review it.
 
  https://github.com/ceph/ceph/pull/2440
 
  Thanks & Regards
 
  Somnath
 