Re: how can I achieve HA with ceph?
Hi, back from holiday. I did a successful power-unplug test now, but the FS was unavailable for 16 minutes, which is clearly wrong... I have the log files, but the MDS log is 1.2 GB; if you let me know which lines to filter in or out, I will upload it somewhere.
--
Karoly Horvath

On Fri, Dec 23, 2011 at 12:00 AM, Gregory Farnum <gregory.far...@dreamhost.com> wrote:
> On Wed, Dec 21, 2011 at 8:43 AM, Karoly Horvath <rhsw...@gmail.com> wrote:
>> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum wrote:
>> By "client" I assume you mean the kernel driver... the FS is frozen, so
>> I cannot unmount (cannot even `shutdown`)... how can I force the client
>> to reconnect?
>
> Try a lazy force unmount:
>
>     umount -lf ceph_mnt_point/
>
> and then mount again.
>
>> Wow, never heard about this, thanks. :) Will report with the next mail.
>> In the meantime I did one test, killing mds+osd+mon on beta; it's jammed
>> in '{0=alpha=up:replay}', and after 45 minutes I shut it down... I
>> attached the logs.
>
> Oh, this is very odd! The MDS goes to sleep while it waits for an
> up-to-date OSDMap, but it never seems to get woken up, even though I can
> see the OSDMap message being sent. So let's try this one more time, but
> this time also add "debug objecter = 20" to the MDS config... Those logs
> will include everything I need, or nothing will, promise! :)
> -Greg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: questions about radosgw
Hi,

On 01/05/2012 12:48 PM, huang jun wrote:
> hi, all
> I'm using s3+radosgw, and there are some questions that confuse me very much:
>
> first) An object's size is 400MB; the whole object will be stored in the
> OSDs as one big single object, not striped into 4MB objects. So how can
> we get workload balance if we want to store both big objects and small
> ones?

That is correct: neither RADOS nor the RADOS gateway stripes objects. You will get workload balance if you have an (almost) even spread of workload over your different objects. The devs might shed some more light on this.

> second) Can we change the number of PGs in a pool that was created by S3
> clients, or only using the rados commands?

As far as I know you can't do this through the S3 clients, only with the rados commands. I would also find it rather weird if you could do this as a client.

Wido

> thanks
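As a hedged sketch of the second point, the gateway's pools and their PG counts can be inspected from the rados side (this assumes a running cluster; the pool names radosgw creates, such as .rgw or .rgw.buckets, vary by setup):

```shell
# List all pools; the gateway's pools appear here alongside data/metadata/rbd.
rados lspools

# The osdmap dump lists each pool with its pg_num / pgp_num settings.
ceph osd dump | grep '^pool'
```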
ceph: ensure prealloc_blob is in place when removing xattr
In __ceph_build_xattrs_blob(), if a ceph inode's extended attributes are marked dirty, all attributes recorded in its rb_tree index are formatted into a blob buffer. The target buffer is recorded in ceph_inode->i_xattrs.prealloc_blob, and it is expected to exist and be of sufficient size to hold the attributes.

The extended attributes are marked dirty in two cases: when a new attribute is added to the inode, or when one is removed. In the former case, work is done to ensure the prealloc_blob buffer is properly set up, but in the latter it is not.

Change the logic in ceph_removexattr() so it matches what is done in ceph_setxattr(). Note that this is done in a way that keeps the two blocks of code nearly identical, in anticipation of a subsequent patch that encapsulates some of this logic into one or more helper routines.

Signed-off-by: Alex Elder <el...@dreamhost.com>
---
 fs/ceph/xattr.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Index: b/fs/ceph/xattr.c
===================================================================
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -818,6 +818,7 @@ int ceph_removexattr(struct dentry *dent
 	struct ceph_vxattr_cb *vxattrs = ceph_inode_vxattrs(inode);
 	int issued;
 	int err;
+	int required_blob_size;
 	int dirty;
 
 	if (ceph_snap(inode) != CEPH_NOSNAP)
@@ -833,14 +834,34 @@ int ceph_removexattr(struct dentry *dent
 		return -EOPNOTSUPP;
 	}
 
+	err = -ENOMEM;
 	spin_lock(&ci->i_ceph_lock);
 	__build_xattrs(inode);
+retry:
 	issued = __ceph_caps_issued(ci, NULL);
 	dout("removexattr %p issued %s\n", inode, ceph_cap_string(issued));
 
 	if (!(issued & CEPH_CAP_XATTR_EXCL))
 		goto do_sync;
 
+	required_blob_size = __get_required_blob_size(ci, 0, 0);
+
+	if (!ci->i_xattrs.prealloc_blob ||
+	    required_blob_size > ci->i_xattrs.prealloc_blob->alloc_len) {
+		struct ceph_buffer *blob;
+
+		spin_unlock(&ci->i_ceph_lock);
+		dout(" preallocating new blob size=%d\n", required_blob_size);
+		blob = ceph_buffer_new(required_blob_size, GFP_NOFS);
+		if (!blob)
+			goto out;
+		spin_lock(&ci->i_ceph_lock);
+		if (ci->i_xattrs.prealloc_blob)
+			ceph_buffer_put(ci->i_xattrs.prealloc_blob);
+		ci->i_xattrs.prealloc_blob = blob;
+		goto retry;
+	}
+
 	err = __remove_xattr_by_name(ceph_inode(inode), name);
 	dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_XATTR_EXCL);
 	ci->i_xattrs.dirty = true;
@@ -853,6 +874,7 @@ int ceph_removexattr(struct dentry *dent
 do_sync:
 	spin_unlock(&ci->i_ceph_lock);
 	err = ceph_send_removexattr(dentry, name);
+out:
 	return err;
 }
[PATCH 5/6] net: add paged frag destructor support to kernel_sendpage.
This requires adding a new argument to various sendpage hooks up and down the stack. At the moment this parameter is always NULL.

Signed-off-by: Ian Campbell <ian.campb...@citrix.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Alexey Kuznetsov <kuz...@ms2.inr.ac.ru>
Cc: Pekka Savola (ipv6) <pek...@netcore.fi>
Cc: James Morris <jmor...@namei.org>
Cc: Hideaki YOSHIFUJI <yoshf...@linux-ipv6.org>
Cc: Patrick McHardy <ka...@trash.net>
Cc: Trond Myklebust <trond.mykleb...@netapp.com>
Cc: Greg Kroah-Hartman <gre...@suse.de>
Cc: drbd-u...@lists.linbit.com
Cc: de...@driverdev.osuosl.org
Cc: cluster-de...@redhat.com
Cc: ocfs2-de...@oss.oracle.com
Cc: net...@vger.kernel.org
Cc: ceph-devel@vger.kernel.org
Cc: rds-de...@oss.oracle.com
Cc: linux-...@vger.kernel.org
---
 drivers/block/drbd/drbd_main.c           |    1 +
 drivers/scsi/iscsi_tcp.c                 |    4 ++--
 drivers/scsi/iscsi_tcp.h                 |    3 ++-
 drivers/staging/pohmelfs/trans.c         |    3 ++-
 drivers/target/iscsi/iscsi_target_util.c |    3 ++-
 fs/dlm/lowcomms.c                        |    4 ++--
 fs/ocfs2/cluster/tcp.c                   |    1 +
 include/linux/net.h                      |    6 +-
 include/net/inet_common.h                |    4 +++-
 include/net/ip.h                         |    4 +++-
 include/net/sock.h                       |    8 +---
 include/net/tcp.h                        |    4 +++-
 net/ceph/messenger.c                     |    2 +-
 net/core/sock.c                          |    6 +-
 net/ipv4/af_inet.c                       |    9 ++---
 net/ipv4/ip_output.c                     |    6 --
 net/ipv4/tcp.c                           |   24
 net/ipv4/udp.c                           |   11 ++-
 net/ipv4/udp_impl.h                      |    5 +++--
 net/rds/tcp_send.c                       |    1 +
 net/socket.c                             |   11 +++
 net/sunrpc/svcsock.c                     |    6 +++---
 net/sunrpc/xprtsock.c                    |    2 +-
 23 files changed, 84 insertions(+), 44 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 0358e55..49c7346 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2584,6 +2584,7 @@ static int _drbd_send_page(struct drbd_conf *mdev, struct page *page,
 	set_fs(KERNEL_DS);
 	do {
 		sent = mdev->data.socket->ops->sendpage(mdev->data.socket, page,
+							NULL,
 							offset, len, msg_flags);
 		if (sent == -EAGAIN) {
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 7c34d8e..3884ae1 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -284,8 +284,8 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
 	if (!segment->data) {
 		sg = segment->sg;
 		offset += segment->sg_offset + sg->offset;
-		r = tcp_sw_conn->sendpage(sk, sg_page(sg), offset,
-					  copy, flags);
+		r = tcp_sw_conn->sendpage(sk, sg_page(sg), NULL,
+					  offset, copy, flags);
 	} else {
 		struct msghdr msg = { .msg_flags = flags };
 		struct kvec iov = {
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index 666fe09..1e23265 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -52,7 +52,8 @@ struct iscsi_sw_tcp_conn {
 	uint32_t		sendpage_failures_cnt;
 	uint32_t		discontiguous_hdr_cnt;
 
-	ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int);
+	ssize_t (*sendpage)(struct socket *, struct page *,
+			    struct skb_frag_destructor *, int, size_t, int);
 };
 
 struct iscsi_sw_tcp_host {
diff --git a/drivers/staging/pohmelfs/trans.c b/drivers/staging/pohmelfs/trans.c
index 06c1a74..96a7921 100644
--- a/drivers/staging/pohmelfs/trans.c
+++ b/drivers/staging/pohmelfs/trans.c
@@ -104,7 +104,8 @@ static int netfs_trans_send_pages(struct netfs_trans *t, struct netfs_state *st)
 		msg.msg_flags = MSG_WAITALL | (attached_pages == 1 ? 0 : MSG_MORE);
 
-		err = kernel_sendpage(st->socket, page, 0, size, msg.msg_flags);
+		err = kernel_sendpage(st->socket, page, NULL,
+				      0, size, msg.msg_flags);
 		if (err <= 0) {
 			printk("%s: %d/%d failed to send transaction page: t: %p, gen: %u, size: %u, err: %d.\n",
 			       __func__, i, t->page_num, t, t->gen, size, err);
diff --git a/drivers/target/iscsi/iscsi_target_util.c
Re: how can I achieve HA with ceph?
On Thu, Jan 5, 2012 at 5:24 AM, Karoly Horvath <rhsw...@gmail.com> wrote:
> Hi, back from holiday. I did a successful power-unplug test now, but the
> FS was unavailable for 16 minutes, which is clearly wrong... I have the
> log files, but the MDS log is 1.2 GB; if you let me know which lines to
> filter in or out, I will upload it somewhere.
> --
> Karoly Horvath

Assuming it's the same error as last time, the log will have a line that contains "waiting for osdmap n (which blacklists prior instance)", where n is an epoch number. Then at some later point there will be a line that looks something like the following:

2011-12-21 13:45:17.594746 7f4885307700 -- xxx.xxx.xxx.31:6800/4438 <== mon.2 xxx.xxx.xxx.35:6789/0 9 osd_map(y..z src has 1..495) v2 748+0+0 (656995691 0 0) 0x1637400 con 0x163c000

where y and z are an interval which contains n. (In the previous log, and probably here too, y = z = n.) I'm going to be interested in those two lines and the stuff following when the osdmap arrives. Probably I will only care about objecter lines, but it might be all of them... try trimming off the minute following that osdmap line; it'll probably contain more than I care about. :)
-Greg

On Fri, Dec 23, 2011 at 12:00 AM, Gregory Farnum <gregory.far...@dreamhost.com> wrote:
> On Wed, Dec 21, 2011 at 8:43 AM, Karoly Horvath <rhsw...@gmail.com> wrote:
>> On Wed, Dec 21, 2011 at 4:13 PM, Gregory Farnum wrote:
>> By "client" I assume you mean the kernel driver... the FS is frozen, so
>> I cannot unmount (cannot even `shutdown`)... how can I force the client
>> to reconnect?
>
> Try a lazy force unmount:
>
>     umount -lf ceph_mnt_point/
>
> and then mount again.
>
>> Wow, never heard about this, thanks. :) Will report with the next mail.
>> In the meantime I did one test, killing mds+osd+mon on beta; it's jammed
>> in '{0=alpha=up:replay}', and after 45 minutes I shut it down... I
>> attached the logs.
>
> Oh, this is very odd! The MDS goes to sleep while it waits for an
> up-to-date OSDMap, but it never seems to get woken up, even though I can
> see the OSDMap message being sent. So let's try this one more time, but
> this time also add "debug objecter = 20" to the MDS config... Those logs
> will include everything I need, or nothing will, promise! :)
> -Greg
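A hedged sketch of how the 1.2 GB MDS log could be pre-filtered before uploading, keeping just the lines Greg describes. The sample log lines and file names below are made up for illustration; point LOG at the real MDS log (e.g. under /var/log/ceph/) when doing this for real:

```shell
#!/bin/sh
# Build a tiny synthetic sample log standing in for the real MDS log.
LOG=mds-sample.log
printf '%s\n' \
  '2011-12-21 13:45:02.000001 7f48 mds.0.1 waiting for osdmap 495 (which blacklists prior instance)' \
  '2011-12-21 13:45:10.000002 7f48 objecter handle_osd_map got epochs [495,495]' \
  '2011-12-21 13:45:17.594746 7f4885307700 -- received osd_map(495..495 src has 1..495)' \
  > "$LOG"

# Keep only the interesting lines: the "waiting for osdmap" marker, the
# osd_map delivery, and any objecter traffic around them.
grep -E 'waiting for osdmap|osd_map\(|objecter' "$LOG" > mds-filtered.log
wc -l < mds-filtered.log
```

On the synthetic sample, all three lines match, so the filtered file keeps everything; on a real log this should cut the upload down to the relevant traffic.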
Re: [PATCH 5/6] net: add paged frag destructor support to kernel_sendpage.
From: Ian Campbell <ian.campb...@citrix.com>
Date: Thu, 5 Jan 2012 17:13:43 +

> -static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
> +static ssize_t do_tcp_sendpages(struct sock *sk,
> +				struct page **pages,
> +				struct skb_frag_destructor **destructors,
> +				int poffset,
>  				size_t psize, int flags)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);

An array of destructors is madness, and the one call site that specifies this passes the address of a single entry.

This would also never even have to occur if you put the destructor inside of struct page instead.

Finally, except for the skb_shared_info() layout optimization in patch #1, which I already applied, this stuff is not baked enough for the 3.3 merge window.
Re: [PATCH 5/6] net: add paged frag destructor support to kernel_sendpage.
On Thu, 2012-01-05 at 19:15 +, David Miller wrote:
> From: Ian Campbell <ian.campb...@citrix.com>
> Date: Thu, 5 Jan 2012 17:13:43 +
>
> > -static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
> > +static ssize_t do_tcp_sendpages(struct sock *sk,
> > +				struct page **pages,
> > +				struct skb_frag_destructor **destructors,
> > +				int poffset,
> >  				size_t psize, int flags)
> >  {
> >  	struct tcp_sock *tp = tcp_sk(sk);
>
> An array of destructors is madness, and the one call site that specifies
> this passes the address of a single entry.

I figured it was easy enough to accommodate the multiple-destructor case, but you are right that it is overkill given the current (and, realistically, expected) usage; I'll change that for the next round.

(That's assuming we don't end up with some scheme where the struct page * is in the destructor struct, like I was investigating previously to alleviate the frag size overhead. I guess this illustrates nicely why that approach got ugly: these arrays propagate all the way up the call chain if you do that.)

> This would also never even have to occur if you put the destructor
> inside of struct page instead.
>
> Finally, except for the skb_shared_info() layout optimization in patch
> #1, which I already applied, this stuff is not baked enough for the 3.3
> merge window.

Sure thing; I should have made it clear in my intro mail that I was aiming for 3.4.

Thanks,
Ian
[PATCH 0/2] Add resource agents to debian build, trivial CP error
Hi,

please consider two follow-up patches to the OCF resource agents: the first adds them to the Debian build, as a separate package ceph-resource-agents that depends on resource-agents; the second fixes a trivial (and embarrassing, however harmless) cut-and-paste error.

Thanks!
Cheers,
Florian
[PATCH 1/2] debian: build ceph-resource-agents
---
 debian/ceph-resource-agents.install |    1 +
 debian/control                      |   13 +
 debian/rules                        |    2 ++
 3 files changed, 16 insertions(+), 0 deletions(-)
 create mode 100644 debian/ceph-resource-agents.install

diff --git a/debian/ceph-resource-agents.install b/debian/ceph-resource-agents.install
new file mode 100644
index 000..30843f6
--- /dev/null
+++ b/debian/ceph-resource-agents.install
@@ -0,0 +1 @@
+usr/lib/ocf/resource.d/ceph/*
diff --git a/debian/control b/debian/control
index e8c4d30..0f57ad3 100644
--- a/debian/control
+++ b/debian/control
@@ -112,6 +112,19 @@ Description: debugging symbols for ceph-common
  .
  This package contains the debugging symbols for ceph-common.
 
+Package: ceph-resource-agents
+Architecture: linux-any
+Recommends: pacemaker
+Priority: extra
+Depends: ceph (= ${binary:Version}), ${misc:Depends}, resource-agents
+Description: OCF-compliant resource agents for Ceph
+ Ceph is a distributed storage and network file system designed to provide
+ excellent performance, reliability, and scalability.
+ .
+ This package contains the resource agents (RAs) which integrate
+ Ceph with OCF-compliant cluster resource managers,
+ such as Pacemaker.
+
 Package: librados2
 Conflicts: librados, librados1
 Replaces: librados, librados1
diff --git a/debian/rules b/debian/rules
index 4f3fe62..0bc594a 100755
--- a/debian/rules
+++ b/debian/rules
@@ -20,6 +20,8 @@ endif
 
 export DEB_HOST_ARCH ?= $(shell dpkg-architecture -qDEB_HOST_ARCH)
 
+extraopts += --with-ocf
+
 ifeq ($(DEB_HOST_ARCH), armel)
   # armel supports ARMv4t or above instructions sets.
   # libatomic-ops is only usable with Ceph for ARMv6 or above.
-- 
1.7.5.4
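As a usage sketch (assuming a source checkout with this patch applied and the usual Debian build dependencies installed), the new package would come out of the normal package build:

```shell
# Build the Debian packages; debian/rules passes --with-ocf to configure,
# so the OCF resource agents get built and packaged.
dpkg-buildpackage -us -uc

# The result should include the new package:
#   ../ceph-resource-agents_<version>_<arch>.deb
# which can then be installed on the Pacemaker nodes with dpkg -i.
```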