Re: [PATCH] implement librados aio_stat

2012-12-14 Thread Yehuda Sadeh
Went through it briefly, looks fine, though I'd like to go over it some more before picking this up. Note that LIBRADOS_VER_MINOR needs to be bumped up too. Thanks, Yehuda On Fri, Dec 14, 2012 at 3:18 AM, Filippos Giannakos wrote: > --- > src/include/rados/librados.h | 14 ++ >

Re: [PATCH 9/9] libceph: socket can close in any connection state

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Thu, 13 Dec 2012, Alex Elder wrote: > A connection's socket can close for any reason, independent of the > state of the connection (and irrespective of the connection > mutex). As a result, the connection can be in pretty much any state > at the time its socket

Re: [PATCH 8/9] rbd: fix ceph_pg_poolid_by_name()

2012-12-14 Thread Sage Weil
Most of the code uses int64_t/__s64 for the pool id, although in a few cases we screwed up and limited it to 32 bits. In reality, that's way overkill anyway; we could have left it at 32 bits to begin with. My first instinct would be to change the return type to long long or s64 and avoid the u

Re: [PATCH 7/9] rbd: don't use ENOTSUPP

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Thu, 13 Dec 2012, Alex Elder wrote: > ENOTSUPP is not a standard errno (it shows up as "Unknown error 524" > in an error message). This is what was getting produced when the > local rbd code does not implement features required by a > discovered rbd image. > > Cha
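
The observation that ENOTSUPP is not a standard errno can be checked from user space. The following is a minimal Python sketch, assuming the interpreter's errno table mirrors the C library's. The helper name `is_standard_errno` is hypothetical and is not part of the patch; the value 524 is the one quoted in the patch description.

```python
import errno
import os

# Value quoted in the patch description above; it is internal to the kernel.
ENOTSUPP = 524


def is_standard_errno(code):
    """Return True if the platform's errno table has a symbolic name for code."""
    return code in errno.errorcode


if __name__ == "__main__":
    # A code without a symbolic name has no meaningful message in user space.
    print(is_standard_errno(ENOTSUPP))
    print(os.strerror(ENOTSUPP))
```

The exact message text returned by `os.strerror()` depends on the platform's C library, so it is printed here rather than compared against a fixed string.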

Re: [PATCH 6/9] rbd: remove linger unconditionally

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Thu, 13 Dec 2012, Alex Elder wrote: > In __unregister_linger_request(), the request is being removed > from the osd client's req_linger list only when the request > has a non-null osd pointer. It should be done whether or not > the request currently has an osd. > > Th

Re: [PATCH] libceph: report connection fault with warning

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Fri, 14 Dec 2012, Alex Elder wrote: > When a connection's socket disconnects, or if there's a protocol > error of some kind on the connection, a fault is signaled and > the connection is reset (closed and reopened, basically). We > currently get an error message on the

[PATCH] libceph: report connection fault with warning

2012-12-14 Thread Alex Elder
When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error message on the log whenever this occurs. A ceph connection will attempt to reestablish

Re: [PATCH 5/9] libceph: init osd->o_node in create_osd()

2012-12-14 Thread Sage Weil
We should drop this one, I think. See upstream commit 4c199a93a2d36b277a9fd209a0f2793f8460a215. When we added the similar call on the request tree it caused some noise in linux-next and then got removed. sage On Thu, 13 Dec 2012, Alex Elder wrote: > It turns out to be harmless but the red-b

Re: [PATCH 4/9] rbd: get rid of RBD_MAX_SEG_NAME_LEN

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Thu, 13 Dec 2012, Alex Elder wrote: > RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object > name (i.e., one of the objects providing storage backing an rbd > image). > > Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to > define the m

Re: [PATCH 3/9] libceph: avoid using freed osd in __kick_osd_requests()

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Thu, 13 Dec 2012, Alex Elder wrote: > If an osd has no requests and no linger requests, __reset_osd() > will just remove it with a call to __remove_osd(). That drops > a reference to the osd, and therefore the osd may have been freed > by the time __reset_osd() returns.

Re: [PATCH 1/9] rbd: do not allow remove of mounted-on image

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil On Thu, 13 Dec 2012, Alex Elder wrote: > There is no check in rbd_remove() to see if anybody holds open the > image being removed. That's not cool. > > Add a simple open count that goes up and down with opens and closes > (releases) of the device, and don't allow an rbd
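
The open-count idea described in this patch can be illustrated outside the kernel. This is only a sketch in Python, not the rbd code. The class and exception names are hypothetical, and how the real driver reports a refused removal is not visible in the truncated description.

```python
class DeviceBusyError(Exception):
    """Hypothetical error raised when removal is refused."""


class RbdDeviceSketch:
    """Toy model of a device that tracks how many holders have it open."""

    def __init__(self, name):
        self.name = name
        self.open_count = 0
        self.removed = False

    def open(self):
        # Each open bumps the count.
        self.open_count += 1

    def release(self):
        # Each close (release) drops the count again.
        if self.open_count == 0:
            raise ValueError("release without matching open")
        self.open_count -= 1

    def remove(self):
        # Refuse removal while anybody still holds the device open.
        if self.open_count > 0:
            raise DeviceBusyError(self.name)
        self.removed = True
```

In this model, a mounted image corresponds to a device whose count is at least one, so `remove()` fails until the last holder calls `release()`.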

ceph-client/testing branch force-updated again

2012-12-14 Thread Alex Elder
I have updated the "testing" branch in the ceph-client git repository again, and you'll find that a "forced update" is needed to bring your own repository up to date. This will probably be necessary again at some point once we get some reviews done on commits still in this branch, but we'll try no

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe
Hi Sage, this was just an idea and I need to fix MY uuid problem. But then the crash is still a problem of ceph. Have you looked into my log? On 14.12.2012 20:42, Sage Weil wrote: On Fri, 14 Dec 2012, Stefan Priebe wrote: One more IMPORTANT note. This might happen due to the fact that a dis

Re: osd crash after reboot

2012-12-14 Thread Sage Weil
On Fri, 14 Dec 2012, Stefan Priebe wrote: > One more IMPORTANT note. This might happen due to the fact that a disk was > missing (disk failure) after the reboot. > > fstab and mountpoint are working with UUIDs so they match but the journal > block device: > osd journal = /dev/sde1 > > didn't matc

Re: rbd map command hangs for 15 minutes during system start up

2012-12-14 Thread Alex Elder
On 12/14/2012 10:53 AM, Nick Bartos wrote: > Yes I was only enabling debugging for libceph. I'm adding debugging > for rbd as well. I'll do a repro later today when a test cluster > opens up. Excellent, thank you. -Alex

Re: [EXTERNAL] Re: OSDMonitor: don't allow creation of pools with > 65535 pgs

2012-12-14 Thread Jim Schutt
On 12/14/2012 09:59 AM, Joao Eduardo Luis wrote: > On 12/14/2012 03:41 PM, Jim Schutt wrote: >> Hi, >> >> I'm looking at commit e3ed28eb2 in the next branch, >> and I have a question. >> >> Shouldn't the limit be pg_num > 65536, because >> PGs are numbered 0 thru pg_num-1? >> >> If not, what am I m

Re: OSDMonitor: don't allow creation of pools with > 65535 pgs

2012-12-14 Thread Joao Eduardo Luis
On 12/14/2012 03:41 PM, Jim Schutt wrote: Hi, I'm looking at commit e3ed28eb2 in the next branch, and I have a question. Shouldn't the limit be pg_num > 65536, because PGs are numbered 0 thru pg_num-1? If not, what am I missing? FWIW, up through yesterday I've been using the next branch and t

Re: rbd map command hangs for 15 minutes during system start up

2012-12-14 Thread Nick Bartos
The kernel is 3.5.7 with the following patches applied (and in the order specified below): 001-libceph_eliminate_connection_state_DEAD_13_days_ago.patch 002-libceph_kill_bad_proto_ceph_connection_op_13_days_ago.patch 003-libceph_rename_socket_callbacks_13_days_ago.patch 004-libceph_rename_kvec_res

Re: rbd map command hangs for 15 minutes during system start up

2012-12-14 Thread Alex Elder
On 12/13/2012 01:00 PM, Nick Bartos wrote: > Here's another log with the kernel debugging enabled: > https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log > > Note that it hung on the 2nd try. Just to make sure I'm working with the right code base, c

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hello Mark, On 14.12.2012 16:20, Mark Nelson wrote: sudo parted -s -a optimal /dev/$DEV mklabel gpt sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100% Isn't that the part type you're using? mkpart par

OSDMonitor: don't allow creation of pools with > 65535 pgs

2012-12-14 Thread Jim Schutt
Hi, I'm looking at commit e3ed28eb2 in the next branch, and I have a question. Shouldn't the limit be pg_num > 65536, because PGs are numbered 0 thru pg_num-1? If not, what am I missing? FWIW, up through yesterday I've been using the next branch and this: ceph osd pool set data pg_num 65536
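
The arithmetic behind the question can be spelled out. This is a small Python sketch, assuming the cap exists because a PG number has to fit in 16 bits. That reason is an assumption and is not stated in the post; the names `MAX_PG_ID`, `highest_pg_id` and `exceeds_limit` are hypothetical.

```python
# Assumption: PG numbers must fit in 16 bits, so the largest usable id is 0xFFFF.
MAX_PG_ID = 0xFFFF


def highest_pg_id(pg_num):
    """PGs are numbered 0 through pg_num - 1, so this is the largest id in use."""
    if pg_num < 1:
        raise ValueError("pg_num must be at least 1")
    return pg_num - 1


def exceeds_limit(pg_num):
    """True when at least one PG id would not fit under MAX_PG_ID."""
    return highest_pg_id(pg_num) > MAX_PG_ID
```

Under that assumption a pool with pg_num of 65536 uses ids 0 through 65535 and still fits, while 65537 is the first value that does not, which matches the "pg_num > 65536" reading in the question.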

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hi Mark, On 14.12.2012 16:20, Mark Nelson wrote: sudo parted -s -a optimal /dev/$DEV mklabel gpt sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100% My disks are gpt too and I'm also using parted. But

Re: osd crash after reboot

2012-12-14 Thread Mark Nelson
Hi Stefan, Here's what I often do when I have a journal and data partition sharing a disk: sudo parted -s -a optimal /dev/$DEV mklabel gpt sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100% Mark On 12

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hi Mark, but do I set a label for a partition without FS like the journal blockdev? On 14.12.2012 16:01, Mark Nelson wrote: I often map partitions to something in /dev/disk/by-partlabel and use those in my ceph.conf files. That way disks can be remapped behind the scenes and the ceph configur
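
On a GPT disk the partition name belongs to the partition table entry, so it does not depend on a filesystem being present. The sketch below shows, in Python, how a stable journal path could be derived from the partition names used in the parted commands in this thread. The helper names are hypothetical, and the index value is a stand-in for `$i`.

```python
import posixpath

# Directory where udev exposes GPT partition names as symlinks.
PARTLABEL_DIR = "/dev/disk/by-partlabel"


def journal_label(index):
    """Partition name following the osd-device-$i-journal pattern from the thread."""
    return "osd-device-{}-journal".format(index)


def journal_conf_line(index):
    """Build an 'osd journal' setting that does not depend on /dev/sdX ordering."""
    path = posixpath.join(PARTLABEL_DIR, journal_label(index))
    return "osd journal = " + path


if __name__ == "__main__":
    print(journal_conf_line(0))
```

A path of this form stays valid when a failed disk causes the remaining /dev/sdX names to shift, which is the failure described earlier in the thread.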

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hello Dennis, On 14.12.2012 15:52, Dennis Jacobfeuerborn wrote: didn't match anymore - as the numbers got renumbered due to the failed disk. Is there a way to use some kind of UUIDs here too for journal? You should be able to use /dev/disk/by-uuid/* instead. That should give you a stable view

Re: osd crash after reboot

2012-12-14 Thread Mark Nelson
On 12/14/2012 08:52 AM, Dennis Jacobfeuerborn wrote: On 12/14/2012 10:14 AM, Stefan Priebe wrote: One more IMPORTANT note. This might happen due to the fact that a disk was missing (disk failure) after the reboot. fstab and mountpoint are working with UUIDs so they match but the journal block de

Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2012-12-14 Thread Mark Nelson
On 12/13/2012 08:54 AM, Lachfeld, Jutta wrote: Hi all, Hi! Sorry to send this a bit late, it looks like the reply I authored yesterday from my phone got eaten by vger. I am currently doing some comparisons between CEPH FS and HDFS as a file system for Hadoop using Hadoop's integrated ben

Re: osd crash after reboot

2012-12-14 Thread Dennis Jacobfeuerborn
On 12/14/2012 10:14 AM, Stefan Priebe wrote: > One more IMPORTANT note. This might happen due to the fact that a disk was > missing (disk failure) after the reboot. > > fstab and mountpoint are working with UUIDs so they match but the journal > block device: > osd journal = /dev/sde1 > > didn't m

RE: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2012-12-14 Thread Lachfeld, Jutta
Hi Noah, Gregory and Sage, first of all, thanks for your quick replies. Here are some answers to your questions. Gregory, I have got the output of "ceph -s" before and after this specific TeraSort run, and to me it looks ok; all 30 osds are "up": health HEALTH_OK monmap e1: 1 mons at {0=

Re: [PATCH] implement librados aio_stat

2012-12-14 Thread Giannakos Filippos
Hi team, I forgot to include a description (also cc-ing correctly the synnefo-devel list). I am a member of the Synnefo team, where we are experimenting with RADOS as a storage backend to host blocks for our volume block storage named "archipelago". In this patch I implement aio stat and a

[PATCH] implement librados aio_stat

2012-12-14 Thread Filippos Giannakos
--- src/include/rados/librados.h | 14 ++ src/include/rados/librados.hpp | 15 +- src/librados/IoCtxImpl.cc | 42 src/librados/IoCtxImpl.h |9 + src/librados/librados.cc | 10 ++ 5 files

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe
One more IMPORTANT note. This might happen due to the fact that a disk was missing (disk failure) after the reboot. fstab and mountpoint are working with UUIDs so they match but the journal block device: osd journal = /dev/sde1 didn't match anymore - as the numbers got renumbered due to the fai

Re: [PATCH] rbd: Add --json flag for the showmapped command

2012-12-14 Thread Stratos Psomadakis
On 12/13/2012 07:17 PM, Yehuda Sadeh wrote: > On Thu, Dec 13, 2012 at 7:37 AM, Stratos Psomadakis wrote: >> Signed-off-by: Stratos Psomadakis >> --- >> Hi Josh, >> >> This patch adds the '--json' flag to enable dumping the showmapped output in > I think that it should be "--format=json" rather th

Re: Debian packaging question

2012-12-14 Thread James Page
On 14/12/12 04:38, Gary Lowell wrote: >> I think that the "--debbuildopts '-j8 -b'" might be trouncing >> the >>> --binary-arch flag - I'll get pbuilder setup and give it a >>> test - I normally use sbuild (for which the packaging changes >>> did h

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe
same log more verbose: 11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] read_log done -11> 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306) [

osd crash after reboot

2012-12-14 Thread Stefan Priebe
Hello list, after a reboot of my node I see this on all OSDs of this node: 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time 2012-12-14 09:03:20.392528 osd/OSD.cc: 4385: FAILED assert(_get_ma