OSD crash

2012-06-16 Thread Stefan Priebe

Hi,

today I got another OSD crash ;-( Strangely, the OSD logs are all empty. 
It seems logrotate didn't reload the daemons, but I still have the 
core dump file. What's next?
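
For reference, empty logs after rotation usually mean the rotate config never told the daemons to reopen their log files. A minimal logrotate sketch along those lines (paths and daemon names are assumptions here; check against what your packages actually ship):

```
/var/log/ceph/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    sharedscripts
    postrotate
        # SIGHUP asks the ceph daemons to reopen their log files
        killall -q -1 ceph-mon ceph-osd ceph-mds || true
    endscript
}
```

Without a postrotate signal like this, the daemons keep writing to the rotated (now unlinked) file descriptor and the new log stays empty.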


Stefan

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Updating OSD from current stable (0.47-2) to next failed with broken filestore

2012-06-16 Thread Simon Frerichs | Fremaks GmbH

Hi,

I tried updating one of our OSDs from stable 0.47-2 to the latest next 
branch, and it started updating the filestore and failed.
After that, neither the next-branch OSD nor the stable OSD would start 
with this filestore anymore.

Is there something wrong with the filestore update?

Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134135 7ffed3e35780 0 filestore(/data/osd11) mount FIEMAP ioctl is supported and appears to work
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134163 7ffed3e35780 0 filestore(/data/osd11) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134476 7ffed3e35780 0 filestore(/data/osd11) mount did NOT detect btrfs
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134485 7ffed3e35780 0 filestore(/data/osd11) mount syncfs(2) syscall not support by glibc
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134513 7ffed3e35780 0 filestore(/data/osd11) mount no syncfs(2), must use sync(2).
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134514 7ffed3e35780 0 filestore(/data/osd11) mount WARNING: multiple ceph-osd daemons on the same host will be slow
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134551 7ffed3e35780 -1 filestore(/data/osd11) FileStore::mount : stale version stamp detected: 2. Proceeding, do_update is set, DO NOT USE THIS OPTION IF YOU DO NOT KNOW WHAT IT DOES. More details can be found on the wiki.
Jun 16 14:10:03 fcstore01 ceph-osd: 2012-06-16 14:10:03.134585 7ffed3e35780 0 filestore(/data/osd11) mount found snaps <>
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.531974 7ffed3e35780 0 filestore(/data/osd11) mount: enabling WRITEAHEAD journal mode: btrfs not detected
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.543721 7ffed3e35780 1 journal _open /dev/sdb1 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 0
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.588059 7ffed3e35780 1 journal _open /dev/sdb1 fd 18: 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 0
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.588905 7ffed3e35780 -1 FileStore is old at version 2. Updating...
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.588914 7ffed3e35780 -1 Removing tmp pgs
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.594362 7ffed3e35780 -1 Getting collections
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.594369 7ffed3e35780 -1 597 to process.
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.595195 7ffed3e35780 -1 0/597 processed
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.595213 7ffed3e35780 -1 Updating collection omap current version is 0
Jun 16 14:10:12 fcstore01 ceph-osd: 2012-06-16 14:10:12.662274 7ffed3e35780 -1 os/FlatIndex.cc: In function 'virtual int FlatIndex::collection_list_partial(const hobject_t&, int, int, snapid_t, std::vector*, hobject_t*)' thread 7ffed3e35780 time 2012-06-16 14:10:12.637479#012os/FlatIndex.cc: 386: FAILED assert(0)#012#012 ceph version 0.47.2-500-g1e899d0 (commit:1e899d08e61bbba0af6f3600b6bc9a5fc9e5c2e9)#012 1: /usr/local/bin/ceph-osd() [0x6b337d]#012 2: (FileStore::collection_list_partial(coll_t, hobject_t, int, int, snapid_t, std::vector >*, hobject_t*)+0x9c) [0x67b24c]#012 3: (OSD::convert_collection(ObjectStore*, coll_t)+0x529) [0x5b90e9]#012 4: (OSD::do_convertfs(ObjectStore*)+0x46f) [0x5b9b9f]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0x47) [0x5ba127]#012 6: (main()+0x967) [0x531d07]#012 7: (__libc_start_main()+0xfd) [0x7ffed1d8aead]#012 8: /usr/local/bin/ceph-osd() [0x5357b9]#012 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
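
(The #012 sequences in the last record are rsyslog's escaped newlines; the assert backtrace is really multi-line. A few lines of Python, offered only as a convenience sketch, restore the original layout:)

```python
def unescape_syslog(line: str) -> str:
    # rsyslog flattens multi-line daemon output into one record,
    # encoding each embedded newline as the octal escape #012
    return line.replace("#012", "\n")

# Demo on a fragment of the record above
sample = "os/FlatIndex.cc: 386: FAILED assert(0)#012#012 ceph version 0.47.2-500-g1e899d0"
print(unescape_syslog(sample))
```

Piping the relevant syslog lines through this makes the FlatIndex backtrace readable again.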


Simon



Re: OSD crash

2012-06-16 Thread Stefan Priebe

and another crash again ;-(


 0> 2012-06-16 15:31:32.524369 7fd8935c4700 -1 ./common/Mutex.h: In 
function 'void Mutex::Lock(bool)' thread 7fd8935c4700 time 2012-06-16 
15:31:32.522446

./common/Mutex.h: 110: FAILED assert(r == 0)

 ceph version  (commit:)
 1: /usr/bin/ceph-osd() [0x51a07d]
 2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
 3: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
 4: (ThreadPool::worker()+0xbb7) [0x7bc087]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
 6: (()+0x68ca) [0x7fd89db3a8ca]
 7: (clone()+0x6d) [0x7fd89c1bec0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- end dump of recent events ---
2012-06-16 15:31:32.531567 7fd8935c4700 -1 *** Caught signal (Aborted) **
 in thread 7fd8935c4700

 ceph version  (commit:)
 1: /usr/bin/ceph-osd() [0x70e4b9]
 2: (()+0xeff0) [0x7fd89db42ff0]
 3: (gsignal()+0x35) [0x7fd89c121225]
 4: (abort()+0x180) [0x7fd89c124030]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd89c9b5dc5]
 6: (()+0xcb166) [0x7fd89c9b4166]
 7: (()+0xcb193) [0x7fd89c9b4193]
 8: (()+0xcb28e) [0x7fd89c9b428e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x940) [0x78af20]

 10: /usr/bin/ceph-osd() [0x51a07d]
 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
 13: (ThreadPool::worker()+0xbb7) [0x7bc087]
 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
 15: (()+0x68ca) [0x7fd89db3a8ca]
 16: (clone()+0x6d) [0x7fd89c1bec0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- begin dump of recent events ---
 0> 2012-06-16 15:31:32.531567 7fd8935c4700 -1 *** Caught signal 
(Aborted) **

 in thread 7fd8935c4700

 ceph version  (commit:)
 1: /usr/bin/ceph-osd() [0x70e4b9]
 2: (()+0xeff0) [0x7fd89db42ff0]
 3: (gsignal()+0x35) [0x7fd89c121225]
 4: (abort()+0x180) [0x7fd89c124030]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd89c9b5dc5]
 6: (()+0xcb166) [0x7fd89c9b4166]
 7: (()+0xcb193) [0x7fd89c9b4193]
 8: (()+0xcb28e) [0x7fd89c9b428e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x940) [0x78af20]

 10: /usr/bin/ceph-osd() [0x51a07d]
 11: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
 12: (FileStore::_finish_op(FileStore::OpSequencer*)+0x2e4) [0x684374]
 13: (ThreadPool::worker()+0xbb7) [0x7bc087]
 14: (ThreadPool::WorkThread::entry()+0xd) [0x5f144d]
 15: (()+0x68ca) [0x7fd89db3a8ca]
 16: (clone()+0x6d) [0x7fd89c1bec0d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- end dump of recent events ---
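
When comparing repeated crashes like the two dumps above, it can help to pull the frames out of the logs mechanically before resolving them. A small sketch (the line format is my reading of the dumps in this thread, not an official parser); each extracted address can then be fed to `addr2line -Cfe` against the matching binary and debug symbols:

```python
import re

# One backtrace frame per line, e.g.
#  2: (ReplicatedPG::C_OSD_OndiskWriteUnlock::finish(int)+0x2a) [0x579c5a]
#  1: /usr/bin/ceph-osd() [0x51a07d]
FRAME_RE = re.compile(
    r"^\s*(\d+):\s+"
    r"(?:\((?P<sym>.+)\+(?P<off>0x[0-9a-f]+)\)|(?P<raw>\S+))"
    r"\s+\[(?P<addr>0x[0-9a-f]+)\]"
)

def parse_frame(line):
    """Return {symbol, offset, addr} for a backtrace frame line, or None."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "symbol": m.group("sym") or m.group("raw"),
        "offset": m.group("off"),   # None for frames without a symbol
        "addr": m.group("addr"),
    }
```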

On 16.06.2012 14:57, Stefan Priebe wrote:

Hi,

today I got another OSD crash ;-( Strangely, the OSD logs are all empty.
It seems logrotate didn't reload the daemons, but I still have the
core dump file. What's next?

Stefan





Re: RBD layering design draft

2012-06-16 Thread Sage Weil
On Fri, 15 Jun 2012, Yehuda Sadeh wrote:
> On Fri, Jun 15, 2012 at 5:46 PM, Sage Weil  wrote:
> > Looks good!  Couple small things:
> >
> >>     $ rbd unpreserve pool/image@snap
> >
> > Is 'preserve' and 'unpreserve' the verbiage we want to use here?  Not sure
> > I have a better suggestion, but preserve is unusual.
> >
> 
> freeze, thaw/unfreeze?

Freeze/thaw usually mean something like quiesce I/O or read-only, usually 
temporarily.  What we actually mean is "you can't delete this".  Maybe 
pin/unpin?  preserve/unpreserve may be fine, too!

sage

Re: sync Ceph packaging efforts for Debian/Ubuntu

2012-06-16 Thread Sage Weil
Hi Laszlo!

On Sat, 16 Jun 2012, Laszlo Boszormenyi (GCS) wrote:
> Hi all parties,
> 
> Ceph is packaged at three places: in Ubuntu, at upstream and in Debian.
> Its first 'stable' release, 0.48, is coming; as Sage wrote some
> days ago: "It looks like 0.48 will also be the basis for one of our
> first 'stable' releases.".
> 
> I would like to sync our Debian-related packaging efforts, and not
> only for the mentioned stable release. The OpenStack packaging team
> would like to add Ceph to their stack, and howtos and helping hands
> on forums need it as well.
> I'm not subscribed to the lists, please keep me in the loop with Cc-s.
> 
> The first patch, 0002-Add-support-PPC.patch, is for upstream; it helps
> the in-tree leveldb build on PowerPC architectures as well.
> The second patch reflects the homepage and git tree changes for Ubuntu.
> The third patch is from them, noted as "Switch from libcryptopp to
> libnss as libcryptopp is not seeded.", so libnss3-dev is used as a
> build-dependency instead. Sage, would you commit it?

I haven't looked closely yet, but these should be fine.

> I've separated gceph out, so if someone needs the CLI only, s/he can
> use that without the GTK+ libraries. See below.

gceph is already gone from the 'next' branch.

> Ben, James, can you please share in some sentences why ceph-fuse is
> dropped in Ubuntu? Do you need it Sage? If it's feasible, you may drop
> that as well.

We could package it separately, if necessary, but it should be packaged.  

> As I see, you still ship d/librgw1.install , d/librgw1.postrm ,
> d/librgw1.postinst and librgw-dev.install . They are not needed anymore.

These are already gone from the 'next' branch as well.

> Maybe the biggest change is that ceph-mds was separated out and such,
> ceph-fs-common created for cephfs and mount.ceph.

We can do this as well, but I would prefer not to.  Opinions?

> Please move the configure call to its target, as you can check in git.
> Add var/lib/ceph/mon , var/lib/ceph/osd and var/lib/ceph/mds to
> d/ceph.dirs .

This should be done in 'next'.

> The git patch is for Sage/upstream and contains what's needed for the
> new packages. May I get commit rights to debian/, or should I go with
> git forks and you'll merge the changes?

Patches or a git merge request are best as they allow for review.

Thanks, Laszlo!  Looking forward to seeing this in wheezy, and easing the 
burden on you and the Ubuntu guys.

sage

 
> Also, it seems that limiting the architectures to build on is not
> allowed[1]. I'll write an email about this issue: how to deal with the
> failing leveldb build-dependency on some archs.
> 
> Loic, do you need anything or do you have any objections with these
> changes?
> 
> Regards,
> Laszlo/GCS
> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677626
> 


Unmountable btrfs filesystems

2012-06-16 Thread Wido den Hollander

Hi,

On my dev cluster (10 nodes, 40 OSDs) I'm still trying to run Ceph on 
btrfs, but over the last couple of months I've lost multiple OSDs due 
to btrfs.


On my nodes I've set kernel.panic=60 so that whenever a kernel panic 
occurs I get the node back within two minutes.
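
(For reference, that setting survives reboots if dropped into the sysctl configuration; the file name below is just a convention:)

```
# /etc/sysctl.d/60-panic-reboot.conf -- reboot 60 s after a kernel panic
kernel.panic = 60
```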


Now, recently I've seen multiple nodes reboot (I didn't see the stack 
trace), but afterwards the btrfs filesystems on those nodes were unmountable.


"btrfs: open_ctree failed"

I tried various kernels, the most recent 3.3.0 from kernel.ubuntu.com, 
but I'm still seeing this.


Is anyone seeing the same or did everybody migrate away to ext4 or XFS?

I still prefer btrfs due to the snapshotting, but losing all these 
OSDs all the time is getting kind of frustrating.


Any thoughts or comments?

Wido


Re: Unmountable btrfs filesystems

2012-06-16 Thread Mark Nelson

On 6/16/12 1:46 PM, Wido den Hollander wrote:

Hi,

On my dev cluster (10 nodes, 40 OSDs) I'm still trying to run Ceph on
btrfs, but over the last couple of months I've lost multiple OSDs due
to btrfs.

On my nodes I've set kernel.panic=60 so that whenever a kernel panic
occurs I get the node back within two minutes.

Now, recently I've seen multiple nodes reboot (I didn't see the stack
trace), but afterwards the btrfs filesystems on those nodes were
unmountable.

"btrfs: open_ctree failed"

I tried various kernels, the most recent 3.3.0 from kernel.ubuntu.com,
but I'm still seeing this.

Is anyone seeing the same or did everybody migrate away to ext4 or XFS?

I still prefer btrfs due to the snapshotting, but losing all these
OSDs all the time is getting kind of frustrating.

Any thoughts or comments?

Wido


Hi Wido,

btrfsck might tell you what's wrong.  Sounds like there is a 
btrfs-restore command in the dangerdonteveruse branch you could try. 
Beyond that, I guess it just really comes down to tradeoffs.


Good luck! ;)

Mark


Re: sync Ceph packaging efforts for Debian/Ubuntu

2012-06-16 Thread Sage Weil
Hi Laszlo,

I've taken a closer look at these patches and have a few questions.

- The URL change and nss patches I've applied; they are in the ceph.git 
'debian' branch.

- Has the leveldb patch been sent upstream?  Once it is committed to 
the upstream git, we can update ceph to use it; that's nicer than carrying 
the patch.  However, I thought you needed to link against the existing 
libleveldb1 package... which means we shouldn't do anything on our side, 
right?

- I'm not sure how useful it is to break mount.ceph and cephfs into a 
separate ceph-fs-common package, but we can do it.  Same goes for a 
separate package for ceph-mds.  That was originally motivated by ubuntu 
not wanting the mds in main, but in the end only the libraries went in, so 
it's a moot point.  I'd rather hear from them what their intentions are 
for 12.10 before complicating things...

- That same patch also switched all the Architecture: lines back to 
linux-any.  Was that intentional?  I just changed them from that last 
week.

- I did apply the python-ceph Depends: portion of that patch.

The result so far is in the 'debian' branch of ceph.git.  Please take a 
look.

Thanks!
sage


On Sat, 16 Jun 2012, Laszlo Boszormenyi (GCS) wrote:

> Hi all parties,
> 
> Ceph is packaged at three places: in Ubuntu, at upstream and in Debian.
> Its first 'stable' release, 0.48, is coming; as Sage wrote some
> days ago: "It looks like 0.48 will also be the basis for one of our
> first 'stable' releases.".
> 
> I would like to sync our Debian-related packaging efforts, and not
> only for the mentioned stable release. The OpenStack packaging team
> would like to add Ceph to their stack, and howtos and helping hands
> on forums need it as well.
> I'm not subscribed to the lists, please keep me in the loop with Cc-s.
> 
> The first patch, 0002-Add-support-PPC.patch, is for upstream; it helps
> the in-tree leveldb build on PowerPC architectures as well.
> The second patch reflects the homepage and git tree changes for Ubuntu.
> The third patch is from them, noted as "Switch from libcryptopp to
> libnss as libcryptopp is not seeded.", so libnss3-dev is used as a
> build-dependency instead. Sage, would you commit it?
> I've separated gceph out, so if someone needs the CLI only, s/he can
> use that without the GTK+ libraries. See below.
> 
> Ben, James, can you please share in some sentences why ceph-fuse is
> dropped in Ubuntu? Do you need it Sage? If it's feasible, you may drop
> that as well.
> As I see, you still ship d/librgw1.install , d/librgw1.postrm ,
> d/librgw1.postinst and librgw-dev.install . They are not needed anymore.
> Maybe the biggest change is that ceph-mds was separated out and such,
> ceph-fs-common created for cephfs and mount.ceph .
> Please move the configure call to its target, as you can check in git.
> Add var/lib/ceph/mon , var/lib/ceph/osd and var/lib/ceph/mds to
> d/ceph.dirs .
> 
> The git patch is for Sage/upstream and contains what's needed for the
> new packages. May I get commit rights to debian/, or should I go with
> git forks and you'll merge the changes?
> 
> Also, it seems that limiting the architectures to build on is not
> allowed[1]. I'll write an email about this issue: how to deal with the
> failing leveldb build-dependency on some archs.
> 
> Loic, do you need anything or do you have any objections with these
> changes?
> 
> Regards,
> Laszlo/GCS
> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677626
> 