Hi Guys,

Any additional thoughts on this?  There was a bit of information shared
off-list I wanted to bring back:

Sam mentioned that the metadata looked odd, and suspected "some form of
32bit shenanigans in the key name construction".

However, that might not have been the case, because he later came in with:

"Hmm.  Based on the omap and logs, the omap directory is simply a bunch
of updates behind.  Was the node rebooted as part of the osd restart?
FS is xfs?  What are your fs mount options?"

There was no node restart.  We are using XFS.

From ceph.conf:

osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k"

And of course as soon as I paste that, I look at "inode64" on these 32-bit
ARM systems and think, "hmm".  I know 64-bit inodes are recommended for
filesystems > 1TB (these are 4TB drives), but I have never thought about
whether this is supported on a 32-bit system.  Quick web searches appear to
indicate this may be okay...
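
For reference, a quick way to double-check what the kernel actually mounted
with (as opposed to what ceph.conf asked for) is below; the OSD data path is
just an example from my layout:

    # live mount options as the kernel sees them
    grep xfs /proc/mounts
    # or, narrowed to the OSD data mounts (example path, adjust as needed)
    mount | grep /var/lib/ceph/osd

That should show whether inode64 is actually in effect on these mounts.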

Sorry, some of this may be a duplicate.  I wanted to bring it back on-list
in case someone looks at that and says, "no, you can't use those XFS
options on 32-bit ARM."  =)

On a side note, I've been using the cluster heavily the last couple of days
with no other problems.  I'm just not doing any cluster or OSD restarts
for fear of an OSD not coming back.
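
If I do work up the nerve to test another restart, my tentative plan is
below.  This is just a sketch: I'm assuming the noout flag and the
per-daemon init syntax work on this 0.60 build the same way they do on
current releases, and osd.1 is only an example id.

    ceph osd set noout            # don't mark the OSD out / trigger rebalancing while it's down
    service ceph restart osd.1    # restart just the one OSD
    ceph osd tree                 # confirm it actually came back up
    ceph osd unset noout          # back to normal behavior

That way, even if the OSD hits the same assert, the cluster at least won't
start shuffling data around while I poke at the logs.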

 - Travis


On Tue, Apr 30, 2013 at 12:17 PM, Travis Rhoden <trho...@gmail.com> wrote:

> On the OSD node:
>
> root@cepha0:~# lsb_release -a
> No LSB modules are available.
> Distributor ID:    Ubuntu
> Description:    Ubuntu 12.10
> Release:    12.10
> Codename:    quantal
> root@cepha0:~# dpkg -l "*leveldb*"
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                                   Version
> Architecture             Description
>
> +++-======================================-========================-========================-==================================================================================
> ii  libleveldb1:armhf                      0+20120530.gitdd0d562-2
> armhf                    fast key-value storage library
> root@cepha0:~# uname -a
> Linux cepha0 3.5.0-27-highbank #46-Ubuntu SMP Mon Mar 25 23:19:40 UTC 2013
> armv7l armv7l armv7l GNU/Linux
>
>
> On the MON node:
> # lsb_release -a
> No LSB modules are available.
> Distributor ID:    Ubuntu
> Description:    Ubuntu 12.10
> Release:    12.10
> Codename:    quantal
> # uname -a
> Linux  3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64
> x86_64 x86_64 GNU/Linux
> # dpkg -l "*leveldb*"
> Desired=Unknown/Install/Remove/Purge/Hold
> |
> Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name                                   Version
> Architecture             Description
>
> +++-======================================-========================-========================-==================================================================================
> un  leveldb-doc
> <none>                                            (no description available)
> ii  libleveldb-dev:amd64                   0+20120530.gitdd0d562-2
> amd64                    fast key-value storage library (development files)
> ii  libleveldb1:amd64                      0+20120530.gitdd0d562-2
> amd64                    fast key-value storage library
>
>
> On Tue, Apr 30, 2013 at 12:11 PM, Samuel Just <sam.j...@inktank.com>wrote:
>
>> What version of leveldb is installed?  Ubuntu/version?
>> -Sam
>>
>> On Tue, Apr 30, 2013 at 8:50 AM, Travis Rhoden <trho...@gmail.com> wrote:
>> > Interestingly, the down OSD does not get marked out after 5 minutes.
>> > Probably that is already fixed by http://tracker.ceph.com/issues/4822.
>> >
>> >
>> > On Tue, Apr 30, 2013 at 11:42 AM, Travis Rhoden <trho...@gmail.com>
>> wrote:
>> >>
>> >> Hi Sam,
>> >>
>> >> I was prepared to write in and say that the problem had gone away.  I
>> >> tried restarting several OSDs last night in the hopes of capturing the
>> >> problem on an OSD that hadn't failed yet, but didn't have any luck.  So
>> >> I did indeed re-create the cluster from scratch (using mkcephfs), and
>> >> what do you know -- everything worked.  I got everything in a nice
>> >> stable state, then decided to do a full cluster restart, just to be
>> >> sure.  Sure enough, one OSD failed to come up, and has the same stack
>> >> trace.  So I believe I have the log you want -- just from the OSD that
>> >> failed, right?
>> >>
>> >> Question -- any feeling for what parts of the log you need?  It's 688MB
>> >> uncompressed (two hours!), so I'd like to be able to trim some off for
>> >> you before making it available.  Do you only need/want the part from
>> >> after the OSD was restarted?  Or perhaps the corruption happens on OSD
>> >> shutdown and you need some before that?  If you are fine with that
>> >> large of a file, I can just make that available too.  Let me know.
>> >>
>> >>  - Travis
>> >>
>> >>
>> >> On Mon, Apr 29, 2013 at 6:26 PM, Travis Rhoden <trho...@gmail.com>
>> wrote:
>> >>>
>> >>> Hi Sam,
>> >>>
>> >>> No problem, I'll leave that debugging turned up high, and do a
>> >>> mkcephfs from scratch and see what happens.  Not sure if it will
>> >>> happen again or not.  =)
>> >>>
>> >>> Thanks again.
>> >>>
>> >>>  - Travis
>> >>>
>> >>>
>> >>> On Mon, Apr 29, 2013 at 5:51 PM, Samuel Just <sam.j...@inktank.com>
>> >>> wrote:
>> >>>>
>> >>>> Hmm, I need logging from when the corruption happened.  If this is
>> >>>> reproducible, can you enable that logging on a clean osd (or better,
>> >>>> a clean cluster) until the assert occurs?
>> >>>> -Sam
>> >>>>
>> >>>> On Mon, Apr 29, 2013 at 2:45 PM, Travis Rhoden <trho...@gmail.com>
>> >>>> wrote:
>> >>>> > Also, I can note that it does not take a full cluster restart to
>> >>>> > trigger this.  If I just restart an OSD that was up/in previously,
>> >>>> > the same error can happen (though not every time).  So restarting
>> >>>> > OSDs for me is a bit like Russian roulette.  =)  Even though
>> >>>> > restarting an OSD may not always result in the error, it seems that
>> >>>> > once it happens, that OSD is gone for good.  No amount of restarting
>> >>>> > has brought any of the dead ones back.
>> >>>> >
>> >>>> > I'd really like to get to the bottom of it.  Let me know if I can
>> do
>> >>>> > anything to help.
>> >>>> >
>> >>>> > I may also have to try completely wiping/rebuilding to see if I can
>> >>>> > make
>> >>>> > this thing usable.
>> >>>> >
>> >>>> >
>> >>>> > On Mon, Apr 29, 2013 at 2:38 PM, Travis Rhoden <trho...@gmail.com>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Hi Sam,
>> >>>> >>
>> >>>> >> Thanks for being willing to take a look.
>> >>>> >>
>> >>>> >> I applied the debug settings on one host that has 3 out of 3 OSDs
>> >>>> >> with this problem.  Then tried to start them up.  Here are the
>> >>>> >> resulting logs:
>> >>>> >>
>> >>>> >> https://dl.dropboxusercontent.com/u/23122069/cephlogs.tgz
>> >>>> >>
>> >>>> >>  - Travis
>> >>>> >>
>> >>>> >>
>> >>>> >> On Mon, Apr 29, 2013 at 1:04 PM, Samuel Just <
>> sam.j...@inktank.com>
>> >>>> >> wrote:
>> >>>> >>>
>> >>>> >>> You appear to be missing pg metadata for some reason.  If you can
>> >>>> >>> reproduce it with
>> >>>> >>> debug osd = 20
>> >>>> >>> debug filestore = 20
>> >>>> >>> debug ms = 1
>> >>>> >>> on all of the OSDs, I should be able to track it down.
>> >>>> >>>
>> >>>> >>> I created a bug: #4855.
>> >>>> >>>
>> >>>> >>> Thanks!
>> >>>> >>> -Sam
>> >>>> >>>
>> >>>> >>> On Mon, Apr 29, 2013 at 9:52 AM, Travis Rhoden <
>> trho...@gmail.com>
>> >>>> >>> wrote:
>> >>>> >>> > Thanks Greg.
>> >>>> >>> >
>> >>>> >>> > I quit playing with it because every time I restarted the
>> >>>> >>> > cluster (service ceph -a restart), I lost more OSDs...  The
>> >>>> >>> > first time it was 1, the 2nd time 10, the 3rd time 13...  All 13
>> >>>> >>> > down OSDs show the same stacktrace.
>> >>>> >>> >
>> >>>> >>> >  - Travis
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>> > On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum
>> >>>> >>> > <g...@inktank.com>
>> >>>> >>> > wrote:
>> >>>> >>> >>
>> >>>> >>> >> This sounds vaguely familiar to me, and I see
>> >>>> >>> >> http://tracker.ceph.com/issues/4052, which is marked as "Can't
>> >>>> >>> >> reproduce" — I think maybe this is fixed in "next" and "master",
>> >>>> >>> >> but I'm not sure. For more than that I'd have to defer to Sage
>> >>>> >>> >> or Sam.
>> >>>> >>> >> -Greg
>> >>>> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >>>> >>> >>
>> >>>> >>> >>
>> >>>> >>> >> On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden
>> >>>> >>> >> <trho...@gmail.com>
>> >>>> >>> >> wrote:
>> >>>> >>> >> > Hey folks,
>> >>>> >>> >> >
>> >>>> >>> >> > I'm helping put together a new test/experimental cluster, and
>> >>>> >>> >> > hit this today when bringing the cluster up for the first
>> >>>> >>> >> > time (using mkcephfs).
>> >>>> >>> >> >
>> >>>> >>> >> > After doing the normal "service ceph -a start", I noticed one
>> >>>> >>> >> > OSD was down, and a lot of PGs were stuck creating.  I tried
>> >>>> >>> >> > restarting the down OSD, but it wouldn't come up.  It always
>> >>>> >>> >> > had this error:
>> >>>> >>> >> >
>> >>>> >>> >> >     -1> 2013-04-27 18:11:56.179804 b6fcd000  2 osd.1 0 boot
>> >>>> >>> >> >      0> 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In
>> >>>> >>> >> > function
>> >>>> >>> >> > 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t,
>> >>>> >>> >> > hobject_t&,
>> >>>> >>> >> > ceph::bufferlist*)' thread b6fcd000 time 2013-04-27
>> >>>> >>> >> > 18:11:56.399089
>> >>>> >>> >> > osd/PG.cc: 2556: FAILED assert(values.size() == 1)
>> >>>> >>> >> >
>> >>>> >>> >> >  ceph version 0.60-401-g17a3859
>> >>>> >>> >> > (17a38593d60f5f29b9b66c13c0aaa759762c6d04)
>> >>>> >>> >> >  1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
>> >>>> >>> >> > ceph::buffer::list*)+0x1ad) [0x2c3c0a]
>> >>>> >>> >> >  2: (OSD::load_pgs()+0x357) [0x28cba0]
>> >>>> >>> >> >  3: (OSD::init()+0x741) [0x290a16]
>> >>>> >>> >> >  4: (main()+0x1427) [0x2155c0]
>> >>>> >>> >> >  5: (__libc_start_main()+0x99) [0xb69bcf42]
>> >>>> >>> >> >  NOTE: a copy of the executable, or `objdump -rdS <executable>`
>> >>>> >>> >> > is needed to interpret this.
>> >>>> >>> >> >
>> >>>> >>> >> >
>> >>>> >>> >> > I then did a full cluster restart, and now I have ten OSDs
>> >>>> >>> >> > down -- each showing the same exception/failed assert.
>> >>>> >>> >> >
>> >>>> >>> >> > Anybody seen this?
>> >>>> >>> >> >
>> >>>> >>> >> > I know I'm running a weird version -- it's compiled from
>> >>>> >>> >> > source, and was provided to me.  The OSDs are all on ARM, and
>> >>>> >>> >> > the mon is x86_64.  Just looking to see if anyone has seen
>> >>>> >>> >> > this particular stack trace of load_pgs()/peek_map_epoch()
>> >>>> >>> >> > before....
>> >>>> >>> >> >
>> >>>> >>> >> >  - Travis
>> >>>> >>> >> >
>> >>>> >>> >> > _______________________________________________
>> >>>> >>> >> > ceph-users mailing list
>> >>>> >>> >> > ceph-users@lists.ceph.com
>> >>>> >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>> >>> >> >
>> >>>> >>> >
>> >>>> >>> >
>> >>>> >>
>> >>>> >>
>> >>>> >
>> >>>
>> >>>
>> >>
>> >
>>
>
>
