Re: [ceph-users] installing multi osd and monitor of ceph in single VM

2016-08-10 Thread Brad Hubbard
On Thu, Aug 11, 2016 at 10:23:19AM +0700, agung Laksono wrote:
> Thank you Brad,
> 
> I am able to run ceph with 3 MONs, 3 OSDs and 1 MDS now.
> 
> However, I still don't get the workflow of Ceph using this setup.
> I might need to add prints somewhere, inject a crash by killing one node, etc.
> 
> Is this also possible using this method?
> 

Sure, something like this?

$ ps uwwx|grep ceph-
brad 26160  0.6  0.1 389424 25436 pts/2Sl   13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mon -i a -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 26175  0.3  0.1 385316 23604 pts/2Sl   13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mon -i b -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 26194  0.3  0.1 383264 22932 pts/2Sl   13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mon -i c -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 27230  1.3  0.1 831920 26908 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-osd -i 0 -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 27553  1.4  0.1 832812 27924 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-osd -i 1 -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 27895  1.4  0.1 831792 27472 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-osd -i 2 -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 28294  0.1  0.0 410648 15652 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mds -i a -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 28914  0.0  0.0 118496   944 pts/2S+   13:38   0:00 grep --color 
ceph-

$ kill -SIGSEGV 27553

$ ps uwwx|grep ceph-
brad 26160  0.5  0.1 390448 26720 pts/2Sl   13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mon -i a -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 26175  0.2  0.1 386852 24752 pts/2Sl   13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mon -i b -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 26194  0.2  0.1 384800 24160 pts/2Sl   13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mon -i c -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 27230  0.5  0.1 833976 27012 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-osd -i 0 -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 27895  0.4  0.1 831792 27616 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-osd -i 2 -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 28294  0.0  0.0 410648 15620 ?Ssl  13:37   0:00 
/home/brad/working/src/ceph/build/bin/ceph-mds -i a -c 
/home/brad/working/src/ceph/build/ceph.conf
brad 30635  0.0  0.0 118496   900 pts/2S+   13:38   0:00 grep --color 
ceph-

$ egrep -C10 '^2.*Segmentation fault' out/osd.1.log
2016-08-11 13:38:22.837492 7fa0c74ae700  1 -- 127.0.0.1:0/27551 --> 
127.0.0.1:6811/27893 -- osd_ping(ping e10 stamp 2016-08-11 13:38:22.837435) v2 
-- ?+0 0xba0f600 con 0xb92b320
2016-08-11 13:38:22.837725 7fa0c36a1700  1 -- 127.0.0.1:0/27551 <== osd.0 
127.0.0.1:6803/27228 8  osd_ping(ping_reply e10 stamp 2016-08-11 
13:38:22.837435) v2  47+0+0 (617510928 0 0) 0xb7efa00 con 0xb6560a0
2016-08-11 13:38:22.837737 7fa0c2994700  1 -- 127.0.0.1:0/27551 <== osd.2 
127.0.0.1:6811/27893 8  osd_ping(ping_reply e10 stamp 2016-08-11 
13:38:22.837435) v2  47+0+0 (617510928 0 0) 0xb7ee400 con 0xb92b320
2016-08-11 13:38:22.837791 7fa0c2893700  1 -- 127.0.0.1:0/27551 <== osd.2 
127.0.0.1:6810/27893 8  osd_ping(ping_reply e10 stamp 2016-08-11 
13:38:22.837435) v2  47+0+0 (617510928 0 0) 0xb7ef600 con 0xb92b200
2016-08-11 13:38:22.837800 7fa0c37a2700  1 -- 127.0.0.1:0/27551 <== osd.0 
127.0.0.1:6802/27228 8  osd_ping(ping_reply e10 stamp 2016-08-11 
13:38:22.837435) v2  47+0+0 (617510928 0 0) 0xb7ef000 con 0xb655b00
2016-08-11 13:38:23.871496 7fa0c319c700  1 -- 127.0.0.1:6806/27551 <== osd.2 
127.0.0.1:0/27893 9  osd_ping(ping e10 stamp 2016-08-11 13:38:23.871366) v2 
 47+0+0 (3526151526 0 0) 0xb90d000 con 0xb657060
2016-08-11 13:38:23.871497 7fa0c309b700  1 -- 127.0.0.1:6807/27551 <== osd.2 
127.0.0.1:0/27893 9  osd_ping(ping e10 stamp 2016-08-11 13:38:23.871366) v2 
 47+0+0 (3526151526 0 0) 0xb90b400 con 0xb6572a0
2016-08-11 13:38:23.871540 7fa0c319c700  1 -- 127.0.0.1:6806/27551 --> 
127.0.0.1:0/27893 -- osd_ping(ping_reply e10 stamp 2016-08-11 13:38:23.871366) 
v2 -- ?+0 0xb7ede00 con 0xb657060
2016-08-11 13:38:23.871574 7fa0c309b700  1 -- 127.0.0.1:6807/27551 --> 
127.0.0.1:0/27893 -- osd_ping(ping_reply e10 stamp 2016-08-11 13:38:23.871366) 
v2 -- ?+0 0xb7ef400 con 0xb6572a0
2016-08-11 13:38:24.039347 7fa0dd331700  1 -- 127.0.0.1:6804/27551 --> 
127.0.0.1:6790/0 -- pg_stats(0 pgs tid 6 v 0) v1 -- ?+0 0xb6c4680 con 0xb654fc0
2016-08-11 13:38:24.381589 7fa0eb2d58c0 -1 *** Caught signal (Segmentation 
fault) **
 in thread 7fa0eb2d58c0 thread_name:ceph-osd

 ceph version v11.0.0-798-g62e8a97 
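
To bring the killed osd.1 back in a vstart cluster, a minimal sketch (paths as in the ps listing above) is simply to re-run the same command line and check that it rejoins:

$ cd /home/brad/working/src/ceph/build
$ bin/ceph-osd -i 1 -c ceph.conf    # restart the daemon that was killed above
$ bin/ceph -s                       # confirm osd.1 comes back up and the PGs return to active+clean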

Re: [ceph-users] installing multi osd and monitor of ceph in single VM

2016-08-10 Thread agung Laksono
Thank you Brad,

I am able to run ceph with 3 MONs, 3 OSDs and 1 MDS now.

However, I still don't get the workflow of Ceph using this setup.
I might need to add prints somewhere, inject a crash by killing one node, etc.

Is this also possible using this method?




On Thu, Aug 11, 2016 at 4:17 AM, Brad Hubbard  wrote:

> On Thu, Aug 11, 2016 at 12:45 AM, agung Laksono 
> wrote:
> > I've seen the Ansible option before, but not looked at it in detail.
> > I have also tried to follow the quick guide for development.
> > It did not work on the VM where I had already installed Ceph.
> >
> > the error is :
> >
> >  agung@arrasyid:~/ceph/ceph/src$ ./vstart.sh -d -n -x
> > ** going verbose **
> > [./fetch_config /tmp/fetched.ceph.conf.3818]
> > ./init-ceph: failed to fetch config with './fetch_config
> > /tmp/fetched.ceph.conf.3818'
> >
> >
> > Do I need to use a vanilla ceph to make vstart.sh work?
> >
> > When I learn a cloud system, usually I compile
> > the source code,  run in pseudo-distributed, modify the code
> > and add prints somewhere, recompile and re-run the system.
> > Might this method work for exploring ceph?
>
> It should, sure.
>
> Try this.
>
> 1) Clone a fresh copy of the repo.
> 2) ./do_cmake.sh
> 3) cd build
> 4) make
> 5) OSD=3 MON=3 MDS=1 ../src/vstart.sh -n -x -l
> 6) bin/ceph -s
>
> That should give you a working cluster with 3 MONs, 3 OSDs and 1 MDS.
>
> --
> Cheers,
> Brad
>
> >
> >
> > On Wed, Aug 10, 2016 at 9:14 AM, Brad Hubbard 
> wrote:
> >>
> >> On Wed, Aug 10, 2016 at 12:26 AM, agung Laksono  >
> >> wrote:
> >> >
> >> > Hi Ceph users,
> >> >
> >> > I am new to Ceph. I've succeeded in installing Ceph on 4 VMs using the
> >> > Quick installation guide in the Ceph documentation.
> >> >
> >> > I've also managed to compile
> >> > Ceph from source code, and build and install it in a single VM.
> >> >
> >> > What I want to do next is run multiple Ceph nodes in a cluster,
> >> > but only inside a single machine. I need this because I will
> >> > study the Ceph code, modify some of it, recompile and
> >> > redeploy on the node/VM. For my study I also need to be able to run/kill
> >> > a particular node.
> >> >
> >> > Does somebody know how to configure a single VM to run multiple OSDs and
> >> > monitors of Ceph?
> >> >
> >> > Advice and comments are very much appreciated. Thanks.
> >>
> >> Hi,
> >>
> >> Did you see this?
> >>
> >>
> >> http://docs.ceph.com/docs/hammer/dev/quick_guide/#
> running-a-development-deployment
> >>
> >> Also take a look at the AIO (all in one) options in ceph-ansible.
> >>
> >> HTH,
> >> Brad
> >
> >
> >
> >
> > --
> > Cheers,
> >
> > Agung Laksono
> >
>
>
>


-- 
Cheers,

Agung Laksono


[ceph-users] rbd-nbd kernel requirements

2016-08-10 Thread Shawn Edwards
Is there a minimum kernel version required for rbd-nbd to work and work
well?  Before I start stress testing it, I want to be sure I have a system
that is expected to work.


[ceph-users] Fwd: lost power. monitors died. Cephx errors now

2016-08-10 Thread Sean Sullivan
I think it just got worse::

all three monitors on my other cluster say that ceph-mon can't open
/var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose all
3 monitors? I saw a post by Sage saying that the data can be recovered as
all of the data is held on other servers. Is this possible? If so has
anyone had any experience doing so?


Re: [ceph-users] MDS crash

2016-08-10 Thread Randy Orr
Patrick,

We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on
the client side with plans to move away from the 3.19 kernel where/when we
can.

-Randy

On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly 
wrote:

> Randy, are you using ceph-fuse or the kernel client (or something else)?
>
> On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr  wrote:
> > Great, thank you. Please let me know if I can be of any assistance in
> > testing or validating a fix.
> >
> > -Randy
> >
> > On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly 
> > wrote:
> >>
> >> Hello Randy,
> >>
> >> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr 
> wrote:
> >> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(
> CInode*,
> >> > bool,
> >> > bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
> >> > 2016-08-09 18:51:50.626630
> >> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
> >> >
> >> >  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
> >> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> > const*)+0x8b) [0x563d1e0a2d3b]
> >> >  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long,
> >> > bool,
> >> > unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
> >> >  3: (Server::handle_client_open(std::shared_ptr
> &)+0x1061)
> >> > [0x563d1dd386a1]
> >> >  4:
> >> > (Server::dispatch_client_request(std::shared_ptr<
> MDRequestImpl>&)+0xa0b)
> >> > [0x563d1dd5709b]
> >> >  5: (Server::handle_client_request(MClientRequest*)+0x47f)
> >> > [0x563d1dd5768f]
> >> >  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
> >> >  7: (MDSRank::handle_deferrable_message(Message*)+0x80c)
> >> > [0x563d1dce1f8c]
> >> >  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
> >> >  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
> >> >  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
> >> >  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
> >> >  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
> >> >  13: (()+0x8184) [0x7fc30bd7c184]
> >> >  14: (clone()+0x6d) [0x7fc30a2d337d]
> >> >  NOTE: a copy of the executable, or `objdump -rdS ` is
> >> > needed to
> >> > interpret this.
> >>
> >> I have a bug report filed for this issue:
> >> http://tracker.ceph.com/issues/16983
> >>
> >> I believe it should be straightforward to solve and we'll have a fix
> >> for it soon.
> >>
> >> Thanks for the report!
> >>
> >> --
> >> Patrick Donnelly
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Patrick Donnelly
>


Re: [ceph-users] MDS crash

2016-08-10 Thread Patrick Donnelly
Randy, are you using ceph-fuse or the kernel client (or something else)?

On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr  wrote:
> Great, thank you. Please let me know if I can be of any assistance in
> testing or validating a fix.
>
> -Randy
>
> On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly 
> wrote:
>>
>> Hello Randy,
>>
>> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr  wrote:
>> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*,
>> > bool,
>> > bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
>> > 2016-08-09 18:51:50.626630
>> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
>> >
>> >  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x8b) [0x563d1e0a2d3b]
>> >  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long,
>> > bool,
>> > unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>> >  3: (Server::handle_client_open(std::shared_ptr&)+0x1061)
>> > [0x563d1dd386a1]
>> >  4:
>> > (Server::dispatch_client_request(std::shared_ptr&)+0xa0b)
>> > [0x563d1dd5709b]
>> >  5: (Server::handle_client_request(MClientRequest*)+0x47f)
>> > [0x563d1dd5768f]
>> >  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>> >  7: (MDSRank::handle_deferrable_message(Message*)+0x80c)
>> > [0x563d1dce1f8c]
>> >  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>> >  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>> >  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>> >  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>> >  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>> >  13: (()+0x8184) [0x7fc30bd7c184]
>> >  14: (clone()+0x6d) [0x7fc30a2d337d]
>> >  NOTE: a copy of the executable, or `objdump -rdS ` is
>> > needed to
>> > interpret this.
>>
>> I have a bug report filed for this issue:
>> http://tracker.ceph.com/issues/16983
>>
>> I believe it should be straightforward to solve and we'll have a fix
>> for it soon.
>>
>> Thanks for the report!
>>
>> --
>> Patrick Donnelly
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Patrick Donnelly


Re: [ceph-users] installing multi osd and monitor of ceph in single VM

2016-08-10 Thread Brad Hubbard
On Thu, Aug 11, 2016 at 12:45 AM, agung Laksono  wrote:
> I've seen the Ansible option before, but not looked at it in detail.
> I have also tried to follow the quick guide for development.
> It did not work on the VM where I had already installed Ceph.
>
> the error is :
>
>  agung@arrasyid:~/ceph/ceph/src$ ./vstart.sh -d -n -x
> ** going verbose **
> [./fetch_config /tmp/fetched.ceph.conf.3818]
> ./init-ceph: failed to fetch config with './fetch_config
> /tmp/fetched.ceph.conf.3818'
>
>
> Do I need to use a vanilla ceph to make vstart.sh work?
>
> When I learn a cloud system, usually I compile
> the source code,  run in pseudo-distributed, modify the code
> and add prints somewhere, recompile and re-run the system.
> Might this method work for exploring ceph?

It should, sure.

Try this. 

1) Clone a fresh copy of the repo.
2) ./do_cmake.sh
3) cd build
4) make
5) OSD=3 MON=3 MDS=1 ../src/vstart.sh -n -x -l
6) bin/ceph -s

That should give you a working cluster with 3 MONs, 3 OSDs and 1 MDS.
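
For the modify/recompile/re-run loop, a rough sketch from the build directory might look like this (the stop.sh helper and the per-daemon make target are assumptions on my part, not part of the steps above):

$ ../src/stop.sh                               # tear down the running vstart cluster
$ make ceph-osd                                # rebuild just the daemon you changed (or plain 'make')
$ OSD=3 MON=3 MDS=1 ../src/vstart.sh -n -x -l  # redeploy a fresh cluster
$ bin/ceph -s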

-- 
Cheers,
Brad

>
>
> On Wed, Aug 10, 2016 at 9:14 AM, Brad Hubbard  wrote:
>>
>> On Wed, Aug 10, 2016 at 12:26 AM, agung Laksono 
>> wrote:
>> >
>> > Hi Ceph users,
>> >
>> > I am new to Ceph. I've succeeded in installing Ceph on 4 VMs using the Quick
>> > installation guide in the Ceph documentation.
>> >
>> > I've also managed to compile
>> > Ceph from source code, and build and install it in a single VM.
>> >
>> > What I want to do next is run multiple Ceph nodes in a cluster,
>> > but only inside a single machine. I need this because I will
>> > study the Ceph code, modify some of it, recompile and
>> > redeploy on the node/VM. For my study I also need to be able to run/kill
>> > a particular node.
>> >
>> > Does somebody know how to configure a single VM to run multiple OSDs and
>> > monitors of Ceph?
>> >
>> > Advice and comments are very much appreciated. Thanks.
>>
>> Hi,
>>
>> Did you see this?
>>
>>
>> http://docs.ceph.com/docs/hammer/dev/quick_guide/#running-a-development-deployment
>>
>> Also take a look at the AIO (all in one) options in ceph-ansible.
>>
>> HTH,
>> Brad
>
>
>
>
> --
> Cheers,
>
> Agung Laksono
>




[ceph-users] lost power. monitors died. Cephx errors now

2016-08-10 Thread Sean Sullivan
So our datacenter lost power and 2/3 of our monitors died with FS
corruption. I tried fixing it but it looks like the store.db didn't make
it.

I copied the working journal via:

   1. sudo mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}
   2. sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename} --keyring {tmp}/{key-filename}

and then:

   1. ceph-mon -i `hostname` --extract-monmap /tmp/monmap
   2. ceph-mon -i {mon-id} --inject-monmap {map-path}


and for a brief moment I had a quorum, but any ceph CLI commands would
result in cephx errors. Now the two failed monitors have elected a quorum
and the monitor that was working keeps getting kicked out of the cluster::


 '''
{
"election_epoch": 402,
"quorum": [
0,
1
],
"quorum_names": [
"kh11-8",
"kh12-8"
],
"quorum_leader_name": "kh11-8",
"monmap": {
"epoch": 1,
"fsid": "a6ae50db-5c71-4ef8-885e-8137c7793da8",
"modified": "0.00",
"created": "0.00",
"mons": [
{
"rank": 0,
"name": "kh11-8",
"addr": "10.64.64.134:6789\/0"
},
{
"rank": 1,
"name": "kh12-8",
"addr": "10.64.64.143:6789\/0"
},
{
"rank": 2,
"name": "kh13-8",
"addr": "10.64.64.151:6789\/0"
}
]
}
}
'''

At this point I am not sure what to do, as any ceph commands return cephx
errors and I can't seem to verify whether the new "quorum" is actually valid.

Is there any way to regenerate a cephx authentication key, or recover it with
hardware access to the nodes? Or any advice on how to recover from what
seems to be a complete monitor failure?


-- 
- Sean:  I wrote this. -


[ceph-users] Power Outage! Oh No!

2016-08-10 Thread Sean Sullivan
So we recently had a power outage and I seem to have lost 2 of 3 of my
monitors. I have since copied /var/lib/ceph/mon/ceph-$(hostname){,.BAK} and
then created a new cephfs and finally generated a new filesystem via

''' sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename}
--keyring {tmp}/{key-filename} '''

After this I copied the monmap from the working monitor to the other two.
via::
''' ceph-mon -i {mon-id} --inject-monmap {map-path} '''

At this point I was left with a working monitor map (afaik) but ceph
cli commands return ::
'''
root@kh11-8:/var/run/ceph# ceph -s
2016-08-10 14:13:58.563241 7fdd719b3700  0 librados: client.admin
authentication error (1) Operation not permitted
Error connecting to cluster: PermissionError
'''

Now after waiting a little while it looks like the quorum kicked out the
only working monitor::

'''
{
"election_epoch": 358,
"quorum": [
0,
1
],
"quorum_names": [
"kh11-8",
"kh12-8"
],
"quorum_leader_name": "kh11-8",
"monmap": {
"epoch": 1,
"fsid": "a6ae50db-5c71-4ef8-885e-8137c7793da8",
"modified": "0.00",
"created": "0.00",
"mons": [
{
"rank": 0,
"name": "kh11-8",
"addr": "10.64.64.134:6789\/0"
},
{
"rank": 1,
"name": "kh12-8",
"addr": "10.64.64.143:6789\/0"
},
{
"rank": 2,
"name": "kh13-8",
"addr": "10.64.64.151:6789\/0"
}
]
}
}
'''
kh13-8 was the original working node and kh11-8 and kh12-8 were the ones
that had fs issues.

Currently I am at a loss as to what to do as ceph -w and -s commands do not
work due to permissions/cephx errors and the original working monitor was
kicked out.

Is there any way to regenerate the cephx authentication and recover the
monitor map?


Re: [ceph-users] Backfilling pgs not making progress

2016-08-10 Thread Samuel Just
Ok, can you
1) Open a bug
2) Identify all osds involved in the 5 problem pgs
3) enable debug osd = 20, debug filestore = 20, debug ms = 1 on all of them
4) mark the primary for each pg down (should cause peering and
backfill to restart)
5) link all logs to the bug

Thanks!
-Sam
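
A sketch of steps 3 and 4 on a live cluster (the osd id here is hypothetical; substitute the ones acting for the problem pgs):

$ ceph tell osd.12 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
$ ceph osd down 12      # mark the pg's primary down so it re-peers and backfill restarts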

On Tue, Jul 26, 2016 at 9:11 AM, Samuel Just  wrote:
> Hmm, nvm, it's not an lfn object anyway.
> -Sam
>
> On Tue, Jul 26, 2016 at 7:07 AM, Brian Felton  wrote:
>> If I search on osd.580, I find
>> default.421929.15\uTEPP\s84316222-6ddd-4ac9-8283-6fa1cdcf9b88\sbackups\s20160630091353\sp1\s\sShares\sWarehouse\sLondonWarehouse\sLondon\sRon
>> picture's\sMISCELLANEOUS\s2014\sOct., 2014\sOct.
>> 1\sDSC04329.JPG__head_981926C1__21__5, which has a non-zero
>> size and a hash (981926C1) that matches that of the same file found on the
>> other OSDs in the pg.
>>
>> If I'm misunderstanding what you're asking about a dangling link, please
>> point me in the right direction.
>>
>> Brian
>>
>> On Tue, Jul 26, 2016 at 8:59 AM, Samuel Just  wrote:
>>>
>>> Did you also confirm that the backfill target does not have any of
>>> those dangling links?  I'd be looking for a dangling link for
>>>
>>> 981926c1/default.421929.15_TEPP/84316222-6ddd-4ac9-8283-6fa1cdcf9b88/backups/20160630091353/p1//Shares/Warehouse/LondonWarehouse/London/Ron
>>> picture's/MISCELLANEOUS/2014/Oct., 2014/Oct. 1/DSC04329.JPG/head//33
>>> on osd.580.
>>> -Sam
>>>
>>> On Mon, Jul 25, 2016 at 9:04 PM, Brian Felton  wrote:
>>> > Sam,
>>> >
>>> > I cranked up the logging on the backfill target (osd 580 on node 07) and
>>> > the
>>> > acting primary for the pg (453 on node 08, for what it's worth).  The
>>> > logs
>>> > from the primary are very large, so pardon the tarballs.
>>> >
>>> > PG Primary Logs:
>>> > https://www.dropbox.com/s/ipjobn2i5ban9km/backfill-primary-log.tgz?dl=0B
>>> > PG Backfill Target Logs:
>>> > https://www.dropbox.com/s/9qpiqsnahx0qc5k/backfill-target-log.tgz?dl=0
>>> >
>>> > I'll be reviewing them with my team tomorrow morning to see if we can
>>> > find
>>> > anything.  Thanks for your assistance.
>>> >
>>> > Brian
>>> >
>>> > On Mon, Jul 25, 2016 at 3:33 PM, Samuel Just  wrote:
>>> >>
>>> >> The next thing I'd want is for you to reproduce with
>>> >>
>>> >> debug osd = 20
>>> >> debug filestore = 20
>>> >> debug ms = 1
>>> >>
>>> >> and post the file somewhere.
>>> >> -Sam
>>> >>
>>> >> On Mon, Jul 25, 2016 at 1:33 PM, Samuel Just  wrote:
>>> >> > If you don't have the orphaned file link, it's not the same bug.
>>> >> > -Sam
>>> >> >
>>> >> > On Mon, Jul 25, 2016 at 12:55 PM, Brian Felton 
>>> >> > wrote:
>>> >> >> Sam,
>>> >> >>
>>> >> >> I'm reviewing that thread now, but I'm not seeing a lot of overlap
>>> >> >> with
>>> >> >> my
>>> >> >> cluster's situation.  For one, I am unable to start either a repair
>>> >> >> or
>>> >> >> a
>>> >> >> deep scrub on any of the affected pgs.  I've instructed all six of
>>> >> >> the
>>> >> >> pgs
>>> >> >> to scrub, deep-scrub, and repair, and the cluster has been gleefully
>>> >> >> ignoring these requests (it has been several hours since I first
>>> >> >> tried,
>>> >> >> and
>>> >> >> the logs indicate none of the pgs ever scrubbed).  Second, none of
>>> >> >> the
>>> >> >> my
>>> >> >> OSDs is crashing.  Third, none of my pgs or objects has ever been
>>> >> >> marked
>>> >> >> inconsistent (or unfound, for that matter) -- I'm only seeing the
>>> >> >> standard
>>> >> >> mix of degraded/misplaced objects that are common during a recovery.
>>> >> >> What
>>> >> >> I'm not seeing is any further progress on the number of misplaced
>>> >> >> objects --
>>> >> >> the number has remained effectively unchanged for the past several
>>> >> >> days.
>>> >> >>
>>> >> >> To be sure, though, I tracked down the file that the backfill
>>> >> >> operation
>>> >> >> seems to be hung on, and I can find it in both the backfill target
>>> >> >> osd
>>> >> >> (580)
>>> >> >> and a few other osds in the pg.  In all cases, I was able to find
>>> >> >> the
>>> >> >> file
>>> >> >> with an identical hash value on all nodes, and I didn't find any
>>> >> >> duplicates
>>> >> >> or potential orphans.  Also, none of the objects involves have long
>>> >> >> names,
>>> >> >> so they're not using the special ceph long filename handling.
>>> >> >>
>>> >> >> Also, we are not using XFS on our OSDs; we are using ZFS instead.
>>> >> >>
>>> >> >> If I'm misunderstanding the issue linked above and the corresponding
>>> >> >> thread,
>>> >> >> please let me know.
>>> >> >>
>>> >> >> Brian
>>> >> >>
>>> >> >>
>>> >> >> On Mon, Jul 25, 2016 at 1:32 PM, Samuel Just 
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> You may have hit http://tracker.ceph.com/issues/14766.  There was a
>>> >> >>> thread on the list a while back about diagnosing and fixing it.
>>> >> >>> -Sam
>>> >> >>>
>>> >> >>> On Mon, 

Re: [ceph-users] MDS crash

2016-08-10 Thread Randy Orr
Great, thank you. Please let me know if I can be of any assistance in
testing or validating a fix.

-Randy

On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly 
wrote:

> Hello Randy,
>
> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr  wrote:
> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*,
> bool,
> > bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
> > 2016-08-09 18:51:50.626630
> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
> >
> >  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x8b) [0x563d1e0a2d3b]
> >  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long,
> bool,
> > unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
> >  3: (Server::handle_client_open(std::shared_ptr&)+0x1061)
> > [0x563d1dd386a1]
> >  4: (Server::dispatch_client_request(std::shared_ptr<
> MDRequestImpl>&)+0xa0b)
> > [0x563d1dd5709b]
> >  5: (Server::handle_client_request(MClientRequest*)+0x47f)
> [0x563d1dd5768f]
> >  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
> >  7: (MDSRank::handle_deferrable_message(Message*)+0x80c)
> [0x563d1dce1f8c]
> >  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
> >  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
> >  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
> >  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
> >  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
> >  13: (()+0x8184) [0x7fc30bd7c184]
> >  14: (clone()+0x6d) [0x7fc30a2d337d]
> >  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to
> > interpret this.
>
> I have a bug report filed for this issue: http://tracker.ceph.com/
> issues/16983
>
> I believe it should be straightforward to solve and we'll have a fix
> for it soon.
>
> Thanks for the report!
>
> --
> Patrick Donnelly
>


Re: [ceph-users] MDS crash

2016-08-10 Thread Patrick Donnelly
Hello Randy,

On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr  wrote:
> mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*, bool,
> bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
> 2016-08-09 18:51:50.626630
> mds/Locker.cc: 2190: FAILED assert(in->is_file())
>
>  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0x563d1e0a2d3b]
>  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool,
> unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>  3: (Server::handle_client_open(std::shared_ptr&)+0x1061)
> [0x563d1dd386a1]
>  4: (Server::dispatch_client_request(std::shared_ptr&)+0xa0b)
> [0x563d1dd5709b]
>  5: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d1dd5768f]
>  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>  7: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d1dce1f8c]
>  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>  13: (()+0x8184) [0x7fc30bd7c184]
>  14: (clone()+0x6d) [0x7fc30a2d337d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.

I have a bug report filed for this issue: http://tracker.ceph.com/issues/16983

I believe it should be straightforward to solve and we'll have a fix
for it soon.

Thanks for the report!

-- 
Patrick Donnelly


Re: [ceph-users] OSD crashes on EC recovery

2016-08-10 Thread Brian Felton
Roeland,

We're seeing the same problems in our cluster.  I can't offer you a
solution that gets the OSD back, but I can tell you what I did to work
around it.

We're running 5 0.94.6 clusters with 9 nodes / 648 HDD OSDs with a k=7, m=2
erasure coded .rgw.buckets pool.  During the backfilling after a recent
disk replacement, we had four OSDs that got in a very similar state.

2016-08-09 07:40:12.475699 7f025b06b700 -1 osd/ECBackend.cc: In function
'void ECBackend::handle_recovery_push(PushOp&, RecoveryMessages*)' thread
7f025b06b700 time 2016-08-09 07:40:12.472819
osd/ECBackend.cc: 281: FAILED assert(op.attrset.count(string("_")))

 ceph version 0.94.6-2 (f870be457b16e4ff56ced74ed3a3c9a4c781f281)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xba997b]
 2: (ECBackend::handle_recovery_push(PushOp&, RecoveryMessages*)+0xd7f)
[0xa239ff]
 3: (ECBackend::handle_message(std::tr1::shared_ptr)+0x1de)
[0xa2600e]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
ThreadPool::TPHandle&)+0x167) [0x8305e7]
 5: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3bd) [0x6a157d]
 6: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x338) [0x6a1aa8]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f)
[0xb994cf]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9b5f0]
 9: (()+0x8184) [0x7f0284e35184]
 10: (clone()+0x6d) [0x7f028324c37d]

To allow the cluster to recover, we ended up reweighting the OSDs that got
into this state to 0 (ceph osd crush reweight <osd> 0).  This will of
course kick off a long round of backfilling, but it eventually recovers.
We've never found a solution that gets the OSD healthy again that doesn't
involve nuking the underlying disk and starting over.  We've had 10 OSDs
get in this state across 2 clusters in the last few months.  The
failure/crash message is always the same.  If someone does know of a way to
recover the OSD, that would be great.
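
Concretely, the per-OSD drain looks roughly like this (osd id hypothetical):

$ ceph osd crush reweight osd.123 0    # push all data off the crashing OSD
$ ceph -w                              # watch backfill until the cluster is healthy before doing the next one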

I hope this helps.

Brian Felton

On Wed, Aug 10, 2016 at 10:17 AM, Roeland Mertens <
roeland.mert...@genomicsplc.com> wrote:

> Hi,
>
> we run a Ceph 10.2.1 cluster across 35 nodes with a total of 595 OSDs, we
> have a mixture of normally replicated volumes and EC volumes using the
> following erasure-code-profile:
>
> # ceph osd erasure-code-profile get rsk8m5
> jerasure-per-chunk-alignment=false
> k=8
> m=5
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
> Now we had a disk failure and on swap out we seem to have encountered a
> bug where during recovery OSDs crash when trying to fix certain pgs that
> may have been corrupted.
>
> For example:
>-3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq:
> 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op:
> MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
> -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 --
> 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 --
> osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con
> 0x559c0664a700
> -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq:
> 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op:
> MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
>  0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In
> function 'virtual void 
> OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10
> 12:38:21.303012
> osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())
>
> then the ceph-osd daemon goes splat. I've attached an extract of a logfile
> showing a bit more.
>
> Anyone have any ideas? I'm stuck now with a pg that's stuck as
> down+remapped+peering. ceph pg query tells me that peering is blocked to
> the loss of an osd, though restarting it just results in another crash of
> the ceph-osd daemon. We tried to force a rebuild by using
> ceph-objectstore-tool to delete the pg segment on some of the OSDs that are
> crashing but that didn't help one iota.
>
> Any help would be greatly appreciated,
>
> regards,
>
> Roeland
>
> --
> This email is sent on behalf of Genomics plc, a public limited company
> registered in England and Wales with registered number 8839972, VAT
> registered number 189 2635 65 and registered office at King Charles House,
> Park End Street, Oxford, OX1 1JD, United Kingdom.
> The contents of this e-mail and any attachments are confidential to the
> intended recipient. If you are not the intended recipient please do not use
> or publish its contents, contact Genomics plc immediately at
> i...@genomicsplc.com  then delete. You may not
> copy, forward, use or disclose the contents of this email to anybody else
> if you are not the intended recipient. Emails are not secure and may
> contain viruses.
>
> 

[ceph-users] MDS crash

2016-08-10 Thread Randy Orr
Hello,

We have recently had some failures with our MDS processes. We are running
Jewel 10.2.1. The two MDS services are on dedicated hosts running in
active/standby on Ubuntu 14.04.3 with kernel 3.19.0-56-generic. I have
searched the mailing list and open tickets without much luck so far.

The first indication of a problem is:

mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*,
bool, bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
2016-08-09 18:51:50.626630
mds/Locker.cc: 2190: FAILED assert(in->is_file())

 ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0x563d1e0a2d3b]
 2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool,
unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
 3: (Server::handle_client_open(std::shared_ptr<MDRequestImpl>&)+0x1061)
[0x563d1dd386a1]
 4:
(Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa0b)
[0x563d1dd5709b]
 5: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d1dd5768f]
 6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
 7: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d1dce1f8c]
 8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
 9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
 10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
 11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
 13: (()+0x8184) [0x7fc30bd7c184]
 14: (clone()+0x6d) [0x7fc30a2d337d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
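
For reference, the disassembly that the NOTE asks for can be generated with something like this (binary path assumed for the Ubuntu ceph packages):

$ objdump -rdS /usr/bin/ceph-mds > ceph-mds-10.2.1.objdump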

...

I snipped the dump of recent events, but can certainly include them if it
would help in debugging.

...

Upstart then attempts to restart the process, the logs from this are here:
https://gist.github.com/anonymous/256bd6e886421840d151890e0205766d

It looks to me like it goes through the replay -> reconnect -> rejoin ->
active process successfully and then immediately crashes with the same
error after becoming active. Upstart continues to try restarting until it
hits the max number of attempts. At that point the standby takes over and
goes through the same loop. Restarting manually gave the same issue on both
hosts. This process continued for several cycles before I rebooted the
physical host for the MDS process. At that point it started successfully
without issue. After rebooting the standby host it too was able to start
successfully.

Looking at metrics for the MDS host and the ceph cluster in general there
is nothing out of place or abnormal. CPU, memory, network, disk were all
within normal bounds. Other than the MDS processes failing the cluster was
healthy, no slow requests or failed OSDs.

Any thoughts on what might be causing this issue? Is there any further
information I can provide to help debug this?

Thanks in advance.


[ceph-users] OSD crashes on EC recovery

2016-08-10 Thread Roeland Mertens

Hi,

we run a Ceph 10.2.1 cluster across 35 nodes with a total of 595 OSDs, 
we have a mixture of normally replicated volumes and EC volumes using 
the following erasure-code-profile:


# ceph osd erasure-code-profile get rsk8m5
jerasure-per-chunk-alignment=false
k=8
m=5
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8

Now we had a disk failure and on swap out we seem to have encountered a 
bug where during recovery OSDs crash when trying to fix certain pgs that 
may have been corrupted.


For example:
   -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 
3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: 
MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
-2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 
10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 -- 
osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con 
0x559c0664a700
-1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- 
seq: 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op: 
MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
 0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In 
function 'virtual void 
OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10
12:38:21.303012

osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

then the ceph-osd daemon goes splat. I've attached an extract of a 
logfile showing a bit more.


Anyone have any ideas? I'm stuck now with a pg that's stuck as 
down+remapped+peering. ceph pg query tells me that peering is blocked to 
the loss of an osd, though restarting it just results in another crash 
of the ceph-osd daemon. We tried to force a rebuild by using 
ceph-objectstore-tool to delete the pg segment on some of the OSDs that 
are crashing but that didn't help one iota.


Any help would be greatly appreciated,

regards,

Roeland

--
This email is sent on behalf of Genomics plc, a public limited company 
registered in England and Wales with registered number 8839972, VAT 
registered number 189 2635 65 and registered office at King Charles House, 
Park End Street, Oxford, OX1 1JD, United Kingdom.
The contents of this e-mail and any attachments are confidential to the 
intended recipient. If you are not the intended recipient please do not use 
or publish its contents, contact Genomics plc immediately at 
i...@genomicsplc.com  then delete. You may not copy, 
forward, use or disclose the contents of this email to anybody else if you 
are not the intended recipient. Emails are not secure and may contain 
viruses.
-4> 2016-08-10 12:38:21.302910 7f893e2d7700  1 -- 10.93.105.11:6831/2674119 
<== osd.290 10.93.105.22:6802/357033 42  MOSDECSubOpReadReply(63.1a18s0 
47661 ECSubReadReply(tid=1, attrs_read=0)) v1  170+0+0 (1521384358 0 0) 
0x559bf
b611400 con 0x559c0664a700
-3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq: 3434, 
time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op: 
MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
-2> 2016-08-10 12:38:21.302981 7f89bef50700  1 -- 10.93.105.11:6831/2674119 
--> 10.93.105.22:6802/357033 -- osd_map(47662..47663 src has 32224..47663) v3 
-- ?+0 0x559c1057f3c0 con 0x559c0664a700
-1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq: 3434, 
time: 2016-08-10 12:38:21.302996, event: reached_pg, op: 
MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
 0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In 
function 'virtual void 
OnRecoveryReadComplete::finish(std::pair&)' thread 7f89bef50700 time 2016-08-10 
12:38:21.303012
osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())

 ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) 
[0x559be1135e2b]
 2: (OnRecoveryReadComplete::finish(std::pair&)+0x192) [0x559be0cf6122]
 3: (GenContext&>::complete(std::pair&)+0x9) [0x559be0ce3b89]
 4: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x63) 
[0x559be0cda003]
 5: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, 
RecoveryMessages*)+0xf68) [0x559be0cdafd8]
 6: (ECBackend::handle_message(std::shared_ptr)+0x186) 
[0x559be0ce2236]
 7: (ReplicatedPG::do_request(std::shared_ptr&, 
ThreadPool::TPHandle&)+0xed) [0x559be0c1c30d]
 8: (OSD::dequeue_op(boost::intrusive_ptr, std::shared_ptr, 
ThreadPool::TPHandle&)+0x3f5) [0x559be0adb285]
 9: (PGQueueable::RunVis::operator()(std::shared_ptr&)+0x5d) 
[0x559be0adb4ad]
 10: (OSD::ShardedOpWQ::_process(unsigned int, 

Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-10 Thread Ilya Dryomov
On Mon, Aug 8, 2016 at 11:47 PM, Jason Dillaman  wrote:
> On Mon, Aug 8, 2016 at 5:39 PM, Jason Dillaman  wrote:
>> Unfortunately, for v2 RBD images, this image name to image id mapping
>> is stored in the LevelDB database within the OSDs and I don't know,
>> offhand, how to attempt to recover deleted values from there.
>
> Actually, to correct myself, the "rbd_id." object just

I think Jason meant to write "rbd_id." here.

> writes the image id in binary to the file. So if you can recover that
> file and retrieve its contents, you can again determine the block name
> prefix in the form of "rbd_data..".

So if you had a v2 image named "myimage" in a pool named "rbd"

$ ceph osd map rbd rbd_id.myimage
osdmap e11 pool 'rbd' (0) object 'rbd_id.myimage' -> pg 0.f8d9dc15
(0.5) -> up ([0,2,1], p0) acting ([0,2,1], p0)

you'd need to search for a file whose name starts with
"rbd\uid.myimage" under /current/0.5_head/.
The 0.5 is the short PG id from the ceph osd map command above (the
object doesn't have to exist for it to work).  The "\u" is literally
a "\" followed by a "u" - ceph's FileStore uses underscores as
separators so underscores in object names get translated to "\u" in the
corresponding file names.  The actual file name is going to be
something along the lines of "rbd\uid.myimage__head_F8D9DC15__0":

$ xxd "./0.5_head/rbd\uid.myimage__head_F8D9DC15__0"
: 0c00  3130 3130 3734 6230 6463 3531  101074b0dc51

That's the prefix for the image.  myimage actually exists here, so
I can verify it with:

$ rbd info rbd/myimage | grep block_name_prefix
block_name_prefix: rbd_data.101074b0dc51

With the prefix at hand, you'd need to search all /current/
directories for files whose names start with "rbd\udata.101074b0dc51",
doing the equivalent of:

$ find . -regex ".*/rbd\\\udata.101074b0dc51.*"
./0.4_head/rbd\udata.101074b0dc51.0003__head_64B130D4__0
./0.0_head/rbd\udata.101074b0dc51.__head_7A694010__0
./0.3_head/rbd\udata.101074b0dc51.0004__head_85FCAA2B__0
./0.1_head/rbd\udata.101074b0dc51.0002__head_660B5009__0
./0.6_head/rbd\udata.101074b0dc51.0001__head_33B916C6__0
...


There is a rbd-recover-tool tool in the ceph source tree, which can
reconstruct rbd images from a FileStore structure outlined in this
thread.  I'm not sure if we document it or even build it (probably not,
and it won't be of much use to you anyway since the files are gone),
but you can peruse the code for the exact object name regexes.

Thanks,

Ilya


Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-10 Thread Jason Dillaman
The image's associated metadata is removed from the directory once the
image is removed.  Also, the default librbd log level will not log an
image's internal id.  Therefore, unfortunately, the only way to
proceed is how I previously described.

On Wed, Aug 10, 2016 at 2:48 AM, Brad Hubbard  wrote:
>
>
> On Wed, Aug 10, 2016 at 3:16 PM, Georgios Dimitrakakis  
> wrote:
>>
>> Hello!
>>
>> Brad,
>>
>> is that possible from the default logging or verbose one is needed??
>>
>> I 've managed to get the UUID of the deleted volume from OpenStack but don't
>> really know how to get the offsets and OSD maps since "rbd info" doesn't
>> provide any information for that volume.
>
> Did you grep for the UUID (might be safer to grep for the first 8 chars or
> so since I'm not 100% sure of the format) in the logs?
>
> There is also a RADOS object called the rbd directory that contains some
> mapping information for rbd images but I don't know if this is erased when an
> image is deleted, nor how to look at it but someone more adept at RBD may be
> able to make suggestions how to confirm this?
>
> HTH,
> Brad
>
>>
>> Is it possible to somehow get them from leveldb?
>>
>> Best,
>>
>> G.
>>
>>
>>> On Tue, Aug 9, 2016 at 7:39 AM, George Mihaiescu
>>>  wrote:

 Look in the cinder db, the volumes table to find the Uuid of the deleted
 volume.
>>>
>>>
>>> You could also look through the logs at the time of the delete and I
>>> suspect you should
>>> be able to see how the rbd image was prefixed/named at the time of
>>> the delete.
>>>
>>> HTH,
>>> Brad
>>>

 If you go through yours OSDs and look for the directories for PG index
 20, you might find some fragments from the deleted volume, but it's a long
 shot...

> On Aug 8, 2016, at 4:39 PM, Georgios Dimitrakakis 
> wrote:
>
> Dear David (and all),
>
> the data are considered very critical therefore all this attempt to
> recover them.
>
> Although the cluster hasn't been fully stopped all users actions have. I
> mean services are running but users are not able to read/write/delete.
>
> The deleted image was the exact same size of the example (500GB) but it
> wasn't the only one deleted today. Our user was trying to do a "massive"
> cleanup by deleting 11 volumes and unfortunately one of them was very
> important.
>
> Let's assume that I "dd" all the drives what further actions should I do
> to recover the files? Could you please elaborate a bit more on the phrase
> "If you've never deleted any other rbd images and assuming you can recover
> data with names, you may be able to find the rbd objects"??
>
> Do you mean that if I know the file names I can go through and check for
> them? How?
> Do I have to know *all* file names or by searching for a few of them I
> can find all data that exist?
>
> Thanks a lot for taking the time to answer my questions!
>
> All the best,
>
> G.
>
>> I dont think theres a way of getting the prefix from the cluster at
>> this point.
>>
>> If the deleted image was a similar size to the example youve given,
>> you will likely have had objects on every OSD. If this data is
>> absolutely critical you need to stop your cluster immediately or make
>> copies of all the drives with something like dd. If youve never
>> deleted any other rbd images and assuming you can recover data with
>> names, you may be able to find the rbd objects.
>>
>> On Mon, Aug 8, 2016 at 7:28 PM, Georgios Dimitrakakis  wrote:
>>
> Hi,
>
> On 08.08.2016 10:50, Georgios Dimitrakakis wrote:
>
>>> Hi,
>>>
 On 08.08.2016 09:58, Georgios Dimitrakakis wrote:

 Dear all,

 I would like your help with an emergency issue but first
 let me describe our environment.

 Our environment consists of 2OSD nodes with 10x 2TB HDDs
 each and 3MON nodes (2 of them are the OSD nodes as well)
 all with ceph version 0.80.9
 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)

 This environment provides RBD volumes to an OpenStack
 Icehouse installation.

 Although not a state of the art environment is working
 well and within our expectations.

 The issue now is that one of our users accidentally
 deleted one of the volumes without keeping its data first!

 Is there any way (since the data are considered critical
 and very important) to recover them from CEPH?
>>>
>>>
>>> Short answer: no
>>>
>>> Long answer: no, but
>>>
>>> Consider the way Ceph 

[ceph-users] ceph recreate the already exist bucket throw out error when have max_buckets num bucket

2016-08-10 Thread Leo Yu
Hi, I created a user uid=testquato2; the user can create max_buckets=10
buckets:
[root@node1 ~]# radosgw-admin user info --uid=testquato2
{
"user_id": "testquato2",
"display_name": "testquato2",
"email": "",
"suspended": 0,
"max_buckets": 10,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "testquato2",
"access_key": "quta2",
"secret_key": "quta2"
}
],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}

If 8 buckets already exist and one of them is named quta2-b1,
and I recreate the bucket quta2-b1 with the command

s3cmd -c testquato2.s3cfg  mb s3://quta2-b1

it returns the existing bucket.

But if 10 buckets already exist and one of them is named quta2-b1,
and I recreate the bucket quta2-b1 with the same command

s3cmd -c testquato2.s3cfg  mb s3://quta2-b1

it instead returns:
ERROR: S3 error: 400 (TooManyBuckets)
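
As a stopgap while this behaviour is looked at, the per-user limit can be raised (a sketch; pick whatever number fits):

$ radosgw-admin user modify --uid=testquato2 --max-buckets=20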


Re: [ceph-users] Fast Ceph a Cluster with PB storage

2016-08-10 Thread Александр Пивушков

>
>2016-08-10 9:30 GMT+05:00 Александр Пивушков  < p...@mail.ru > :
>>I want to use Ceph only as user data storage.
>>The user program writes data to a folder that is mounted on Ceph.
>>Virtual machine images are not stored on Ceph.
>>Fibre Channel and 40GbE are used only for the rapid transmission of
>>information between the Ceph cluster and the virtual machines on oVirt.
>>Can I use oVirt in this scheme?
>What kind of OSes do you use on the guests? If it's linux then it's better to 
>directly use RBD (in case of per VM dedicated storage) or CephFS (if the 
>storage has to be shared) right inside of the guest, if it's windows - export 
>CephFS with Samba. And I believe you definetely want to use 40GbE or 
>Infiniband instead of FC.
The guests are Windows.
I found the project
https://github.com/ketor/ceph-dokan

so I would need to use CephFS, and therefore an MDS will be needed...


--
Александр Пивушков.


Re: [ceph-users] Best practices for extending a ceph cluster with minimal client impact data movement

2016-08-10 Thread Wido den Hollander

> On 9 August 2016 at 17:44, Martin Palma wrote:
> 
> 
> Hi Wido,
> 
> thanks for your advice.
> 

Just keep in mind, you should update the CRUSHMap in one big bang. The cluster 
will be calculating and peering for 1 or 2 min and afterwards you should see 
all PGs active+X.

Then the waiting game starts, get coffee, some sleep and wait for it to finish.

By throttling recovery you prevent this from becoming slow for the clients.
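
A sketch of applying those throttles to a running cluster before injecting the new CRUSHMap:

$ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'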

Wido

> Best,
> Martin
> 
> On Tue, Aug 9, 2016 at 10:05 AM, Wido den Hollander  wrote:
> >
> >> On 8 August 2016 at 16:45, Martin Palma wrote:
> >>
> >>
> >> Hi all,
> >>
> >> we are in the process of expanding our cluster and I would like to
> >> know if there are some best practices in doing so.
> >>
> >> Our current cluster is composted as follows:
> >> - 195 OSDs (14 Storage Nodes)
> >> - 3 Monitors
> >> - Total capacity 620 TB
> >> - Used 360 TB
> >>
> >> We will expand the cluster by other 14 Storage Nodes and 2 Monitor
> >> nodes. So we are doubling the current deployment:
> >>
> >> - OSDs: 195 --> 390
> >> - Total capacity: 620 TB --> 1250 TB
> >>
> >> During the expansion we would like to minimize the client impact and
> >> data movement. Any suggestions?
> >>
> >
> > There are a few routes you can take, I would suggest that you:
> >
> > - set max backfills to 1
> > - set max recovery to 1
> >
> > Now, add the OSDs to the cluster, but NOT to the CRUSHMap.
> >
> > When all the OSDs are online, inject a new CRUSHMap where you add the new 
> > OSDs to the data placement.
> >
> > $ ceph osd setcrushmap -i 
> >
> > The OSDs will now start to migrate data, but this is throttled by the max 
> > recovery and backfill settings.
> >
> > Wido
> >
> >> Best,
> >> Martin
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fast Ceph a Cluster with PB storage

2016-08-10 Thread Александр Пивушков
 Hello


>Tuesday, 9 August 2016, 14:56 +03:00, from Christian Balzer :
>
>
>Hello,
>
>[re-added the list]
>
>Also try to leave a line-break, paragraph between quoted and new text,
>your mail looked like it was all written by me...
>
>On Tue, 09 Aug 2016 11:00:27 +0300 Александр Пивушков wrote:
>
>>  Thank you for your response!
>> 
>> 
>> >Tuesday, 9 August 2016, 5:11 +03:00, from Christian Balzer < ch...@gol.com >:
>> >
>> >
>> >Hello,
>> >
>> >On Mon, 08 Aug 2016 17:39:07 +0300 Александр Пивушков wrote:
>> >
>> >> 
>> >> Hello dear community!
>> >> I'm new to the Ceph and not long ago took up the theme of building 
>> >> clusters.
>> >> Therefore it is very important to your opinion.
>> >> It is necessary to create a cluster from 1.2 PB storage and very rapid 
>> >> access to data. Earlier disks of "Intel® SSD DC P3608 Series 1.6TB NVMe 
>> >> PCIe 3.0 x4 Solid State Drive" were used, their speed of all satisfies, 
>> >> but with increase of volume of storage, the price of such cluster very 
>> >> strongly grows and therefore there was an idea to use Ceph.
>> >
>> >You may want to tell us more about your environment, use case and in
>> >particular what your clients are.
>> >Large amounts of data usually means graphical or scientific data,
>> >extremely high speed (IOPS) requirements usually mean database
>> >like applications, which one is it, or is it a mix? 
>>
>>This is a mixed project, with combined graphics and science. Project linking 
>>the vast array of image data. Like google MAP :)
>> Previously, customers were Windows that are connected to powerful servers 
>> directly. 
>> Ceph cluster connected on FC to servers of the virtual machines is now 
>> planned. Virtualization - oVirt. 
>
>Stop right there. oVirt, despite being from RedHat, doesn't really support
>Ceph directly all that well, last I checked.
>That is probably where you get the idea/need for FC from.
>
>If anyhow possible, you do NOT want another layer and protocol conversion
>between Ceph and the VMs, like a FC gateway or iSCSI or NFS.
>
>So if you're free to choose your Virtualization platform, use KVM/qemu at
>the bottom and something like Openstack, OpenNebula, ganeti, Pacemake with
>KVM resource agents on top.

I have worked with proxmox

>
>
>>Clients on 40 GB ethernet are connected to servers of virtualization.
>
>Your VM clients (if using RBD instead of FC) and the end-users could use
>the same network infrastructure.
>
>>Clients on Windows.
>> Customers use their software. It is written by them. About the base I do not 
>> know, probably not. The processing results are stored in conventional files. 
>> In total about 160 GB.
>
>1 image file being 160GB?
No, a lot of files of different sizes. From 1 GB to 1 MB

>
>
>> We need very quickly to process these images, so as not to cause 
>> dissatisfaction among customers. :) Per minute.
>
>Explain. 
>Writing 160GB/minute is going to be a challenge on many levels.
>Even with 40Gb/s networks this assumes no contention on the network OR the
>storage backend...
For Fibre Channel:
speed 16 Gbit/s, size 160 GB, which gives
160*8/16 = 80 seconds over one channel (theoretical speed)

>
>
>
>> >
>> >
>> >For example, how were the above NVMes deployed and how did they serve data
>> >to the clients?
>> >The fiber channel bit in your HW list below makes me think you're using
>> >VMware, FC and/or iSCSI right now. 
>>
>>Data is stored on the SSD disk 1.6TB NVMe, and processed and stored directly 
>>on it. In one powerful server. Gave for this task. Used 40 GB ethernet. 
>>Server - CentOS 7 
>
>So you're going from a single server with all NVMe storage to a
>distributed storage. 
>
>You will be disappointed by the cost/performance in direct comparison.
Nevertheless, it is needed: there are many users, and they need to be provided
with a single shared repository for their data.

>
>
>
>> 
>> >
>> >
>> >> There are following requirements:
>> >> - The amount of data 160 GB should be read and written at speeds of SSD 
>> >> P3608
>> >Again, how are they serving data now?
>> >The speeds (latency!) a local NVMe can reach is of course impossible with
>> >a network attached SDS like Ceph. 
>>
>>It is sad. Not helping matters is paralleling to 13 servers? and the FC?
>>
>Ceph does not FC internally.
>I only uses IP (so you can use IPoIB if you want).
>Never mind that the problem is that the replication (x3) is causing the
>largest part of the latency.
Can it be configured so that replication occurs in the background?

>
>
>> >
>> >160GB is tiny, are you sure about this number? 
>>
>>Yes, it's small, and it is exactly. But, it is the most sensitive data 
>>processing time. Even in the background and can be a slower process more 
>>data. Their treatment is not so nervous clients.
>
>Still no getting it, but it seems more and more like 160GB/s.
no, 160 GB.

>
>
>> >
>> >
>> >> - There must be created a high-speed storage of the SSD drives 36 TB 
>> >> volume with read / write speed tends to SSD P3608
>> >How is that different to the point 

Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-10 Thread Brad Hubbard


On Wed, Aug 10, 2016 at 3:16 PM, Georgios Dimitrakakis  
wrote:
>
> Hello!
>
> Brad,
>
> is that possible from the default logging or verbose one is needed??
>
> I 've managed to get the UUID of the deleted volume from OpenStack but don't
> really know how to get the offsets and OSD maps since "rbd info" doesn't
> provide any information for that volume.

Did you grep for the UUID (might be safer to grep for the first 8 chars or
so since I'm not 100% sure of the format) in the logs?
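
A sketch of what to grep for: OpenStack's rbd driver normally names Cinder volumes "volume-<uuid>" (that naming is an assumption on my part), so something along these lines against the ceph logs:

$ grep -r 'volume-3f2b1c8a' /var/log/ceph/    # first 8 chars of the deleted volume's UUID (hypothetical)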

There is also a RADOS object called the rbd directory that contains some
mapping information for rbd images but I don't know if this is erased when an
image is deleted, nor how to look at it but someone more adept at RBD may be
able to make suggestions how to confirm this?

HTH,
Brad

>
> Is it possible to somehow get them from leveldb?
>
> Best,
>
> G.
>
>
>> On Tue, Aug 9, 2016 at 7:39 AM, George Mihaiescu
>>  wrote:
>>>
>>> Look in the cinder db, the volumes table to find the Uuid of the deleted
>>> volume.
>>
>>
>> You could also look through the logs at the time of the delete and I
>> suspect you should
>> be able to see how the rbd image was prefixed/named at the time of
>> the delete.
>>
>> HTH,
>> Brad
>>
>>>
>>> If you go through yours OSDs and look for the directories for PG index
>>> 20, you might find some fragments from the deleted volume, but it's a long
>>> shot...
>>>
 On Aug 8, 2016, at 4:39 PM, Georgios Dimitrakakis 
 wrote:

 Dear David (and all),

 the data are considered very critical therefore all this attempt to
 recover them.

 Although the cluster hasn't been fully stopped all users actions have. I
 mean services are running but users are not able to read/write/delete.

 The deleted image was the exact same size of the example (500GB) but it
 wasn't the only one deleted today. Our user was trying to do a "massive"
 cleanup by deleting 11 volumes and unfortunately one of them was very
 important.

 Let's assume that I "dd" all the drives what further actions should I do
 to recover the files? Could you please elaborate a bit more on the phrase
 "If you've never deleted any other rbd images and assuming you can recover
 data with names, you may be able to find the rbd objects"??

 Do you mean that if I know the file names I can go through and check for
 them? How?
 Do I have to know *all* file names or by searching for a few of them I
 can find all data that exist?

 Thanks a lot for taking the time to answer my questions!

 All the best,

 G.

> I dont think theres a way of getting the prefix from the cluster at
> this point.
>
> If the deleted image was a similar size to the example youve given,
> you will likely have had objects on every OSD. If this data is
> absolutely critical you need to stop your cluster immediately or make
> copies of all the drives with something like dd. If youve never
> deleted any other rbd images and assuming you can recover data with
> names, you may be able to find the rbd objects.
>
> On Mon, Aug 8, 2016 at 7:28 PM, Georgios Dimitrakakis  wrote:
>
 Hi,

 On 08.08.2016 10:50, Georgios Dimitrakakis wrote:

>> Hi,
>>
>>> On 08.08.2016 09:58, Georgios Dimitrakakis wrote:
>>>
>>> Dear all,
>>>
>>> I would like your help with an emergency issue but first
>>> let me describe our environment.
>>>
>>> Our environment consists of 2OSD nodes with 10x 2TB HDDs
>>> each and 3MON nodes (2 of them are the OSD nodes as well)
>>> all with ceph version 0.80.9
>>> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
>>>
>>> This environment provides RBD volumes to an OpenStack
>>> Icehouse installation.
>>>
>>> Although not a state of the art environment is working
>>> well and within our expectations.
>>>
>>> The issue now is that one of our users accidentally
>>> deleted one of the volumes without keeping its data first!
>>>
>>> Is there any way (since the data are considered critical
>>> and very important) to recover them from CEPH?
>>
>>
>> Short answer: no
>>
>> Long answer: no, but
>>
>> Consider the way Ceph stores data... each RBD is striped
>> into chunks
>> (RADOS objects with 4MB size by default); the chunks are
>> distributed
>> among the OSDs with the configured number of replicates
>> (probably two
>> in your case since you use 2 OSD hosts). RBD uses thin
>> provisioning,
>> so chunks are allocated upon first write access.
>> If an RBD is deleted all of its chunks are deleted on the