[ceph-users] Infernalis OSD errored out on journal permissions without mentioning anything in its log

2016-03-31 Thread Bill Sharer
This took a little head scratching until I figured out why my osd 
daemons were  not restarting under Infernalis on Gentoo.


I had just upgraded from Hammer to Infernalis and had reset ownership 
from root:root to ceph:ceph on the files of each OSD in 
/var/lib/ceph/osd/ceph-n. However, I forgot to take into account the 
ownership of the journals, which I have set up as raw partitions. Under 
Gentoo, I needed to put the ceph user into the "disk" group to allow it 
write access to the device files.
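
A rough sketch of the fix, in case anyone hits the same thing (the device and 
OSD paths below are examples, not the exact ones from my setup):

# let the ceph user write to the raw journal partitions
# (on Gentoo the device nodes belong to group "disk")
usermod -a -G disk ceph

# confirm the journal partition is group-writable by "disk"
ls -l /dev/sdb2

# OSD data directories were already re-owned as part of the upgrade
chown -R ceph:ceph /var/lib/ceph/osd/ceph-0

After that the osd daemons should come up again.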


The osd startup init script reported the osd as starting OK, but the actual 
process would exit without writing anything to its 
/var/log/ceph/ceph-osd.n.log.  I would have thought there might have 
been some sort of permission error logged, but nope :-)


Bill Sharer


Re: [ceph-users] Error from monitor

2016-03-31 Thread zainal
Yesterday, there was no error. 

 

[root@mon01 ceph]# service ceph-mon@mon01.service status

Redirecting to /bin/systemctl status  ceph-mon@mon01.service.service

ceph-mon@mon01.service.service - Ceph cluster monitor daemon

   Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled)

   Active: failed (Result: exit-code) since Fri 2016-04-01 10:51:17 MYT;
19min ago

  Process: 2394 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i
--setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)

Main PID: 2394 (code=exited, status=1/FAILURE)

 

Apr 01 10:51:17 mon01.nocser.net systemd[1]: Starting Ceph cluster monitor
daemon...

Apr 01 10:51:17 mon01.nocser.net systemd[1]: Started Ceph cluster monitor
daemon.

Apr 01 10:51:17 mon01.nocser.net ceph-mon[2394]: monitor data directory at
'/var/lib/ceph/mon/ceph-mon01.service' does not exist: have you run 'mkfs'?

Apr 01 10:51:17 mon01.nocser.net systemd[1]: ceph-mon@mon01.service.service:
main process exited, code=exited, status=1/FAILURE

Apr 01 10:51:17 mon01.nocser.net systemd[1]: Unit
ceph-mon@mon01.service.service entered failed state.
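
Note the doubled ".service" suffix in the unit name above: it suggests the unit 
was addressed as ceph-mon@mon01.service, so systemd looked for a monitor data 
directory literally named 'ceph-mon01.service'. A sketch of the plain 
invocations, assuming the monitor id really is just mon01:

systemctl status ceph-mon@mon01
systemctl stop ceph-mon@mon01
systemctl start ceph-mon@mon01

# recent log output for that unit
journalctl -u ceph-mon@mon01 -n 50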

 

Regards,

 

Mohd Zainal Abidin Rabani

Technical Support

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
zai...@nocser.net
Sent: 01 April 2016 10:59
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Error from monitor

 

Hi,

 

I keep getting this error:

 

2016-04-01 10:55:59.666015 7fe7301ee700  0 -- :/3560986127 >>
42.0.30.39:6789/0 pipe(0x7fe724000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7fe724004ef0).fault

2016-04-01 10:56:08.667082 7fe73aa67700  0 -- 42.0.30.38:0/3560986127 >>
42.0.30.39:6789/0 pipe(0x7fe724000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7fe724005160).fault

 

42.0.30.38 >> mon01

42.0.30.39 >> mon02

 

What is the start sequence for Ceph? How do I start/stop a monitor?

 

My setup:

 

Mon01

Mon02

Mon03

Osd01 

Osd02

Osd03

Osd04

Osd05

 

Regards,

 

Mohd Zainal Abidin Rabani

Technical Support

 



[ceph-users] Error from monitor

2016-03-31 Thread zainal
Hi,

 

I keep getting this error:

 

2016-04-01 10:55:59.666015 7fe7301ee700  0 -- :/3560986127 >>
42.0.30.39:6789/0 pipe(0x7fe724000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7fe724004ef0).fault

2016-04-01 10:56:08.667082 7fe73aa67700  0 -- 42.0.30.38:0/3560986127 >>
42.0.30.39:6789/0 pipe(0x7fe724000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7fe724005160).fault

 

42.0.30.38 >> mon01

42.0.30.39 >> mon02

 

What is the start sequence for Ceph? How do I start/stop a monitor?

 

My setup:

 

Mon01

Mon02

Mon03

Osd01 

Osd02

Osd03

Osd04

Osd05

 

Regards,

 

Mohd Zainal Abidin Rabani

Technical Support

 



Re: [ceph-users] OSD crash after conversion to bluestore

2016-03-31 Thread Adrian Saul

No - if you use ceph-disk prepare, it creates a small filesystem with some 
control files; the bluestore partition itself is not visible.
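
A rough way to see this from the shell (the OSD id and device below are 
examples; the exact set of control files varies by release):

# only the small control filesystem from ceph-disk prepare is mounted
mount | grep ceph-0
ls /var/lib/ceph/osd/ceph-0

# the bluestore data partition itself carries no mountable filesystem
lsblk -f /dev/sdb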


> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Oliver Dzombic
> Sent: Friday, 1 April 2016 12:08 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] OSD crash after conversion to bluestore
>
> Hi,
>
> if i understand it correct, bluestore wont use / is not a filesystem to be
> mounted.
>
> So if an osd is up and in, while we dont see its mounted into the filesystem
> and accessable, we could assume that it must be powered by bluestore...
> !??!
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt ) Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 31.03.2016 um 14:47 schrieb German Anders:
> > having jewel install, is possible to run a command in order to see
> > that the OSD is actually using bluestore?
> >
> > Thanks in advance,
> >
> > Best,
> >
> >
> > **
> >
> > *German*
> >
> > 2016-03-31 1:24 GMT-03:00 Adrian Saul  > >:
> >
> >
> > I upgraded my lab cluster to 10.1.0 specifically to test out
> > bluestore and see what latency difference it makes.
> >
> > I was able to one by one zap and recreate my OSDs to bluestore and
> > rebalance the cluster (the change to having new OSDs start with low
> > weight threw me at first, but once  I worked that out it was fine).
> >
> > I was all good until I completed the last OSD, and then one of the
> > earlier ones fell over and refuses to restart.  Every attempt to
> > start fails with this assertion failure:
> >
> > -2> 2016-03-31 15:15:08.868588 7f931e5f0800  0 
> > cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
> > -1> 2016-03-31 15:15:08.868800 7f931e5f0800  1 
> > cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
> >  0> 2016-03-31 15:15:08.870948 7f931e5f0800 -1 osd/OSD.h: In
> > function 'OSDMapRef OSDService::get_map(epoch_t)' thread
> > 7f931e5f0800 time 2016-03-31 15:15:08.869638
> > osd/OSD.h: 886: FAILED assert(ret)
> >
> >  ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x85) [0x558cee37da55]
> >  2: (OSDService::get_map(unsigned int)+0x3d) [0x558cedd6a6fd]
> >  3: (OSD::init()+0xf22) [0x558cedd1d172]
> >  4: (main()+0x2aab) [0x558cedc83a2b]
> >  5: (__libc_start_main()+0xf5) [0x7f931b506b15]
> >  6: (()+0x349689) [0x558cedccd689]
> >  NOTE: a copy of the executable, or `objdump -rdS ` is
> > needed to interpret this.
> >
> >
> > I could just zap and recreate it again, but I would be curious to
> > know how to fix it, or unless someone can suggest if this is a bug
> > that needs looking at.
> >
> > Cheers,
> >  Adrian
> >
> >
> > Confidentiality: This email and any attachments are confidential and
> > may be subject to copyright, legal or some other professional
> > privilege. They are intended solely for the attention and use of the
> > named addressee(s). They may only be copied, distributed or
> > disclosed with the consent of the copyright owner. If you have
> > received this email by mistake or by breach of the confidentiality
> > clause, please notify the sender immediately by return email and
> > delete or destroy all copies of the email. Any confidentiality,
> > privilege or copyright is not waived or lost because this email has
> > been sent to you by mistake.
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] ceph pg query hangs for ever

2016-03-31 Thread Goncalo Borges

Hi Mart, Wido...

A disclaimer: Not really an expert, just a regular site admin sharing my 
experience.


At the beginning of the thread you give the idea that only osd.68 has 
problems dealing with the problematic PG 3.117. If that is indeed the 
case, you could simply mark osd.68 down and remove it from the 
cluster. This will trigger Ceph to replicate all PGs on osd.68 to other 
osds based on the other PG replicas.


However, in the last email you seem to give the idea that it is PG 
3.117 itself which has problems, which makes all osds sharing that PG also 
problematic. Because of that, you marked all osds sharing that PG as down.



Before actually trying something more drastic, I would go for a more 
classic approach. For example, what happens if you bring only one osd up? 
I would start with osd.74, since you suspect problems in osd.68 and 
osd.55 was the reason for the dump message below. If it still aborts, 
then it means that the PG might have been replicated everywhere with 
'bad' data.


The drastic approach (if you do not care about the data on that PG) is to 
mark those osds as down and force the PG to be recreated using 'ceph 
pg force_create_pg 3.117'. Based on my previous experience, once I 
recreated a PG, 'ceph pg dump_stuck stale' showed that PG in creating 
state forever. To make it right, I had to restart the proper osds. But, 
as you stated, you then have to deal with data corruption at the VM 
level... Maybe that is a problem, maybe it isn't...
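
A minimal sketch of that drastic path, using the osd ids mentioned in this 
thread (adapt the restart command to your init system):

# take the affected osds out so nothing keeps peering with the bad copies
ceph osd out 55 68 74

# recreate the PG, discarding whatever data it held
ceph pg force_create_pg 3.117

# if it then sits in 'creating' forever, check it and restart the relevant osds
ceph pg dump_stuck stale
systemctl restart ceph-osd@55 ceph-osd@68 ceph-osd@74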


Hope that helps
Cheers
Goncalo




On 03/31/2016 12:26 PM, Mart van Santen wrote:



Hello,

Well, unfortunately the problem is not really solved. Yes, we managed 
to get to a good health state at some point, but when a client hits some 
specific data, the osd process crashes with the errors below. The 3 OSDs 
which handle 3.117, the PG with problems, are currently down and we 
reweighted them to 0, so non-affected PGs are currently being rebuilt on 
other OSDs.

If I bring the crashed osds up, they crash again within a few minutes.

As I'm a bit afraid for the data in this PG, I think we want to 
recreate the PG with empty data and discard the old disks. I 
understand I will get data corruption on several RBDs in this case, 
but we will try to solve that and rebuild the affected VMs. Does this 
make sense, and what are the best next steps?


Regards,

Mart





   -34> 2016-03-31 03:07:56.932800 7f8e43829700  3 osd.55 122203 
handle_osd_map epochs [122203,122203], i have 122203, src has 
[120245,122203]
   -33> 2016-03-31 03:07:56.932837 7f8e43829700  1 -- 
[2a00:c6c0:0:122::105]:6822/11703 <== osd.45 
[2a00:c6c0:0:122::103]:6800/1852 7  pg_info(1 pgs e122202:3.117) 
v4  919+0+0 (3389909573 0 0) 0x528bc00 con 0x1200a840
   -32> 2016-03-31 03:07:56.932855 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932770, event: header_read, op: 
pg_info(1 pgs e122202:3.117)
   -31> 2016-03-31 03:07:56.932869 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932771, event: throttled, op: 
pg_info(1 pgs e122202:3.117)
   -30> 2016-03-31 03:07:56.932878 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932822, event: all_read, op: 
pg_info(1 pgs e122202:3.117)
   -29> 2016-03-31 03:07:56.932886 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932851, event: dispatched, op: 
pg_info(1 pgs e122202:3.117)
   -28> 2016-03-31 03:07:56.932895 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932895, event: waiting_for_osdmap, 
op: pg_info(1 pgs e122202:3.117)
   -27> 2016-03-31 03:07:56.932912 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932912, event: started, op: 
pg_info(1 pgs e122202:3.117)
   -26> 2016-03-31 03:07:56.932947 7f8e43829700  5 -- op tracker -- 
seq: 22, time: 2016-03-31 03:07:56.932947, event: done, op: pg_info(1 
pgs e122202:3.117)
   -25> 2016-03-31 03:07:56.933022 7f8e3c01a700  1 -- 
[2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::103]:6800/1852 
-- osd_map(122203..122203 src has 121489..122203) v3 -- ?+0 0x11c7fd40 
con 0x1200a840
   -24> 2016-03-31 03:07:56.933041 7f8e3c01a700  1 -- 
[2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::103]:6800/1852 
-- pg_info(1 pgs e122203:3.117) v4 -- ?+0 0x528bde0 con 0x1200a840
   -23> 2016-03-31 03:07:56.933111 7f8e3c01a700  1 -- 
[2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::105]:6810/3568 
-- osd_map(122203..122203 src has 121489..122203) v3 -- ?+0 0x12200d00 
con 0x1209d4a0
   -22> 2016-03-31 03:07:56.933125 7f8e3c01a700  1 -- 
[2a00:c6c0:0:122::105]:6822/11703 --> [2a00:c6c0:0:122::105]:6810/3568 
-- pg_info(1 pgs e122203:3.117) v4 -- ?+0 0x5288960 con 0x1209d4a0
   -21> 2016-03-31 03:07:56.933154 7f8e3c01a700  1 -- 
[2a00:c6c0:0:122::105]:6822/11703 --> 
[2a00:c6c0:0:122::108]:6816/1002847 -- pg_info(1 pgs e122203:3.117) v4 
-- ?+0 0x5288d20 con 0x101a19c0
   -20> 2016-03-31 03:07:56.933212 7f8e3c01a700  5 osd.55 pg_epoch: 
122203 pg[3.117( v 122193

Re: [ceph-users] OSD crash after conversion to bluestore

2016-03-31 Thread Adrian Saul

Not sure about commands; however, if you look at the OSD mount point there is a 
“bluefs” file.


From: German Anders [mailto:gand...@despegar.com]
Sent: Thursday, 31 March 2016 11:48 PM
To: Adrian Saul
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD crash after conversion to bluestore

Having jewel installed, is it possible to run a command in order to see that the OSD 
is actually using bluestore?
Thanks in advance,
Best,

German

2016-03-31 1:24 GMT-03:00 Adrian Saul 
mailto:adrian.s...@tpgtelecom.com.au>>:

I upgraded my lab cluster to 10.1.0 specifically to test out bluestore and see 
what latency difference it makes.

I was able to one by one zap and recreate my OSDs to bluestore and rebalance 
the cluster (the change to having new OSDs start with low weight threw me at 
first, but once  I worked that out it was fine).

I was all good until I completed the last OSD, and then one of the earlier ones 
fell over and refuses to restart.  Every attempt to start fails with this 
assertion failure:

-2> 2016-03-31 15:15:08.868588 7f931e5f0800  0  
cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
-1> 2016-03-31 15:15:08.868800 7f931e5f0800  1  
cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
 0> 2016-03-31 15:15:08.870948 7f931e5f0800 -1 osd/OSD.h: In function 
'OSDMapRef OSDService::get_map(epoch_t)' thread 7f931e5f0800 time 2016-03-31 
15:15:08.869638
osd/OSD.h: 886: FAILED assert(ret)

 ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) 
[0x558cee37da55]
 2: (OSDService::get_map(unsigned int)+0x3d) [0x558cedd6a6fd]
 3: (OSD::init()+0xf22) [0x558cedd1d172]
 4: (main()+0x2aab) [0x558cedc83a2b]
 5: (__libc_start_main()+0xf5) [0x7f931b506b15]
 6: (()+0x349689) [0x558cedccd689]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.


I could just zap and recreate it again, but I would be curious to know how to 
fix it, or unless someone can suggest if this is a bug that needs looking at.

Cheers,
 Adrian


Confidentiality: This email and any attachments are confidential and may be 
subject to copyright, legal or some other professional privilege. They are 
intended solely for the attention and use of the named addressee(s). They may 
only be copied, distributed or disclosed with the consent of the copyright 
owner. If you have received this email by mistake or by breach of the 
confidentiality clause, please notify the sender immediately by return email 
and delete or destroy all copies of the email. Any confidentiality, privilege 
or copyright is not waived or lost because this email has been sent to you by 
mistake.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Confidentiality: This email and any attachments are confidential and may be 
subject to copyright, legal or some other professional privilege. They are 
intended solely for the attention and use of the named addressee(s). They may 
only be copied, distributed or disclosed with the consent of the copyright 
owner. If you have received this email by mistake or by breach of the 
confidentiality clause, please notify the sender immediately by return email 
and delete or destroy all copies of the email. Any confidentiality, privilege 
or copyright is not waived or lost because this email has been sent to you by 
mistake.


[ceph-users] Ceph Developer Monthly (CDM)

2016-03-31 Thread Patrick McGarry
Hey cephers,

Just a reminder that the monthly developer standup will be happening
next Wed, 06 Apr @ 12:30p EST. Please add your blueprint to the
appropriate page on the planning section of the wiki. Thanks!

http://tracker.ceph.com/projects/ceph/wiki/Planning

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


[ceph-users] Ceph Thin Provisioning on OpenStack Instances

2016-03-31 Thread Mario Codeniera
Hi,

Has anyone done thin provisioning on OpenStack instances (virtual
machines)? Based on the current configuration, it works well with my cloud
using ceph 0.94.5 with an SSD journal (creating a 40GB instance went from
18 minutes to around 7 minutes, and the SSD iops are not great). But what I
want is to save storage space: it copies the whole image from Glance (40GB)
to each newly created virtual machine. Is there any chance that it will copy
only the top changes, somewhat like a VMware-style snapshot, where the base
image is still there?

Current setup:
xxx --> (uploaded glance image, say Centos 7 with 40GB)

if I create an instance:
xxx + yyy, where yyy is the new changes
(40GB + MB/GB of changes)


*Planned setup:*
(it will save storage as it will not copy xxx)
*yyy* is the only part stored on ceph


As per testing on the current cloud, the OpenStack snapshot still copies the
whole image plus the new changes. Correct me if I'm wrong, as I am still using
the Kilo release (2015.1.1), or maybe it was a misconfiguration? And the more
users I add, the more OSDs will be added too.
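
For reference, RBD itself supports exactly this kind of copy-on-write layering
through protected snapshots and clones; a minimal sketch (the pool and image
names here are made up for illustration):

# snapshot the uploaded base image and protect it so it can be cloned
rbd snap create images/centos7-base@base
rbd snap protect images/centos7-base@base

# each new instance disk becomes a thin clone that stores only its own changes
rbd clone images/centos7-base@base vms/instance-0001-disk

# the clone records its parent; only new writes consume space
rbd info vms/instance-0001-disk

Whether OpenStack actually takes this path depends on how Glance and
Nova/Cinder are configured to use RBD.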

Any insights are highly appreciated.


Thanks,
Mario


Re: [ceph-users] Frozen Client Mounts

2016-03-31 Thread Oliver Dzombic
Hi Diego,

let's start with the basics; please give us the output of

ceph -s
ceph osd df
ceph osd perf

ideally before and after you provoke the iowait.
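
For example, a quick way to capture both snapshots (the output file names are
arbitrary):

ceph -s        > ceph_status_before.txt
ceph osd df    > ceph_osd_df_before.txt
ceph osd perf  > ceph_osd_perf_before.txt

# ... run the mysql restore / provoke the iowait ...

ceph -s        > ceph_status_after.txt
ceph osd df    > ceph_osd_df_after.txt
ceph osd perf  > ceph_osd_perf_after.txt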

Thank you !

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 31.03.2016 um 21:38 schrieb Diego Castro:
> Hello, everyone.
> I have a pretty basic ceph setup running on top of Azure Cloud, (4 mons
> and 10 osd's) for rbd images.
> Everything seems to be working as expected until i put some load on it,
> sometimes it doesn't complete the process (mysql restore for ex.) and
> sometimes it does without any issues.
> 
> 
> Client Kernel: 3.10.0-327.10.1.el7.x86_64
> OSD Kernel: 3.10.0-229.7.2.el7.x86_64
> 
> Ceph: ceph-0.94.5-0.el7.x86_64
> 
> On the client side, i have 100%iowait, a lot of "INFO: task blocked for
> more than 120 seconds"
> On the osd side, i have no evidences of faulty disk or read/write
> latency, but i found the following messages:
> 
> 
> 2016-03-28 17:04:03.425249 7f7329fc5700  0 bad crc in data 641367213 != exp
> 3107019767
> 2016-03-28 17:04:03.440599 7f7329fc5700  0 -- 10.0.3.9:6800/2272 >>
> 10.0.2.5:0/1998047321 pipe(0x13cc4800 sd=54 :6800 s=0 pgs=0 cs=0 l=0
> c=0x13883f40).accept peer addr is really 10.0.2.5:0/1998047321 (socket is
> 10.0.2.5:34702/0)
> 2016-03-28 17:04:03.487497 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message
> osd_op_reply(20046 rb.0.6040.238e1f29.0074 [set-alloc-hint
> object_size 4194304 write_size 4194304,write 0~524288] v1753'32512 uv32512
> ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping
> message 0x12b539c0
> 2016-03-28 17:04:03.532302 7f733666f700  0 -- 10.0.3.9:6800/2272 submit_message
> osd_op_reply(20047 rb.0.6040.238e1f29.0074 [set-alloc-hint
> object_size 4194304 write_size 4194304,write 524288~524288] v1753'32513
> uv32513 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con,
> dropping message 0x1667bc80
> 2016-03-28 17:04:03.535143 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message
> osd_op_reply(20048 rb.0.6040.238e1f29.0074 [set-alloc-hint
> object_size 4194304 write_size 4194304,write 1048576~524288] v1753'32514
> uv32514 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con,
> dropping message 0x12b56e00
> 
> ---
> Diego Castro / The CloudFather
> GetupCloud.com - Eliminamos a Gravidade
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


[ceph-users] Frozen Client Mounts

2016-03-31 Thread Diego Castro
Hello, everyone.
I have a pretty basic ceph setup running on top of Azure Cloud (4 mons and
10 osds) for rbd images.
Everything seems to be working as expected until I put some load on it;
sometimes it doesn't complete the process (a mysql restore, for example) and
sometimes it does without any issues.


Client Kernel: 3.10.0-327.10.1.el7.x86_64
OSD Kernel: 3.10.0-229.7.2.el7.x86_64

Ceph: ceph-0.94.5-0.el7.x86_64

On the client side, I have 100% iowait and a lot of "INFO: task blocked for
more than 120 seconds" messages.
On the osd side, I have no evidence of faulty disks or read/write latency,
but I found the following messages:


2016-03-28 17:04:03.425249 7f7329fc5700  0 bad crc in data 641367213 != exp
3107019767
2016-03-28 17:04:03.440599 7f7329fc5700  0 -- 10.0.3.9:6800/2272 >>
10.0.2.5:0/1998047321 pipe(0x13cc4800 sd=54 :6800 s=0 pgs=0 cs=0 l=0
c=0x13883f40).accept peer addr is really 10.0.2.5:0/1998047321 (socket is
10.0.2.5:34702/0)
2016-03-28 17:04:03.487497 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message
osd_op_reply(20046 rb.0.6040.238e1f29.0074 [set-alloc-hint
object_size 4194304 write_size 4194304,write 0~524288] v1753'32512 uv32512
ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con, dropping
message 0x12b539c0
2016-03-28 17:04:03.532302 7f733666f700  0 -- 10.0.3.9:6800/2272 submit_message
osd_op_reply(20047 rb.0.6040.238e1f29.0074 [set-alloc-hint
object_size 4194304 write_size 4194304,write 524288~524288] v1753'32513
uv32513 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con,
dropping message 0x1667bc80
2016-03-28 17:04:03.535143 7f7333e6a700  0 -- 10.0.3.9:6800/2272 submit_message
osd_op_reply(20048 rb.0.6040.238e1f29.0074 [set-alloc-hint
object_size 4194304 write_size 4194304,write 1048576~524288] v1753'32514
uv32514 ondisk = 0) v6 remote, 10.0.2.5:0/1998047321, failed lossy con,
dropping message 0x12b56e00

---
Diego Castro / The CloudFather
GetupCloud.com - Eliminamos a Gravidade


Re: [ceph-users] PG Stuck active+undersized+degraded+inconsistent

2016-03-31 Thread Calvin Morrow
On Wed, Mar 30, 2016 at 5:24 PM Christian Balzer  wrote:

> On Wed, 30 Mar 2016 15:50:07 + Calvin Morrow wrote:
>
> > On Wed, Mar 30, 2016 at 1:27 AM Christian Balzer  wrote:
> >
> > >
> > > Hello,
> > >
> > > On Tue, 29 Mar 2016 18:10:33 + Calvin Morrow wrote:
> > >
> > > > Ceph cluster with 60 OSDs, Giant 0.87.2.  One of the OSDs failed due
> > > > to a hardware error, however after normal recovery it seems stuck
> > > > with one active+undersized+degraded+inconsistent pg.
> > > >
> > > Any reason (other than inertia, which I understand very well) you're
> > > running a non LTS version that last saw bug fixes a year ago?
> > > You may very well be facing a bug that has long been fixed even in
> > > Firefly, let alone Hammer.
> > >
> > I know we discussed Hammer several times, and I don't remember the exact
> > reason we held off.  Other than that, Inertia is probably the best
> > answer I have.
> >
> Fair enough.
>
> I just seem to remember similar scenarios where recovery got stuck/hung
> and thus would assume it was fixed in newer versions.
>
> If you google for "ceph recovery stuck" you find another potential
> solution behind the RH paywall and this:
>
> http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043894.html
>
> That would have been my next suggestion anyway, Ceph OSDs seem to take
> well to the 'IT crowd' mantra of "Have you tried turning it off and on
> again?". ^o^
>
Yeah, unfortunately that was something I tried before reaching out on the
mailing list.  It didn't seem to change anything.

In particular, I was noticing that my "ceph pg repair 12.28a" command never
seemed to be acknowledged by the OSD.  I was hoping for some sort of log
message, even an 'ERR', but while I saw messages about other pg scrubs,
nothing shows up for the problem PG.  I tried before and after an OSD
restart (both OSDs) without any apparent change.
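
For the record, a minimal sketch of what I would try next to get the primary to
log something for that PG (the PG id and osd come from this thread; the log
path may differ on your systems):

# force a deep-scrub first, then the repair, on the problem PG
ceph pg deep-scrub 12.28a
ceph pg repair 12.28a

# watch the primary OSD's log (osd.36 here) for scrub/repair messages
tail -f /var/log/ceph/ceph-osd.36.log | grep -E '12\.28a|ERR'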

>
> > >
> > > If so, hopefully one of the devs remembering it can pipe up.
> > >
> > > > I haven't been able to get repair to happen using "ceph pg repair
> > > > 12.28a"; I can see the activity logged in the mon logs, however the
> > > > repair doesn't actually seem to happen in any of the actual osd logs.
> > > >
> > > > I tried folowing Sebiastien's instructions for manually locating the
> > > > inconsistent object (
> > > >
> http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/
> > > ),
> > > > however the md5sum from the objects both match, so I'm not quite
> > > > sure how to proceed.
> > > >
> > > Rolling a dice? ^o^
> > > Do they have similar (identical really) timestamps as well?
> > >
> > Yes, timestamps are identical.
> >
> Unsurprisingly.
>
> > >
> > > > Any ideas on how to return to a healthy cluster?
> > > >
> > > > [root@soi-ceph2 ceph]# ceph status
> > > > cluster 6cc00165-4956-4947-8605-53ba51acd42b
> > > >  health HEALTH_ERR 1023 pgs degraded; 1 pgs inconsistent; 1023
> > > > pgs stuck degraded; 1099 pgs stuck unclean; 1023 pgs stuck
> > > > undersized; 1023 pgs undersized; recovery 132091/23742762 objects
> > > > degraded (0.556%); 7745/23742762 objects misplaced (0.033%); 1 scrub
> > > > errors monmap e5: 3 mons at {soi-ceph1=
> > > >
> 10.2.2.11:6789/0,soi-ceph2=10.2.2.12:6789/0,soi-ceph3=10.2.2.13:6789/0},
> > > > election epoch 4132, quorum 0,1,2 soi-ceph1,soi-ceph2,soi-ceph3
> > > >  osdmap e41120: 60 osds: 59 up, 59 in
> > > >   pgmap v37432002: 61440 pgs, 15 pools, 30513 GB data, 7728
> > > > kobjects 91295 GB used, 73500 GB / 160 TB avail
> > > > 132091/23742762 objects degraded (0.556%); 7745/23742762
> > > > objects misplaced (0.033%)
> > > >60341 active+clean
> > > >   76 active+remapped
> > > > 1022 active+undersized+degraded
> > > >1 active+undersized+degraded+inconsistent
> > > >   client io 44548 B/s rd, 19591 kB/s wr, 1095 op/s
> > > >
> > > What's confusing to me in this picture are the stuck and unclean PGs as
> > > well as degraded objects, it seems that recovery has stopped?
> > >
> > Yeah ... recovery essentially halted.  I'm sure its no accident that
> > there are exactly 1023 (1024-1) unhealthy pgs.
> >
> > >
> > > Something else that suggests a bug, or at least a stuck OSD.
> > >
> > > > [root@soi-ceph2 ceph]# ceph health detail | grep inconsistent
> > > > pg 12.28a is stuck unclean for 126274.215835, current state
> > > > active+undersized+degraded+inconsistent, last acting [36,52]
> > > > pg 12.28a is stuck undersized for 3499.099747, current state
> > > > active+undersized+degraded+inconsistent, last acting [36,52]
> > > > pg 12.28a is stuck degraded for 3499.107051, current state
> > > > active+undersized+degraded+inconsistent, last acting [36,52]
> > > > pg 12.28a is active+undersized+degraded+inconsistent, acting [36,52]
> > > >
> > > > [root@soi-ceph2 ceph]# zgrep 'ERR' *.gz
> > > > ceph-osd.36.log-20160325.gz:2016-03-24 12:00:43.568221 7fe7b2897700
> > > > -1 log_channel(default

Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end storage

2016-03-31 Thread huang jun
The data encode/decode operation is done on the OSD side.
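
As a small illustration of the read-proxy suggestion in the quoted reply below,
switching an existing cache tier over looks roughly like this (the pool name
"cachepool" is just an example):

ceph osd tier cache-mode cachepool readproxy

# confirm the cache-mode change on the pool
ceph osd dump | grep cachepool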

2016-03-31 23:28 GMT+08:00 Yu Xiang :
> Thanks for the reply!
> So where did the decoding process happen? Is it in cache or on the client
> side? (Only considering Read.) If it happened when copying from storage tier
> to cache, then it has to be an whole object (file), but if decoding can be
> happened on client side when the client has all needed chunks, it seems
> cache can hold partial chunks of the file? What i mean is that is it
> possible for cache to hold partial chunks of a file in Ceph?  (assuming file
> A has 7 chunks in storage tier, to recover file A a client needs 4 chunks,
> will it be possible that 2 chunks of file A are copied to and stored in
> cache, when file A is requested, only another 2 chunks are needed from the
> storage tier? )
>
> Thanks!
>
>
>
-----Original Message-----
> From: huang jun 
> To: Yu Xiang 
> Cc: ceph-users 
> Sent: Wed, Mar 30, 2016 9:04 pm
> Subject: Re: [ceph-users] chunk-based cache in ceph with erasure coded
> back-end storage
>
>
> if your cache-mode is write-back, which will cache the read object in
> cache tier.
> you can try the read-proxy mode, which will not cache the object.
> the read request send to primary OSD, and the primary osd collect the
> shards from base tier(in you case, is erasure code pool),
> you need to read at least k chunks to decode the object.
> In current code, cache tier only store the whole object, not the shards.
>
>
> 2016-03-31 6:10 GMT+08:00 Yu Xiang :
>> Dear List,
>> I am exploring in ceph caching tier recently, considering a cache-tier
>> (replicated) and a back storage-tier (erasure-coded), so chunks are stored
>> in the OSDs in the erasure-coded storage tier, when a file has been
>> requested to read, usually, all chunks in the storage tier would be copied
>> to the cache tier, replicated, and stored in the OSDs in caching pool, but
>> i
>> was wondering would it be possible that if only partial chunks of the
>> requested file be copied to cache? or it has to be a complete file? for
>> example, a file using (7,4) erasure code (4 original chunks, 3 encoded
>> chunks), when read it might be 4 required chunks are copied to cache, and
>> i
>> was wondering if it's possible to copy only 2 out of 4 required chunks to
>> cache, and the users getting the other 2 chunks elsewhere (or assuming the
>> client already has 2 chunks, they only need another 2 from ceph)? can the
>> cache store partial chunks of a file?
>>
>> Thanks in advance for any help!
>>
>> Best,
>> Yu
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> thanks
> huangjun



-- 
thanks
huangjun


Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end storage

2016-03-31 Thread Yu Xiang
Thanks for the reply!
So where does the decoding process happen? Is it in the cache or on the client side? 
(Only considering reads.) If it happens when copying from the storage tier to the 
cache, then it has to be a whole object (file), but if decoding can happen on the 
client side once the client has all the needed chunks, it seems the cache could hold 
partial chunks of the file? What I mean is: is it possible for the cache to hold 
partial chunks of a file in Ceph? (Assuming file A has 7 chunks in the storage tier 
and a client needs 4 chunks to recover file A, would it be possible for 2 chunks of 
file A to be copied to and stored in the cache, so that when file A is requested, 
only another 2 chunks are needed from the storage tier?)


Thanks! 




-----Original Message-----

From: huang jun 
To: Yu Xiang 
Cc: ceph-users 
Sent: Wed, Mar 30, 2016 9:04 pm
Subject: Re: [ceph-users] chunk-based cache in ceph with erasure coded back-end 
storage

If your cache-mode is write-back, it will cache the read object in the
cache tier.
You can try the read-proxy mode, which will not cache the object.
The read request is sent to the primary OSD, and the primary OSD collects the
shards from the base tier (in your case, the erasure-coded pool);
you need to read at least k chunks to decode the object.
In the current code, the cache tier only stores the whole object, not the shards.


2016-03-31 6:10 GMT+08:00 Yu Xiang :
> Dear List,
> I am exploring in ceph caching tier recently, considering a cache-tier
> (replicated) and a back storage-tier (erasure-coded), so chunks are stored
> in the OSDs in the erasure-coded storage tier, when a file has been
> requested to read,  usually, all chunks in the storage tier would be copied
> to the cache tier, replicated, and stored in the OSDs in caching pool, but i
> was wondering would it be possible that if only partial chunks of the
> requested file be copied to cache? or it has to be a complete file? for
> example, a file using (7,4) erasure code (4 original chunks, 3 encoded
> chunks), when read it might be 4 required chunks are copied to cache, and i
> was wondering if it's possible to copy only 2 out of 4 required chunks to
> cache, and the users getting the other 2 chunks elsewhere (or assuming the
> client already has 2 chunks, they only need another 2 from ceph)? can the
> cache store partial chunks of a file?
>
> Thanks in advance for any help!
>
> Best,
> Yu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
thanks
huangjun



Re: [ceph-users] OSD crash after conversion to bluestore

2016-03-31 Thread Oliver Dzombic
Hi,

If I understand it correctly, bluestore won't use / is not a filesystem to
be mounted.

So if an OSD is up and in, while we don't see it mounted into the
filesystem and accessible, we could assume that it must be powered by
bluestore... !??!

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 31.03.2016 um 14:47 schrieb German Anders:
> having jewel install, is possible to run a command in order to see that
> the OSD is actually using bluestore?
> 
> Thanks in advance,
> 
> Best,
> 
> 
> **
> 
> *German*
> 
> 2016-03-31 1:24 GMT-03:00 Adrian Saul  >:
> 
> 
> I upgraded my lab cluster to 10.1.0 specifically to test out
> bluestore and see what latency difference it makes.
> 
> I was able to one by one zap and recreate my OSDs to bluestore and
> rebalance the cluster (the change to having new OSDs start with low
> weight threw me at first, but once  I worked that out it was fine).
> 
> I was all good until I completed the last OSD, and then one of the
> earlier ones fell over and refuses to restart.  Every attempt to
> start fails with this assertion failure:
> 
> -2> 2016-03-31 15:15:08.868588 7f931e5f0800  0 
> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
> -1> 2016-03-31 15:15:08.868800 7f931e5f0800  1 
> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
>  0> 2016-03-31 15:15:08.870948 7f931e5f0800 -1 osd/OSD.h: In
> function 'OSDMapRef OSDService::get_map(epoch_t)' thread
> 7f931e5f0800 time 2016-03-31 15:15:08.869638
> osd/OSD.h: 886: FAILED assert(ret)
> 
>  ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x558cee37da55]
>  2: (OSDService::get_map(unsigned int)+0x3d) [0x558cedd6a6fd]
>  3: (OSD::init()+0xf22) [0x558cedd1d172]
>  4: (main()+0x2aab) [0x558cedc83a2b]
>  5: (__libc_start_main()+0xf5) [0x7f931b506b15]
>  6: (()+0x349689) [0x558cedccd689]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
> 
> 
> I could just zap and recreate it again, but I would be curious to
> know how to fix it, or unless someone can suggest if this is a bug
> that needs looking at.
> 
> Cheers,
>  Adrian
> 
> 
> Confidentiality: This email and any attachments are confidential and
> may be subject to copyright, legal or some other professional
> privilege. They are intended solely for the attention and use of the
> named addressee(s). They may only be copied, distributed or
> disclosed with the consent of the copyright owner. If you have
> received this email by mistake or by breach of the confidentiality
> clause, please notify the sender immediately by return email and
> delete or destroy all copies of the email. Any confidentiality,
> privilege or copyright is not waived or lost because this email has
> been sent to you by mistake.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


Re: [ceph-users] OSD crash after conversion to bluestore

2016-03-31 Thread German Anders
Having jewel installed, is it possible to run a command in order to see that the
OSD is actually using bluestore?

Thanks in advance,

Best,


*German*

2016-03-31 1:24 GMT-03:00 Adrian Saul :

>
> I upgraded my lab cluster to 10.1.0 specifically to test out bluestore and
> see what latency difference it makes.
>
> I was able to one by one zap and recreate my OSDs to bluestore and
> rebalance the cluster (the change to having new OSDs start with low weight
> threw me at first, but once  I worked that out it was fine).
>
> I was all good until I completed the last OSD, and then one of the earlier
> ones fell over and refuses to restart.  Every attempt to start fails with
> this assertion failure:
>
> -2> 2016-03-31 15:15:08.868588 7f931e5f0800  0 
> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
> -1> 2016-03-31 15:15:08.868800 7f931e5f0800  1 
> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
>  0> 2016-03-31 15:15:08.870948 7f931e5f0800 -1 osd/OSD.h: In function
> 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f931e5f0800 time
> 2016-03-31 15:15:08.869638
> osd/OSD.h: 886: FAILED assert(ret)
>
>  ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x558cee37da55]
>  2: (OSDService::get_map(unsigned int)+0x3d) [0x558cedd6a6fd]
>  3: (OSD::init()+0xf22) [0x558cedd1d172]
>  4: (main()+0x2aab) [0x558cedc83a2b]
>  5: (__libc_start_main()+0xf5) [0x7f931b506b15]
>  6: (()+0x349689) [0x558cedccd689]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
>
> I could just zap and recreate it again, but I would be curious to know how
> to fix it, or unless someone can suggest if this is a bug that needs
> looking at.
>
> Cheers,
>  Adrian
>
>
> Confidentiality: This email and any attachments are confidential and may
> be subject to copyright, legal or some other professional privilege. They
> are intended solely for the attention and use of the named addressee(s).
> They may only be copied, distributed or disclosed with the consent of the
> copyright owner. If you have received this email by mistake or by breach of
> the confidentiality clause, please notify the sender immediately by return
> email and delete or destroy all copies of the email. Any confidentiality,
> privilege or copyright is not waived or lost because this email has been
> sent to you by mistake.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] xenserver or xen ceph

2016-03-31 Thread Iban Cabrillo
Hi,
   We have had a lot of trouble deploying Ceph under Xen. Finally, after
hundreds of tests, we talked with the libvirt-users list; it seems the problem
was in the libxl library, which did not have support for rbd, but this appears
to be solved as of libvirt 1.3.2 and xen-utils >4.5


http://libvirt.org/git/?p=libvirt.git;a=commit;h=fb2bd208e52bda2d329b95d9545baa8bf04558af

   We have not checked it yet because we are waiting for the Ubuntu 16.04
LTS release next month.

Regards, I

2016-03-31 6:46 GMT+02:00 Jiri Kanicky :

> Hi.
>
> There is a solution for Ceph in XenServer. With the help of my engineer
> Mark, we developed a simple patch which allows you to search and attach RBD
> image on XenServer. We create LVHD over the RBD (not RBD per VDI mapping
> yet), so it is far from ideal, but its a good start. The process of
> creating the SR over RBD works even from XenCenter.
>
> https://github.com/mstarikov/rbdsr
>
> Install notes are included and its very simple. Takes you few minutes per
> XenServer.
>
> We have been running this in our Sydney Citrix  lab for sometime and I
> have been running this at home also. Works great. For the future, the patch
> should work in the upcoming version of XenServer (Dundee) as well. Also we
> are trying to push native Ceph packages in the new version and build
> experimental (not official or approved yet) version of smapi which would
> allow us to map RBD per VDI. But there are no details on this. Anyway,
> everyone is welcome to participate in improving the patch on github.
>
> Let me know if you have any questions.
>
> Cheers,
> Jiri
>
> On 16/02/2016 15:30, Christian Balzer wrote:
>
>> On Tue, 16 Feb 2016 11:52:17 +0800 (CST) maoqi1982 wrote:
>>
>> Hi lists
>>> Is there any solution or documents that ceph as xenserver or xen backend
>>> storage?
>>>
>>>
>>> Not really.
>>
>> There was a project to natively support Ceph (RBD) in Xenserver but that
>> seems to have gone nowhere.
>>
>> There was also a thread last year here "RBD hard crash on kernel
>> 3.10" (google for it) wher Shawn Edwards was working on something similar,
>> but that seems to have died off silently as well.
>>
>> While you could of course do a NFS (some pains) or iSCSI (major pains)
>> head for Ceph the pains and reduced performance make it not an attractive
>> proposition.
>>
>> Christian
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 

Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC

Bertrand Russell:
*"El problema con el mundo es que los estúpidos están seguros de todo y los
inteligentes están llenos de dudas*"