Re: [ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-07 Thread Goncalo Borges

Hi Greg...




I've had to force recreate some PGs on my cephfs data pool due to some
cascading disk failures in my homelab cluster. Is there a way to easily
determine which files I need to restore from backup? My metadata pool is
completely intact.

Assuming you're on Jewel, run a recursive "scrub" on the MDS root via
the admin socket, and all the missing files should get logged in the
local MDS log.


The data file is striped into different objects (according to the 
selected layout) that are then stored in different pgs and OSDs.


So, if a few pgs are lost, it means that some files may be totally lost 
(if all of their objects were stored in the lost pgs) or that some files 
may only be partially lost (if some of their objects were stored in the 
lost pgs).
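
For illustration (hedged; the pool and object names below are made up): CephFS data 
objects are named "<inode number in hex>.<object index>", so for a given object you 
can check which pg it maps to with something like

  ceph osd map cephfs_data 10000000123.00000000

and compare that against the list of lost pgs.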


Does this method properly take the second case (partially lost files) into account?




(I'm surprised at this point to discover we don't seem to have any
documentation about how scrubbing works. It's a regular admin socket
command and "ceph daemon mds. help" should get you going where
you need.)


Indeed. Only found some references to it on John's CephFS update Feb 
2016 talk: http://www.slideshare.net/JohnSpray1/cephfs-update-february-2016


Cheers
Goncalo


Re: [ceph-users] 2 osd failures

2016-09-07 Thread Christian Balzer

Hello,

On Wed, 7 Sep 2016 08:38:24 -0400 Shain Miley wrote:

> Well not entirely too late I guess :-(
>
Then re-read my initial reply and see if you can find something in other
logs (syslog/kernel) to explain this.
Also check whether those OSDs are all on the same node, whether they maybe
missed their upgrade, etc.
 
> I woke up this morning to see that two OTHER osd's had been marked down 
> and out.
> 
> I again restarted the osd daemons and things seem to be ok at this point.
> 
Did you verify that they were still running at that time (ps)?

Also did you look at ceph.log on a MON node to see what their view of this
was?

> I agree that I need to get to the bottom on why this happened.
> 
> I have uploaded the log files from 1 of the downed osd's here:
> 
> http://filebin.ca/2uFoRw017TCD/ceph-osd.51.log.1
> http://filebin.ca/2uFosTO8oHmj/ceph-osd.51.log
> 
These are very sparse, much sparser than what I see with default
parameters when I do a restart.

The three heartbeat check lines don't look good at all and are likely the
reason this is happening (other OSDs voting it down).

> You can see my osd restart at about 6:15 am this morning... other than 
> that I don't see anything indicated in the log files (although I could 
> be missing it for sure).
>
See above: check ceph.log for when that OSD was declared down, 
which would be around/after 02:03:08 going by the OSD log.

What's happening at that time from the perspective of the rest of the
cluster?
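
For example (a hedged sketch, assuming the default log location), on one of the
MON hosts something like

  grep 'osd.51' /var/log/ceph/ceph.log

around/after that timestamp should show when and why the other OSDs reported it
down.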
 
> Just an FYI, we are currently running ceph version 0.94.9, which I 
> upgraded to at the end of last week (from 0.94.6 I think)
> 
I only have 0.94.9 on my test cluster, but not much action there obviously.
But if this were a regression of sorts, one would think others might
encounter it, too.

Christian

> This cluster is about 2 or 3 years old at this point and we have not run 
> into this issue at all up to this point.
> 
> Thanks,
> 
> Shain
> 
> 
> On 09/07/2016 12:00 AM, Christian Balzer wrote:
> > Hello,
> >
> > Too late I see, but still...
> >
> > On Tue, 6 Sep 2016 22:17:05 -0400 Shain Miley wrote:
> >
> >> Hello,
> >>
> >> It looks like we had 2 osd's fail at some point earlier today, here is
> >> the current status of the cluster:
> >>
> > You will really want to find out how and why that happened, because while
> > not impossible this is pretty improbable.
> >
> > Something like HW, are the OSDs on the same host, or maybe an OOM event,
> > etc.
> >   
> >> root@rbd1:~# ceph -s
> >>   cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
> >>health HEALTH_WARN
> >>   2 pgs backfill
> >>   5 pgs backfill_toofull
> > Bad, you will want your OSDs back in and then some.
> > Have a look at "ceph osd df".
> >
> >>   69 pgs backfilling
> >>   74 pgs degraded
> >>   1 pgs down
> >>   1 pgs peering
> > Not good either.
> > W/o bringing back your OSDs that means doom for the data on those PGs.
> >
> >>   74 pgs stuck degraded
> >>   1 pgs stuck inactive
> >>   75 pgs stuck unclean
> >>   74 pgs stuck undersized
> >>   74 pgs undersized
> >>   recovery 1903019/105270534 objects degraded (1.808%)
> >>   recovery 1120305/105270534 objects misplaced (1.064%)
> >>   crush map has legacy tunables
> >>monmap e1: 3 mons at
> >> {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}
> >>   election epoch 282, quorum 0,1,2 hqceph1,hqceph2,hqceph3
> >>osdmap e25019: 108 osds: 105 up, 105 in; 74 remapped pgs
> >> pgmap v30721368: 3976 pgs, 17 pools, 144 TB data, 51401 kobjects
> >>   285 TB used, 97367 GB / 380 TB avail
> >>   1903019/105270534 objects degraded (1.808%)
> >>   1120305/105270534 objects misplaced (1.064%)
> >>   3893 active+clean
> >> 69 active+undersized+degraded+remapped+backfilling
> >>  6 active+clean+scrubbing
> >>  3 active+undersized+degraded+remapped+backfill_toofull
> >>  2 active+clean+scrubbing+deep
> > When in recovery/backfill situations, you always want to stop any and all
> > scrubbing.
> >
> >>  2
> >> active+undersized+degraded+remapped+wait_backfill+backfill_toofull
> >>  1 down+peering
> >> recovery io 248 MB/s, 84 objects/s
> >>
> >> We had been running for a while with 107 osd's (not 108), it looks like
> >> osd's 64 and 76 are both now down and out at this point.
> >>
> >>
> >> I have looked though the ceph logs for each osd and did not see anything
> >> obvious, the raid controller also does not show the disk offline.
> >>
> > Get to the bottom of that, normally something gets logged when an OSD
> > fails.
> >
> >> I am wondering if I should try to restart the two osd's that are showing
> >> as down...or should I wait until the current recovery is complete?
> >

Re: [ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-07 Thread Gregory Farnum
On Wed, Sep 7, 2016 at 7:44 AM, Michael Sudnick
 wrote:
> I've had to force recreate some PGs on my cephfs data pool due to some
> cascading disk failures in my homelab cluster. Is there a way to easily
> determine which files I need to restore from backup? My metadata pool is
> completely intact.

Assuming you're on Jewel, run a recursive "scrub" on the MDS root via
the admin socket, and all the missing files should get logged in the
local MDS log.
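
For example (a sketch; check the help output for the exact syntax on your
version), on the active MDS host:

  ceph daemon mds.<id> scrub_path / recursive

and then watch the MDS log for complaints about missing objects.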

(I'm surprised at this point to discover we don't seem to have any
documentation about how scrubbing works. It's a regular admin socket
command and "ceph daemon mds. help" should get you going where
you need.)
-Greg


[ceph-users] Ceph Developer Monthly

2016-09-07 Thread Patrick McGarry
Hey cephers,

Tonight the CDM meeting is an APAC-friendly time slot (9p EDT), so
please drop a quick 1-liner and pad link if you have something to
discuss. Thanks!

http://tracker.ceph.com/projects/ceph/wiki/CDM_07-SEP-2016



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


Re: [ceph-users] OpenStack Barcelona discount code

2016-09-07 Thread Patrick McGarry
And since my url pasting failed on the Eventbrite link, here is the
registration link:

https://openstacksummit2016barcelona.eventbrite.com/



On Wed, Sep 7, 2016 at 2:30 PM, Patrick McGarry  wrote:
> Hey cephers,
>
> For those who are attending OpenStack Summit in Barcelona this October
> and have not yet purchased your ticket, I wanted to share a 20%
> discount code that has just been provided to Red Hat that we can
> freely share. The code you need to enter is:
>
> RED_OPENSTACKSUMMIT
>
> This 20% discount code is good for unlimited uses on Full Access
> registrations until online sales end on October 22. This code is not
> applicable for people who have already registered at the early bird
> price, cannot be used in person onsite in Barcelona, and cannot be
> used for day only passes. Prices increase again on September 28, but
> this code will still be valid for 20% off the increased price until
> October 22.
>
> The code can be redeemed on Eventbrite HERE. When you register on
> Eventbrite, you will need to first click on “Enter Promotional Code”
> and then copy and paste your code in the text box before you select
> the ticket type. It's easy to miss, so please reference the following
> illustration:
> https://www.dropbox.com/s/eyvjn99erg1p47p/HowToUseRegCode_BarcelonaSummit.jpg?dl=0
>
> If you have any problems or questions please let me know. Thanks!
>
> --
>
> Best Regards,
>
> Patrick McGarry
> Director Ceph Community || Red Hat
> http://ceph.com  ||  http://community.redhat.com
> @scuttlemonkey || @ceph



-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


[ceph-users] OpenStack Barcelona discount code

2016-09-07 Thread Patrick McGarry
Hey cephers,

For those who are attending OpenStack Summit in Barcelona this October
and have not yet purchased your ticket, I wanted to share a 20%
discount code that has just been provided to Red Hat that we can
freely share. The code you need to enter is:

RED_OPENSTACKSUMMIT

This 20% discount code is good for unlimited uses on Full Access
registrations until online sales end on October 22. This code is not
applicable for people who have already registered at the early bird
price, cannot be used in person onsite in Barcelona, and cannot be
used for day only passes. Prices increase again on September 28, but
this code will still be valid for 20% off the increased price until
October 22.

The code can be redeemed on Eventbrite HERE. When you register on
Eventbrite, you will need to first click on “Enter Promotional Code”
and then copy and paste your code in the text box before you select
the ticket type. It's easy to miss, so please reference the following
illustration:
https://www.dropbox.com/s/eyvjn99erg1p47p/HowToUseRegCode_BarcelonaSummit.jpg?dl=0

If you have any problems or questions please let me know. Thanks!

-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


Re: [ceph-users] RFQ for Flowjo

2016-09-07 Thread Mike Jacobacci
Hey Henry, Wait a sec… That R101 is for our Nagios node, not a part of the Ceph 
monitoring nodes.  So both the R133 and the one R101 should have redundant 
power supplies.  Make sense?

Cheers
Mike
> On Sep 7, 2016, at 10:55 AM, Henry Figueroa wrote:
> 
> Mike
> The monitoring node (R101.v6) does not have a redundant power supply, I 
> reconfigured on a R133.v6
> Take a look and let me know if you approve, I will delete the R101.v6 after:
> Web: http://www.siliconmechanics.com/quotes/316693?confirmation=608475029 
> 
> PDF: 
> http://www.siliconmechanics.com/quotes/316693.pdf?confirmation=608475029 
> 
> 
> 
> 
> On Wed, Sep 7, 2016 at 10:39 AM, Mike Jacobacci wrote:
> Hey Henry,
> 
> Ok, so we just need to make two changes on the RFQ’s and we can start today.
> 
> 1.  RFQ 316693, please change to a redundant power supply.
> 2. Basically, move the two storage nodes to their own RFQ and everything else 
> can be on the other RFQ.
> 
> We will be paying with check and will be ordering the storage nodes now, then 
> the other nodes we can pay for on October 1st… Which leads me to my last 
> question, when does billing happen in relationship with node building?  The 
> reason I ask is that I want to get the other Nodes ASAP and would like to see 
> if you could have them shipped the same day we pay for them (Oct. 1st)
> 
> Cheers,
> Mike
> 
>> On Aug 24, 2016, at 11:18 AM, Henry Figueroa wrote:
>> 
>> Mike
>> Here you go
>> 
>> 
>> RFQ:316153 - R513.v6 - Ceph Storage + Monitoring Nodes + Xen Server
>> R513.v6 Ceph 
>> R101.v6 Monitoring Nodes
>> R331.v6 Xen VM Server with H/W RAID
>> 
>>   Web: http://www.siliconmechanics.com/quotes/316153?confirmation=800709744 
>> 
>> PDF: 
>> http://www.siliconmechanics.com/quotes/316153.pdf?confirmation=800709744 
>> 
>> 
>> RFQ:316693 R101.v6 QUAD-64GB-2x 80GB H/S RAID
>> Web: http://www.siliconmechanics.com/quotes/316693?confirmation=608475029 
>> 
>> PDF: 
>> http://www.siliconmechanics.com/quotes/316693.pdf?confirmation=608475029 
>> 
>> 
>> 
>> Get me the credit info/FInancial asap so that I can review with our CFO
>> 
>> H
>> 
>> On Wed, Aug 24, 2016 at 11:04 AM, Mike Jacobacci wrote:
>> Hey Henry,
>> 
>> Yes I meant to say R331, that’s what you upgraded us to on the last order.  
>> We would like the LSI 9361 4i RAID Controller installed along with the 
>> 10GbE card.  Let me know if you have any other questions.
>> 
>> Cheers,
>> Mike
>> 
>>> On Aug 24, 2016, at 10:41 AM, Henry Figueroa wrote:
>>> 
>>> Mike
>>> Quick question on the XEN Server.
>>> We are using the R309.v6 Platform, only 1x PCI slot and that is being used 
>>> up by the X710 10GbE SFP+, so there is not room for the HBA Card
>>> I will transfer the config to the R331.v6 which has 2x PCI SLots
>>> Now the question is .. Do you want an HBA or do you want a hardware RAID 
>>> Controller?
>>> 
>>> Let me know asap
>>> H
>>> 
>>> On Wed, Aug 24, 2016 at 10:19 AM, Henry Figueroa wrote:
>>> Mike/Seth
>>> I hosed this up and did not get it done; that being said, you were on my 
>>> mind, as this a.m. I wrote a reminder to ask you about the quote ... :)
>>> Give me 15 minutes.
>>> In the meantime, Seth asked for a credit app etc. 
>>> FYI our terms are net/30.  If you want better than N/45 my CFO will ask for 
>>> financials/etc.
>>> 
>>> I am including the app; complete & sign it. If you have a credit reference 
>>> sheet, send me that, but do sign our agreement. I included our wire/ACH info 
>>> also.
>>> 
>>> H
>>> 
>>> 
>>> On Wed, Aug 24, 2016 at 10:14 AM, Mike Jacobacci wrote:
>>> 
>>> 
 Begin forwarded message:
 
 From: Mike Jacobacci <mi...@flowjo.com>
 Subject: Re: RFQ for Flowjo
 Date: August 17, 2016 at 10:13:41 AM PDT
 To: Henry Figueroa
 
 Hey Henry,
 
 I know, I’m horrible! =D  But I got more for ya!  
 
 
 1. Can I get an RFQ for 3 R101 servers to act as Ceph monitoring nodes.  
 You made us a quote here:
 http://www.siliconmechanics.com/quotes/312908?confirmation=1390825971 
 
 We just need 3, not 4.
 
 2. I would also like an RFQ for another XenServer, I can’t find the email 
 with our last order, but I believ

Re: [ceph-users] NFS gateway

2016-09-07 Thread John Spray
On Wed, Sep 7, 2016 at 3:30 PM, jan hugo prins  wrote:
> Hi,
>
> One of the use-cases I'm currently testing is the possibility to replace
> a NFS storage cluster using a Ceph cluster.
>
> The idea I have is to use a server as an intermediate gateway. On the
> client side it will expose a NFS share and on the Ceph side it will
> mount the CephFS using mount.ceph. The whole network that holds the Ceph
> environment is 10G connected and when I use the same server as S3
> gateway I can store files rather quickly. When I use the same server as
> a NFS gateway putting data on the Ceph cluster is really very slow.
>
> The reason we want to do this is that we want to create a dedicated Ceph
> storage network and have all clients that need some data access either
> use S3 or NFS to access the data. I want to do this this way because I
> don't want to give the clients in some specific networks full access to
> the Ceph filesystem.
>
> Has anyone tried this before? Is this the way to go, or are there better
> ways to fix this?

Exporting kernel client mounts with the kernel NFS server is tested as
part of the regular testing we do on CephFS, so you should find it
pretty stable.  This is definitely a legitimate way of putting a layer
of security between your application servers and your storage cluster.
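
A minimal sketch of such an export (the mount point, client network and fsid
here are placeholders), assuming CephFS is mounted at /mnt/cephfs on the
gateway:

  # /etc/exports on the gateway
  /mnt/cephfs  192.168.0.0/24(rw,sync,no_subtree_check,fsid=20)

then run "exportfs -ra" to reload the exports; an explicit fsid is usually
needed because the NFS server cannot derive one from a network filesystem
like CephFS.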

NFS Ganesha is also an option that is not as well tested (yet), but it
has the advantage that you can get nice, up-to-date Ceph client code
without worrying about upgrading the kernel.  I'm not sure if there
are recent Ganesha packages with the Ceph FSAL enabled available
online, so you may need to compile your own.

When you say you tried using the server as an NFS gateway, was that
with kernel NFS + kernel CephFS?  What kind of activities did you find
were running slowly (big files, small files, etc...)?

John

>
> --
> Met vriendelijke groet / Best regards,
>
> Jan Hugo Prins
> Infra and Isilon storage consultant
>
> Better.be B.V.
> Auke Vleerstraat 140 E | 7547 AN Enschede | KvK 08097527
> T +31 (0) 53 48 00 694 | M +31 (0)6 26 358 951
> jpr...@betterbe.com | www.betterbe.com
>
> This e-mail is intended exclusively for the addressee(s), and may not
> be passed on to, or made available for use by any person other than
> the addressee(s). Better.be B.V. rules out any and every liability
> resulting from any electronic transmission.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS gateway

2016-09-07 Thread David
I have clients accessing CephFS over nfs (kernel nfs). I was seeing slow
writes with sync exports. I haven't had a chance to investigate and in the
meantime I'm exporting with async (not recommended, but acceptable in my
environment).
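
For reference, a hedged illustration of what such an async export line looks
like (path and network are placeholders):

  /mnt/cephfs  10.0.0.0/24(rw,async,no_subtree_check,fsid=20)

With async, writes already acknowledged to the client can be lost if the
gateway crashes, which is why it is not generally recommended.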

I've been meaning to test out Ganesha for a while now

@Sean, have you used Ganesha with Ceph? How does performance compare with
kernel nfs?

On Wed, Sep 7, 2016 at 3:30 PM, jan hugo prins  wrote:

> Hi,
>
> One of the use-cases I'm currently testing is the possibility to replace
> a NFS storage cluster using a Ceph cluster.
>
> The idea I have is to use a server as an intermediate gateway. On the
> client side it will expose a NFS share and on the Ceph side it will
> mount the CephFS using mount.ceph. The whole network that holds the Ceph
> environment is 10G connected and when I use the same server as S3
> gateway I can store files rather quickly. When I use the same server as
> a NFS gateway putting data on the Ceph cluster is really very slow.
>
> The reason we want to do this is that we want to create a dedicated Ceph
> storage network and have all clients that need some data access either
> use S3 or NFS to access the data. I want to do this this way because I
> don't want to give the clients in some specific networks full access to
> the Ceph filesystem.
>
> Has anyone tried this before? Is this the way to go, or are there better
> ways to fix this?
>
> --
> Met vriendelijke groet / Best regards,
>
> Jan Hugo Prins
> Infra and Isilon storage consultant
>
> Better.be B.V.
> Auke Vleerstraat 140 E | 7547 AN Enschede | KvK 08097527
> T +31 (0) 53 48 00 694 | M +31 (0)6 26 358 951
> jpr...@betterbe.com | www.betterbe.com
>
> This e-mail is intended exclusively for the addressee(s), and may not
> be passed on to, or made available for use by any person other than
> the addressee(s). Better.be B.V. rules out any and every liability
> resulting from any electronic transmission.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Is rados_write_op_* any more efficient than issuing the commands individually?

2016-09-07 Thread Josh Durgin

On 09/06/2016 10:16 PM, Dan Jakubiec wrote:

Hello, I need to issue the following commands on millions of objects:

rados_write_full(oid1, ...)
rados_setxattr(oid1, "attr1", ...)
rados_setxattr(oid1, "attr2", ...)


Would it make it any faster if I combined all 3 of these into a single
rados_write_op and issued them "together" as a single call?

My current application doesn't really care much about the atomicity, but
maximizing our write throughput is quite important.

Does rados_write_op save any roundtrips to the OSD or have any other
efficiency gains?


Yes, individual calls will send one message per call, adding more round
trips and overhead, whereas bundling changes in a rados_write_op will
only send one message. You can see this in the number of MOSDOp
messages shown with 'debug ms = 1' on the client.
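
A minimal sketch of the bundled version using the librados C API (error
handling omitted; `io` is assumed to be an open rados_ioctx_t and the
buffers/lengths are placeholders):

  rados_write_op_t op = rados_create_write_op();
  rados_write_op_write_full(op, data, data_len);            /* replaces rados_write_full() */
  rados_write_op_setxattr(op, "attr1", attr1, attr1_len);
  rados_write_op_setxattr(op, "attr2", attr2, attr2_len);
  int r = rados_write_op_operate(op, io, "oid1", NULL, 0);  /* one round trip */
  rados_release_write_op(op);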

Josh


Re: [ceph-users] Jewel 10.2.2 - Error when flushing journal

2016-09-07 Thread Mehmet

Hey again,

now i have stopped my osd.12 via

root@:~# systemctl stop ceph-osd@12

and when I flush the journal...

root@:~# ceph-osd -i 12 --flush-journal
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7f421d49d700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x564545e65dde]
 2: (()+0x113d0) [0x7f422277e3d0]
 3: [0x56455055a3c0]
2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal 
(Segmentation fault) **

 in thread 7f421d49d700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x564545e65dde]
 2: (()+0x113d0) [0x7f422277e3d0]
 3: [0x56455055a3c0]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


 0> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal 
(Segmentation fault) **

 in thread 7f421d49d700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x564545e65dde]
 2: (()+0x113d0) [0x7f422277e3d0]
 3: [0x56455055a3c0]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


Segmentation fault

The logfile with further information
- http://slexy.org/view/s2T8AohMfU

I guess I will get the same message when I flush the other journals.

- Mehmet

Am 2016-09-07 13:23, schrieb Mehmet:

Hello ceph people,

yesterday i stopped one of my OSDs via

root@:~# systemctl stop ceph-osd@10

and tried to flush the journal for this osd via

root@:~# ceph-osd -i 10 --flush-journal

but getting this output on the screen:

SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

 0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

Segmentation fault

This is the logfile from my osd.10 with further informations
- http://slexy.org/view/s21tfwQ1fZ

Today i stopped another OSD (osd.11)

root@:~# systemctl stop ceph-osd@11

I did not get the above mentioned error - but this

root@:~# ceph-osd -i 11 --flush-journal
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal
/var/lib/ceph/osd/ceph-11/journal for object store
/var/lib/ceph/osd/ceph-11

This is the logfile from my osd.11 with further informations
- http://slexy.org/view/s2AlEhV38m

This is not really an issue actually, because I will set up the journal
partitions again with 20GB (instead of the current 5GB) and then bring the
OSDs up again.
But I thought I should mail this error to the mailing list.

This is my Setup:

*Software/OS*
- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"


#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON Server
#> uname -a
31.08.2016: Linux reilif 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11
18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

*Server*
3x OSD Server, each with

- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading


- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6Gb/s) Disks as OSDs

- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device
for 10-12 Disks

- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4
Cores, 8 Threads) - The third one has 2x Intel(R) Xeon(R) CPU L5430 @
2.66GHz ==> 8 Cores, no Hyper-Threading

- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Actually each server and client has one active connection @ 1x 1GB; in
short this will be changed to 2x 10GB fibre, perhaps with LACP, when
possible.

- We do not use Jumbo Frames yet..

- Public and cluster-network related Ceph traffic is actually going
through this one active 1GB interface on each server.

hf
- Mehmet

Re: [ceph-users] Changing Replication count

2016-09-07 Thread Vlad Blando
Thanks

/vlad

On Wed, Sep 7, 2016 at 9:47 AM, LOPEZ Jean-Charles 
wrote:

> Hi,
>
> the stray replicas will be automatically removed in the background.
>
> JC
>
> On Sep 6, 2016, at 17:58, Vlad Blando  wrote:
>
> Sorry bout that
>
> It's all set now, I thought those were the replica counts as they are also 4 and 5 :)
>
> I can see the changes now
>
> [root@controller-node ~]# ceph osd dump | grep 'replicated size'
> pool 4 'images' replicated size 2 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 19641 flags hashpspool
> stripe_width 0
> pool 5 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 19640 flags hashpspool
> stripe_width 0
> [root@controller-node ~]#
>
>
> To my other question, will it remove the excess replicas?
>
> ​/vlad
>
> On Wed, Sep 7, 2016 at 8:51 AM, Jeff Bailey  wrote:
>
>>
>>
>> On 9/6/2016 8:41 PM, Vlad Blando wrote:
>>
>>> Hi,
>>>
>>> My replication count now is this
>>>
>>> [root@controller-node ~]# ceph osd lspools
>>> 4 images,5 volumes,
>>>
>>
>> Those aren't replica counts they're pool ids.
>>
>> [root@controller-node ~]#
>>>
>>> and I made adjustments, setting images to 2 and volumes to 3;
>>> it's been 30 mins now and the values did not change. How do I know if it
>>> was really changed?
>>>
>>> this is the command I executed
>>>
>>>  ceph osd pool set images size 2
>>>  ceph osd pool set volumes size 3
>>>
>>> ceph osd pool set images min_size 2
>>> ceph osd pool set images min_size 2
>>>
>>>
>>> Another question, since the previous replication count for images is 4
>>> and for volumes is 5, it will delete the excess replicas, right?
>>>
>>> Thanks for the help
>>>
>>>
>>> /vlad
>>> ᐧ
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ᐧ
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


[ceph-users] configuring cluster handle in python rados exits with error NoneType is not callable

2016-09-07 Thread Martin Hoffmann
I want to access Ceph cluster RBD images via the Python interface. In a
standalone simple Python script this works without problems. However, I want
to create a plugin for bareos backup, where this does not work and the cluster
configure call exits with an error:

cluster = rados.Rados(rados_id="admin", clustername="ceph",
                      conffile="/etc/ceph/ceph.conf",
                      conf=dict(keyring="/etc/ceph/ceph.client.admin.keyring"))

File "rados.pyx", line 525, in rados.Rados.__init__
(/tmp/buildd/ceph-10.2.2/src/build/rados.c:5878)

File "rados.pyx", line 423, in rados.requires.wrapper.validate_func
(/tmp/buildd/ceph-10.2.2/src/build/rados.c:4097)

TypeError: 'NoneType' object is not callable


Currently this is simple:

import rados

import rbd

rados.Rados(conffile='')

(or with some more parameters - no matter what always same error)


This is on ubuntu 14.04 with ceph 10.2.2 and latest bareos.

Identical code in a simple Python script works, but embedded in the bareos
plugin it does not.

Any idea what might be causing such behaviour?

Thanks in advance.


Re: [ceph-users] NFS gateway

2016-09-07 Thread Sean Redmond
Have you seen this :

https://github.com/nfs-ganesha/nfs-ganesha/wiki/Fsalsupport#CEPH

On Wed, Sep 7, 2016 at 3:30 PM, jan hugo prins  wrote:

> Hi,
>
> One of the use-cases I'm currently testing is the possibility to replace
> a NFS storage cluster using a Ceph cluster.
>
> The idea I have is to use a server as an intermediate gateway. On the
> client side it will expose a NFS share and on the Ceph side it will
> mount the CephFS using mount.ceph. The whole network that holds the Ceph
> environment is 10G connected and when I use the same server as S3
> gateway I can store files rather quickly. When I use the same server as
> a NFS gateway putting data on the Ceph cluster is really very slow.
>
> The reason we want to do this is that we want to create a dedicated Ceph
> storage network and have all clients that need some data access either
> use S3 or NFS to access the data. I want to do this this way because I
> don't want to give the clients in some specific networks full access to
> the Ceph filesystem.
>
> Has anyone tried this before? Is this the way to go, or are there better
> ways to fix this?
>
> --
> Met vriendelijke groet / Best regards,
>
> Jan Hugo Prins
> Infra and Isilon storage consultant
>
> Better.be B.V.
> Auke Vleerstraat 140 E | 7547 AN Enschede | KvK 08097527
> T +31 (0) 53 48 00 694 | M +31 (0)6 26 358 951
> jpr...@betterbe.com | www.betterbe.com
>
> This e-mail is intended exclusively for the addressee(s), and may not
> be passed on to, or made available for use by any person other than
> the addressee(s). Better.be B.V. rules out any and every liability
> resulting from any electronic transmission.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-07 Thread Michael Sudnick
I've had to force recreate some PGs on my cephfs data pool due to some
cascading disk failures in my homelab cluster. Is there a way to easily
determine which files I need to restore from backup? My metadata pool is
completely intact.

Thanks for any help and suggestions.

Sincerely,
  Michael


[ceph-users] NFS gateway

2016-09-07 Thread jan hugo prins
Hi,

One of the use-cases I'm currently testing is the possibility to replace
a NFS storage cluster using a Ceph cluster.

The idea I have is to use a server as an intermediate gateway. On the
client side it will expose a NFS share and on the Ceph side it will
mount the CephFS using mount.ceph. The whole network that holds the Ceph
environment is 10G connected and when I use the same server as S3
gateway I can store files rather quickly. When I use the same server as
a NFS gateway putting data on the Ceph cluster is really very slow.

The reason we want to do this is that we want to create a dedicated Ceph
storage network and have all clients that need some data access either
use S3 or NFS to access the data. I want to do this this way because I
don't want to give the clients in some specific networks full access to
the Ceph filesystem.

Has anyone tried this before? Is this the way to go, or are there better
ways to fix this?

-- 
Met vriendelijke groet / Best regards,

Jan Hugo Prins
Infra and Isilon storage consultant

Better.be B.V.
Auke Vleerstraat 140 E | 7547 AN Enschede | KvK 08097527
T +31 (0) 53 48 00 694 | M +31 (0)6 26 358 951
jpr...@betterbe.com | www.betterbe.com

This e-mail is intended exclusively for the addressee(s), and may not
be passed on to, or made available for use by any person other than 
the addressee(s). Better.be B.V. rules out any and every liability 
resulting from any electronic transmission.



Re: [ceph-users] radosgw error in its log rgw_bucket_sync_user_stats()

2016-09-07 Thread Arvydas Opulskis
Hi,

just in case someone experiences the same problem: the only thing that
helped was a restart of the gateway. Only after I restarted it was I able to
create that bucket without the "access denied" error on other operations.
It seems RGW had some old data cached in it.

Arvydas

On Tue, Sep 6, 2016 at 6:10 PM, Arvydas Opulskis 
wrote:

> It is not over yet. Now if the user recreates the problematic bucket, it appears,
> but with the same "Access denied" error. Looks like there is still some
> corrupted data left for this bucket in Ceph. No problems if the user creates
> a new bucket with a very similar name.
> No errors were noticed in the rgw log on bucket creation.
>
> Any ideas? :)
>
>
> On Tue, Sep 6, 2016 at 4:05 PM, Arvydas Opulskis 
> wrote:
>
>> Hi,
>>
>> From time to time we have the same problem on our Jewel cluster (10.2.2, upgraded
>> from Infernalis). I checked the last few occurrences and noticed it happened
>> when a user tried to delete a bucket from S3 while the Ceph cluster was under heavy
>> load (deep-scrub or PG backfill operations running). Seems like some kind of
>> timeout.
>> After that, the bucket was left undeleted and the S3 user got an "Access denied" error
>> on any operation on the bucket. I had to remove it using the radosgw-admin tool.
>>
>> Arvydas
>>
>>
>> On Thu, Aug 18, 2016 at 12:00 PM, zhu tong 
>> wrote:
>>
>>> Hi all,
>>>
>>> Version: 0.94.7
>>> radosgw has reported the following error:
>>>
>>> 2016-08-16 15:26:06.883957 7fc2f0bfe700  0 ERROR:
>>> rgw_bucket_sync_user_stats() for user=user1, bucket=2537e61b32ca78343213823
>>> 7f234e610d1ee186e(@{i=.rgw.buckets.index,e=.rgw.buckets.extr
>>> a}.rgw.buckets[default.4151.167]) returned -2
>>> 2016-08-16 15:26:06.883989 7fc2f0bfe700  0 WARNING: sync_bucket()
>>> returned r=-2
>>>
>>> ERROR like this happens to user1's all buckets during that time.
>>>
>>> What caused this error? And what would this error affects?
>>>
>>>
>>> Thanks.
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>


Re: [ceph-users] 2 osd failures

2016-09-07 Thread Shain Miley

Well not entirely too late I guess :-(

I woke up this morning to see that two OTHER osd's had been marked down 
and out.


I again restarted the osd daemons and things seem to be ok at this point.

I agree that I need to get to the bottom on why this happened.

I have uploaded the log files from 1 of the downed osd's here:

http://filebin.ca/2uFoRw017TCD/ceph-osd.51.log.1
http://filebin.ca/2uFosTO8oHmj/ceph-osd.51.log

You can see my osd restart at about 6:15 am this morning... other than 
that I don't see anything indicated in the log files (although I could 
be missing it for sure).


Just an FYI, we are currently running ceph version 0.94.9, which I 
upgraded to at the end of last week (from 0.94.6 I think).


This cluster is about 2 or 3 years old at this point and we have not run 
into this issue at all up to this point.


Thanks,

Shain


On 09/07/2016 12:00 AM, Christian Balzer wrote:

Hello,

Too late I see, but still...

On Tue, 6 Sep 2016 22:17:05 -0400 Shain Miley wrote:


Hello,

It looks like we had 2 osd's fail at some point earlier today, here is
the current status of the cluster:


You will really want to find out how and why that happened, because while
not impossible this is pretty improbable.

Something like HW, are the OSDs on the same host, or maybe an OOM event,
etc.
  

root@rbd1:~# ceph -s
  cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
   health HEALTH_WARN
  2 pgs backfill
  5 pgs backfill_toofull

Bad, you will want your OSDs back in and then some.
Have a look at "ceph osd df".


  69 pgs backfilling
  74 pgs degraded
  1 pgs down
  1 pgs peering

Not good either.
W/o bringing back your OSDs that means doom for the data on those PGs.


  74 pgs stuck degraded
  1 pgs stuck inactive
  75 pgs stuck unclean
  74 pgs stuck undersized
  74 pgs undersized
  recovery 1903019/105270534 objects degraded (1.808%)
  recovery 1120305/105270534 objects misplaced (1.064%)
  crush map has legacy tunables
   monmap e1: 3 mons at
{hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}
  election epoch 282, quorum 0,1,2 hqceph1,hqceph2,hqceph3
   osdmap e25019: 108 osds: 105 up, 105 in; 74 remapped pgs
pgmap v30721368: 3976 pgs, 17 pools, 144 TB data, 51401 kobjects
  285 TB used, 97367 GB / 380 TB avail
  1903019/105270534 objects degraded (1.808%)
  1120305/105270534 objects misplaced (1.064%)
  3893 active+clean
69 active+undersized+degraded+remapped+backfilling
 6 active+clean+scrubbing
 3 active+undersized+degraded+remapped+backfill_toofull
 2 active+clean+scrubbing+deep

When in recovery/backfill situations, you always want to stop any and all
scrubbing.
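
For example, something like:

  ceph osd set noscrub
  ceph osd set nodeep-scrub

and unset both again once the backfills have finished.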


 2
active+undersized+degraded+remapped+wait_backfill+backfill_toofull
 1 down+peering
recovery io 248 MB/s, 84 objects/s

We had been running for a while with 107 osd's (not 108), it looks like
osd's 64 and 76 are both now down and out at this point.


I have looked though the ceph logs for each osd and did not see anything
obvious, the raid controller also does not show the disk offline.


Get to the bottom of that, normally something gets logged when an OSD
fails.


I am wondering if I should try to restart the two osd's that are showing
as down...or should I wait until the current recovery is complete?


As said, try to restart immediately, just to keep the traffic down for
starters.


The pool has a replica level of  '2'...and with 2 failed disks I want to
do whatever I can to make sure there is not an issue with missing objects.


I sure hope that pool holds backups or something of that nature.

The only times when a replica of 2 isn't a cry for Murphy to smite you is
with RAID backed OSDs or VERY well monitored and vetted SSDs.
  

Thanks in advance,

Shain


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smi...@npr.org | 
202.513.3649



Re: [ceph-users] experiences in upgrading Infernalis to Jewel

2016-09-07 Thread Alexandre DERUMIER
Hi,

I think it's simpler to 

1) change the repository
2) apt-get dist-upgrade

3) restart mon on each node
4) restart osd on each node

done


I have upgraded 4 clusters like this without any problem.
I have never used ceph-deploy for upgrades.
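
A rough per-node sketch of those steps (hedged; it assumes the repo file named
in the original mail and the stock Ubuntu 14.04 upstart jobs, so adjust to your
setup):

  sed -i 's/debian-infernalis/debian-jewel/' \
      /etc/apt/sources.list.d/ceph_com_debian_infernalis.list
  apt-get update && apt-get dist-upgrade
  restart ceph-mon-all    # on each monitor node, one at a time
  restart ceph-osd-all    # on each OSD node, one at a time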


- Mail original -
De: "felderm" 
À: "ceph-users" 
Envoyé: Mercredi 7 Septembre 2016 14:27:02
Objet: [ceph-users] experiences in upgrading Infernalis to Jewel

Hi All 

We are preparing upgrade from Ceph Infernalis 9.2.0 to Ceph Jewel 
10.2.2. Based on the Upgrade procedure documentation 
http://docs.ceph.com/docs/jewel/install/upgrading-ceph/ it sound easy. 
But it often fails when you think it's easy. Therefore I would like to 
know your opinion for the following questions: 

1) We plan to upgrade the monitors one after the other. If we break one 
monitor, the 2 others are still operational. 

in the documentation they propose 
ceph-deploy install --release jewel mon1 mon2 mon3 

Wouldn't it be wise to upgrade one after the other? 
ceph-deploy install --release jewel mon1 
ceph-deploy install --release jewel mon2 
ceph-deploy install --release jewel mon3 

Same procedure for Monitor Nodes ?? 

2) On which node would you recommend running the ceph-deploy command? 
which one is the admin node? 

3) If we are using ceph-deploy for upgrading, do we need to change the 
apt repository? 
#cat /etc/apt/sources.list.d/ceph_com_debian_infernalis.list 
deb http://ceph.com/debian-infernalis trusty main 

4) There is no possibility to revert the upgrade. Is there any plan B 
when the upgrade fails ? Sorry for being so pessimistic. 

5) General experiences in upgrading Ceph? Does it fail often? How was 
your plan B in case of upgrade failures? 

Your feedbacks are highly appreciated! 
Thanks 
felder 



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 



[ceph-users] experiences in upgrading Infernalis to Jewel

2016-09-07 Thread felderm
Hi All

We are preparing upgrade from Ceph Infernalis 9.2.0 to Ceph Jewel
10.2.2. Based on the Upgrade procedure documentation
http://docs.ceph.com/docs/jewel/install/upgrading-ceph/  it sounds easy.
But it often fails when you think it's easy. Therefore I would like to
know your opinion on the following questions:

1) We plan to upgrade the monitors one after the other. If we break one
monitor, the 2 others are still operational.

in the documentation they propose
ceph-deploy install --release jewel mon1 mon2 mon3

Wouldn't it be wise to upgrade one after the other?
ceph-deploy install --release jewel mon1
ceph-deploy install --release jewel mon2
ceph-deploy install --release jewel mon3

Same procedure for Monitor Nodes ??

2) On which node would you recommend running the ceph-deploy command?
which one is the admin node?

3) If we are using ceph-deploy for upgrading, do we need to change the
apt  repository?
#cat /etc/apt/sources.list.d/ceph_com_debian_infernalis.list
deb http://ceph.com/debian-infernalis trusty main

4) There is no possibility to revert the upgrade. Is there any plan B
when the upgrade fails ? Sorry for being so pessimistic.

5) General experiences in upgrading Ceph? Does it fail often? How was
your plan B in case of upgrade failures?

Your feedbacks are highly appreciated!
Thanks
felder





[ceph-users] Jewel 10.2.2 - Error when flushing journal

2016-09-07 Thread Mehmet

Hello ceph people,

yesterday i stopped one of my OSDs via

root@:~# systemctl stop ceph-osd@10

and tried to flush the journal for this osd via

root@:~# ceph-osd -i 10 --flush-journal

but getting this output on the screen:

SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
*** Caught signal (Segmentation fault) **
 in thread 7fd846333700 thread_name:ceph-osd
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal 
(Segmentation fault) **

 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


 0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal 
(Segmentation fault) **

 in thread 7fd846333700 thread_name:ceph-osd

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (()+0x96bdde) [0x55f33b862dde]
 2: (()+0x113d0) [0x7fd84b6143d0]
 3: [0x55f345bbff80]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


Segmentation fault

This is the logfile from my osd.10 with further informations
- http://slexy.org/view/s21tfwQ1fZ

Today i stopped another OSD (osd.11)

root@:~# systemctl stop ceph-osd@11

I did not get the above mentioned error - but this:

root@:~# ceph-osd -i 11 --flush-journal
SG_IO: questionable sense data, results may be incorrect
SG_IO: questionable sense data, results may be incorrect
2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal 
/var/lib/ceph/osd/ceph-11/journal for object store 
/var/lib/ceph/osd/ceph-11


This is the logfile from my osd.11 with further informations
- http://slexy.org/view/s2AlEhV38m

This is not really an issue actually, because I will set up the journal 
partitions again with 20GB (instead of the current 5GB) and then bring the 
OSDs up again.
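
A hedged sketch of that step, e.g. for osd.11, assuming the enlarged partition
is already linked as /var/lib/ceph/osd/ceph-11/journal:

  ceph-osd -i 11 --mkjournal
  systemctl start ceph-osd@11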

But I thought I should mail this error to the mailing list.

This is my Setup:

*Software/OS*
- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"


#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON Server
#> uname -a
31.08.2016: Linux reilif 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 
18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


*Server*
3x OSD Server, each with

- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading


- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6Gb/s) Disks as OSDs

- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device for 
10-12 Disks


- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
Cores, 8 Threads) - The third one has 2x Intel(R) Xeon(R) CPU L5430 @ 
2.66GHz ==> 8 Cores, no Hyper-Threading


- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Actually each server and client has one active connection @ 1x 1GB; in 
short this will be changed to 2x 10GB fibre, perhaps with LACP, when 
possible.


- We do not use Jumbo Frames yet..

- Public and cluster-network related Ceph traffic is actually going 
through this one active 1GB interface on each server.


hf
- Mehmet


Re: [ceph-users] Raw data size used seems incorrect (version Jewel, 10.2.2)

2016-09-07 Thread David
Could be related to this? http://tracker.ceph.com/issues/13844

On Wed, Sep 7, 2016 at 7:40 AM, james  wrote:

> Hi,
>
> Not sure if anyone can help clarify or provide any suggestion on how to
> troubleshoot this
>
> We have a ceph cluster recently built with ceph version Jewel, 10.2.2.
> Based on "ceph -s" it shows that the data size is around 3TB but raw data
> used is only around 6TB.
> As the pool is set with 3 replicas, I suppose the raw usage should be
> around 9TB. Is this correct and working as designed?
> Thank you
>
> ceph@ceph1:~$ ceph -s
> cluster 292a8b61-549e-4529-866e-01776520b6bf
>  health HEALTH_OK
>  monmap e1: 3 mons at {cpm1=192.168.1.7:6789/0,cpm2=
> 192.168.1.8:6789/0,cpm3=192.168.1.9:6789/0}
> election epoch 70, quorum 0,1,2 cpm1,cpm2,cpm3
>  osdmap e1980: 18 osds: 18 up, 18 in
> flags sortbitwise
>   pgmap v1221102: 512 pgs, 1 pools, 3055 GB data, 801 kobjects
> 6645 GB used, 60380 GB / 67026 GB avail
>  512 active+clean
>
> ceph@ceph1:~$ ceph osd dump
> epoch 1980
> fsid 292a8b61-549e-4529-866e-01776520b6bf
> created 2016-08-12 09:30:28.771332
> modified 2016-09-06 06:34:43.068060
> flags sortbitwise
> pool 1 'default' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 45 flags hashpspool
> stripe_width 0
> removed_snaps [1~3]
> 
> ceph@ceph1:~$ ceph df
> GLOBAL:
> SIZE   AVAIL  RAW USED %RAW USED
> 67026G 60380G6645G  9.91
> POOLS:
> NAMEID USED  %USED MAX AVAIL OBJECTS
> default 1  3055G 13.6826124G  821054
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


Re: [ceph-users] Replacing a defective OSD

2016-09-07 Thread Ronny Aasen

On 07. sep. 2016 02:51, Vlad Blando wrote:

Hi,

I replaced a failed OSD and was trying to add it back to the pool, my
problem is that I am not detecting the physical disk. It looks like I
need to initialize it via the hardware raid before I can see it on the OS.

If I'm going to restart the said server so I can work on the RAID config
(Raid 0), what will be the behavior of the remaining 2 nodes? Will there
be a slowdown? Will there be backfilling? I want to minimize client impact.

Thanks.

​/vlad
ᐧ


if you do not want to do backfilling and recovery, but rather run 
slightly degraded while the node is down, you could set the noout flag, 
as long as the cluster can operate OK with those OSDs missing.

It's kind of Ceph's "maintenance mode".

 # ceph osd set noout

Remember to unset it when you are done and all OSDs are up/in again; 
you will not get HEALTH_OK while noout is set.
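
i.e. once the node is back up and the OSDs have rejoined:

 # ceph osd unset noout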



but a separate question is...
what kind of hardware controller do you have? Most controllers 
allow you to edit the config / add drives from within the OS using the 
controller's (often proprietary) software, which you often have to download 
from the vendor's web pages.


Do you find your controller on this list ?
https://wiki.debian.org/LinuxRaidForAdmins

this controller software is often needed for troubleshooting, and it can 
report status and be monitored as well.




kind regards
Ronny Aasen




