Re: [ceph-users] Ceph Day Raleigh Cancelled

2015-08-26 Thread Patrick McGarry
Hey Daleep,

Right now we're planned through the end of the year, but if you know
of a location where we would get good attendance and maybe someone
with a space to host it we could certainly add that to the planning
for 2016. Feel free to shoot me an email if you know of any places
that might fit the bill.

Typically we shoot for about 75-100 people and like to cater in lunch,
snacks, and a cocktail reception. So that should give you an idea for
what we look at when planning a venue. Thanks.


On Wed, Aug 26, 2015 at 12:55 AM, Daleep Bais daleepb...@gmail.com wrote:
 Hi Patrick,

 Are there any plans for such events to be held in India?

 Eagerly looking forward to it..

 Thanks.

 Daleep Singh Bais

 On Wed, Aug 26, 2015 at 2:53 AM, Patrick McGarry pmcga...@redhat.com
 wrote:

 Due to low registration this event is being pushed back to next year.
 The Ceph Day events for Shanghai, Tokyo, and Melbourne should all
 still be proceeding as planned, however. Feel free to contact me if
 you have any questions about Ceph Days. Thanks.


 --

 Best Regards,

 Patrick McGarry
 Director Ceph Community || Red Hat
 http://ceph.com  ||  http://community.redhat.com
 @scuttlemonkey || @ceph
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph monitoring with graphite

2015-08-26 Thread John Spray
On Wed, Aug 26, 2015 at 3:33 PM, Dan van der Ster d...@vanderster.com wrote:
 Hi Wido,

 On Wed, Aug 26, 2015 at 10:36 AM, Wido den Hollander w...@42on.com wrote:
 I'm sending pool statistics to Graphite

 We're doing the same -- stripping invalid chars as needed -- and I
 would guess that lots of people have written similar json2graphite
 converter scripts for Ceph monitoring in recent months.

 It makes me wonder if it might be useful if Ceph had a --format mode
 to output df/stats/perf commands directly in graphite compatible text.
 Shouldn't be too difficult to write.

Why would you want that instead of using e.g. diamond?  I think it
makes sense to have an external utility that converts the single ceph
format into whatever format the external tool expects.  The existing diamond
plugin is pretty comprehensive.
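
For anyone who hasn't tried it: enabling the Ceph collector is basically just
dropping a config file in place and restarting diamond. Roughly (file path and
option names from memory -- check your diamond install before trusting this):

cat > /etc/diamond/collectors/CephCollector.conf <<'EOF'
enabled = True
# the collector walks the ceph admin sockets; point it at them here
# (e.g. socket_path = /var/run/ceph) if yours live somewhere non-default
EOF
service diamond restart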

John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Day Raleigh Cancelled

2015-08-26 Thread Patrick McGarry
Yeah, we're still working on nailing down a venue for the Melbourne
event (but it looks like 05 Nov is probably the date). As soon as we
have a venue confirmed we'll put out a call for speakers and post the
details on the /cephdays/ page. Thanks!


On Tue, Aug 25, 2015 at 7:47 PM, Goncalo Borges
gonc...@physics.usyd.edu.au wrote:
 Hey Patrick...

 I am interested in the Melbourne one. Under
 http://ceph.com/cephdays/
 I do not see any reference to it.

 Can you give me more details on that?

 TIA
 Goncalo

 On 08/26/2015 07:23 AM, Patrick McGarry wrote:

 Due to low registration this event is being pushed back to next year.
 The Ceph Day events for Shanghai, Tokyo, and Melbourne should all
 still be proceeding as planned, however. Feel free to contact me if
 you have any questions about Ceph Days. Thanks.



 --
 Goncalo Borges
 Research Computing
 ARC Centre of Excellence for Particle Physics at the Terascale
 School of Physics A28 | University of Sydney, NSW  2006
 T: +61 2 93511937




-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Samsung pm863 / sm863 SSD info request

2015-08-26 Thread Emmanuel Florac
On Tue, 25 Aug 2015 17:07:18 +0200,
Jan Schermer j...@schermer.cz wrote:

 There's a nice whitepaper about under-provisioning
 everyone using SSDs should read it
 
 http://www.sandisk.com/assets/docs/WP004_OverProvisioning_WhyHow_FINAL.pdf

BTW you can perfectly well under-provision an SSD by hand, by not allocating
all space when partitioning them. It works just as well as
firmware-set under-provisioning (just buy the cheaper model, and
don't use up all available space, and you get the higher-end model for
cheap :)

-- 

Emmanuel Florac |   Direction technique
|   Intellique
|   eflo...@intellique.com
|   +33 1 78 94 84 02

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Tech Talk Tomorrow

2015-08-26 Thread Patrick McGarry
Hey cephers,

Don't forget that tomorrow is our monthly Ceph Tech Talk. This month
we're taking a look at performance measuring and tuning in Ceph. Mark
Nelson, Ceph's lead performance engineer, will be giving an overview of
what's new in the performance world of Ceph and sharing some recent
findings. Definitely not one to miss!

http://ceph.com/ceph-tech-talks/

1p Eastern on our BlueJeans video conferencing system. Hopefully we'll
see you there!


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Samsung pm863 / sm863 SSD info request

2015-08-26 Thread Jan Schermer
Maybe if you TRIM it first, but the correct way to do that is like this:

https://www.thomas-krenn.com/en/wiki/SSD_Over-provisioning_using_hdparm
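
Roughly, and only as a sketch (the device name is an example, hdparm -N
permanently changes the capacity the drive reports, and the drive should be
empty, so read the article before copying any of this):

blkdiscard /dev/sdX                  # TRIM the whole (empty!) device first
hdparm -N /dev/sdX                   # note the current/native max sector count
# then hide e.g. ~20% of the sectors behind a Host Protected Area:
hdparm -Np<new_max_sectors> --yes-i-know-what-i-am-doing /dev/sdX
# power-cycle the drive so the new limit is picked up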

Jan


 On 26 Aug 2015, at 18:58, Emmanuel Florac eflo...@intellique.com wrote:
 
 On Tue, 25 Aug 2015 17:07:18 +0200,
 Jan Schermer j...@schermer.cz wrote:
 
 There's a nice whitepaper about under-provisioning
 everyone using SSDs should read it
 
  http://www.sandisk.com/assets/docs/WP004_OverProvisioning_WhyHow_FINAL.pdf
 
  BTW you can perfectly well under-provision an SSD by hand, by not allocating
  all space when partitioning them. It works just as well as
 firmware-set under-provisioning (just buy the cheaper model, and
 don't use up all available space, and you get the higher-end model for
 cheap :)
 
 -- 
 
 Emmanuel Florac |   Direction technique
|   Intellique
|  eflo...@intellique.com
|   +33 1 78 94 84 02
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados: Undefined symbol error

2015-08-26 Thread Aakanksha Pudipeddi-SSI
Hello Jason,

I checked the version of my built packages and they are all 9.0.2. I purged the 
cluster and uninstalled the packages and there seems to be nothing else - no 
older version. Could you elaborate on the fix for this issue?

Thanks,
Aakanksha

-Original Message-
From: Jason Dillaman [mailto:dilla...@redhat.com] 
Sent: Friday, August 21, 2015 6:37 AM
To: Aakanksha Pudipeddi-SSI
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Rados: Undefined symbol error

It sounds like you have rados CLI tool from an earlier Ceph release (< Hammer) 
installed and it is attempting to use the librados shared library from a newer 
(>= Hammer) version of Ceph.

Jason 


- Original Message - 

 From: Aakanksha Pudipeddi-SSI aakanksha...@ssi.samsung.com
 To: ceph-us...@ceph.com
 Sent: Thursday, August 20, 2015 11:47:26 PM
 Subject: [ceph-users] Rados: Undefined symbol error

 Hello,

 I cloned the master branch of Ceph and after setting up the cluster, 
 when I tried to use the rados commands, I got this error:

 rados: symbol lookup error: rados: undefined symbol:
 _ZN5MutexC1ERKSsbbbP11CephContext

 I saw a similar post here: http://tracker.ceph.com/issues/12563 but I 
 am not clear on the solution for this problem. I am not performing an 
 upgrade here but the error seems to be similar. Could anybody shed 
 more light on the issue and how to solve it? Thanks a lot!

 Aakanksha

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph repository for Debian Jessie

2015-08-26 Thread Konstantinos
Hi,

I would like to know if there will be a new repository hosting Jessie
packages in the near future.

If I am not mistaken, the reason the existing packages cannot be used is that
there are a few (dependency) libraries in Jessie at newer versions and some
porting may be needed.

---

At the moment we are using Ceph/RADOS in quite a few deployments, one of
which is a production environment hosting VMs (using a
custom/in_house_developed block device layer - based on librados).
We are using Firefly and we would like to go to Hammer.

-- 
Kind Regards,
Konstantinos
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can't mount Cephfs

2015-08-26 Thread Andrzej Łukawski

Hi,

We have a ceph cluster (Ceph version 0.94.2) which consists of four nodes 
with four disks on each node. Ceph is configured to hold two replicas 
(size 2). We use this cluster for a ceph filesystem. A few days ago we had a 
power outage after which I had to replace three of our cluster OSD 
disks. All OSD disks are now online, but I'm unable to mount the filesystem 
and constantly receive 'mount error 5 = Input/output error'.  Ceph 
status shows many 'incomplete' pgs and that 'mds cluster is degraded'. 
According to 'ceph health detail' mds is replaying journal.


[root@cnode0 ceph]# ceph -s
cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
 health HEALTH_WARN
25 pgs backfill_toofull
10 pgs degraded
126 pgs down
263 pgs incomplete
54 pgs stale
10 pgs stuck degraded
263 pgs stuck inactive
54 pgs stuck stale
289 pgs stuck unclean
10 pgs stuck undersized
10 pgs undersized
4 requests are blocked > 32 sec
recovery 27139/10407227 objects degraded (0.261%)
recovery 168597/10407227 objects misplaced (1.620%)
4 near full osd(s)
too many PGs per OSD (312 > max 300)
mds cluster is degraded
 monmap e6: 6 mons at 
{0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}

election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
 mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
 osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
  pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
32825 GB used, 11698 GB / 44524 GB avail
27139/10407227 objects degraded (0.261%)
168597/10407227 objects misplaced (1.620%)
2153 active+clean
 137 incomplete
 126 down+incomplete
  54 stale+active+clean
  15 active+remapped+backfill_toofull
  10 active+undersized+degraded+remapped+backfill_toofull
   1 active+remapped
[root@cnode0 ceph]#

I wasn't able to find any solution on the Internet and I worry I will 
make things even worse if I continue to troubleshoot this on my own. I'm 
stuck. Could you please help?


Thanks.
Andrzej







___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Why are RGW pools all prefixed with a period (.)?

2015-08-26 Thread Wido den Hollander
Hi,

It's something which has been 'bugging' me for some time now. Why are
RGW pools prefixed with a period?

I tried setting the root pool to 'rgw.root', but RGW (0.94.1) refuses to
start:

ERROR: region root pool name must start with a period

I'm sending pool statistics to Graphite and when sending a key like this
you 'break' Graphite: ceph.pools.stats.pool_name.kb_read

A pool like .rgw.root will break this since Graphite splits on periods.

So is there any reason why this is? What's the reasoning behind it?
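
(The obvious workaround is to mangle the name before it goes out -- something
like the line below -- but that feels like treating the symptom rather than
the cause:)

echo ".rgw.root" | sed 's/^\.//; s/\./_/g'    # -> rgw_root
# giving ceph.pools.stats.rgw_root.kb_read instead of ceph.pools.stats..rgw.root.kb_read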

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-26 Thread Voloshanenko Igor
Great!
Yes, the behaviour is exactly as I described. So it looks like that's the root cause )

Thank you, Sam. Ilya!

2015-08-21 21:08 GMT+03:00 Samuel Just sj...@redhat.com:

 I think I found the bug -- need to whiteout the snapset (or decache
 it) upon evict.

 http://tracker.ceph.com/issues/12748
 -Sam

 On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov idryo...@gmail.com wrote:
  On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just sj...@redhat.com wrote:
  Odd, did you happen to capture osd logs?
 
  No, but the reproducer is trivial to cut & paste.
 
  Thanks,
 
  Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't mount Cephfs

2015-08-26 Thread Jan Schermer
If you lost 3 disks with size 2 and at least 2 of those disks were in different 
hosts, that means you lost data with the default CRUSH.
There's nothing you can do but either get those disks back in or recover from 
backup.

Jan

 On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl wrote:
 
 Hi,
 
 We have ceph cluster (Ceph version 0.94.2) which consists of four nodes with 
 four disks on each node. Ceph is configured to hold two replicas (size 2). We 
 use this cluster for ceph filesystem. Few days ago we had power outage after 
 which I had to replace three of our cluster OSD disks. All OSD disks are now 
 online, but I'm unable to mount filesystem and constantly receive 'mount 
 error 5 = Input/output error'.  Ceph status shows many 'incomplete' pgs and 
 that 'mds cluster is degraded'. According to 'ceph health detail' mds is 
 replaying journal. 
 
 [root@cnode0 ceph]# ceph -s
 cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
  health HEALTH_WARN
 25 pgs backfill_toofull
 10 pgs degraded
 126 pgs down
 263 pgs incomplete
 54 pgs stale
 10 pgs stuck degraded
 263 pgs stuck inactive
 54 pgs stuck stale
 289 pgs stuck unclean
 10 pgs stuck undersized
 10 pgs undersized
 4 requests are blocked > 32 sec
 recovery 27139/10407227 objects degraded (0.261%)
 recovery 168597/10407227 objects misplaced (1.620%)
 4 near full osd(s)
 too many PGs per OSD (312 > max 300)
 mds cluster is degraded
  monmap e6: 6 mons at 
 {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
 election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
  mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
  osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
   pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
 32825 GB used, 11698 GB / 44524 GB avail
 27139/10407227 objects degraded (0.261%)
 168597/10407227 objects misplaced (1.620%)
 2153 active+clean
  137 incomplete
  126 down+incomplete
   54 stale+active+clean
   15 active+remapped+backfill_toofull
   10 active+undersized+degraded+remapped+backfill_toofull
1 active+remapped
 [root@cnode0 ceph]#
 
 I wasn't able to find any solution in the Internet and I worry I will make 
 things even worse when continue to troubleshoot this on my own. I'm stuck. 
 Could you please help?
 
 Thanks.
 Andrzej
 
 
 
 
 
 
   
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unexpected AIO Error

2015-08-26 Thread Pontus Lindgren
Hello, 

I am experiencing an issue where OSD services fail due to an unexpected aio 
error. This has happened on two different OSD servers, killing two different OSD 
daemons.

I am running Ceph Hammer on Debian Wheezy with a backported 
Kernel(3.16.0-0.bpo.4-amd64). 

Below is the log from one of the crashes.

I am wondering if anyone else has experienced this issue and might be able to 
point out some troubleshooting steps? So far all I've found are similar issues 
on the ceph bug tracker. I have posted my case there as well.

 2015-08-16 08:11:54.227567 7f13d68de700  0 log_channel(cluster) log [WRN] : 3 
slow requests, 3 included below; oldest blocked for > 30.685081 secs
 2015-08-16 08:11:54.227579 7f13d68de700  0 log_channel(cluster) log [WRN] : 
slow request 30.685081 seconds old, received at 2015-08-16 08:11:23.542417: 
osd_op(client.1109461.0:219374023 rbd_data.10e67e79e2a9e3.0001c201 
[stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2592768~4096] 
5.89587894 ack+ondisk+write e1804) currently waiting for subops from 1,30
 2015-08-16 08:11:54.227587 7f13d68de700  0 log_channel(cluster) log [WRN] : 
slow request 30.682262 seconds old, received at 2015-08-16 08:11:23.545236: 
osd_repop(client.1109461.0:219374083 5.c63 
d6b85c63/rbd_data.10e67e79e2a9e3.0001a800/head//5 v 1804'121436) 
currently started
 2015-08-16 08:11:54.227592 7f13d68de700  0 log_channel(cluster) log [WRN] : 
slow request 30.641702 seconds old, received at 2015-08-16 08:11:23.585797: 
osd_repop(client.1935041.0:1302764 5.82a 
4219482a/rbd_data.1d685c2eb141f2.3c5f/head//5 v 1804'265055) 
currently started
 2015-08-16 08:11:55.227784 7f13d68de700  0 log_channel(cluster) log [WRN] : 4 
slow requests, 1 included below; oldest blocked for > 31.685317 secs
 2015-08-16 08:11:55.227808 7f13d68de700  0 log_channel(cluster) log [WRN] : 
slow request 30.788521 seconds old, received at 2015-08-16 08:11:24.439213: 
osd_repop(client.1224667.0:34531998 5.abe 
2f457abe/rbd_data.12aacc79e2a9e3.1d9d/head//5 v 1804'27936) 
currently started
 2015-08-16 08:11:56.075649 7f13d3d89700 -1 journal aio to 7994220544~8192 
wrote 18446744073709551611
 2015-08-16 08:11:56.091460 7f13d3d89700 -1 os/FileJournal.cc: In function 
'void FileJournal::write_finish_thread_entry()' thread 7f13d3d89700 time 
2015-08-16 08:11:56.076462
 os/FileJournal.cc: 1426: FAILED assert(0 == "unexpected aio error")
 
 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) 
[0xcdb572]
 2: (FileJournal::write_finish_thread_entry()+0x847) [0xb9a437]
 3: (FileJournal::WriteFinisher::entry()+0xd) [0xa3befd]
 4: (()+0x6b50) [0x7f13de90ab50]
 5: (clone()+0x6d) [0x7f13dd32695d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.


 
 
 

Pontus Lindgren
System Engineer
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW - multiple dns names

2015-08-26 Thread Luis Periquito
On Mon, Feb 23, 2015 at 10:18 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com
wrote:



 --

 *From: *Shinji Nakamoto shinji.nakam...@mgo.com
 *To: *ceph-us...@ceph.com
 *Sent: *Friday, February 20, 2015 3:58:39 PM
 *Subject: *[ceph-users] RadosGW - multiple dns names

 We have multiple interfaces on our Rados gateway node, each of which is
 assigned to one of our many VLANs with a unique IP address.

 Is it possible to set multiple DNS names for a single Rados GW, so it can
 handle the request to each of the VLAN specific IP address DNS names?


 Not yet, however, the upcoming hammer release will support that (hostnames
 will be configured as part of the region).


I tested this using Hammer (0.94.2) and it doesn't seem to work. I'm just
adding multiple rgw dns name lines to the configuration. Did it make
Hammer, or am I doing it the wrong way? I couldn't find any docs either
way...



 Yehuda


 eg.
 rgw dns name = prd-apiceph001
 rgw dns name = prd-backendceph001
 etc.




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't mount Cephfs

2015-08-26 Thread Andrzej Łukawski
Thank you for the answer. I lost 2 disks on the 1st node and 1 disk on the 2nd. I 
understand it is not possible to recover the data even partially? 
Unfortunately those disks are lost forever.


Andrzej

W dniu 2015-08-26 o 12:26, Jan Schermer pisze:
If you lost 3 disks with size 2 and at least 2 of those disks were in 
different hosts, that means you lost data with the default CRUSH.
There's nothing you can do but either get those disks back in or 
recover from backup.


Jan

On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl 
mailto:alukaw...@interia.pl wrote:


Hi,

We have ceph cluster (Ceph version 0.94.2) which consists of four 
nodes with four disks on each node. Ceph is configured to hold two 
replicas (size 2). We use this cluster for ceph filesystem. Few days 
ago we had power outage after which I had to replace three of our 
cluster OSD disks. All OSD disks are now online, but I'm unable to 
mount filesystem and constantly receive 'mount error 5 = Input/output 
error'.  Ceph status shows many 'incomplete' pgs and that 'mds 
cluster is degraded'. According to 'ceph health detail' mds is 
replaying journal.


[root@cnode0 ceph]# ceph -s
cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
 health HEALTH_WARN
25 pgs backfill_toofull
10 pgs degraded
126 pgs down
263 pgs incomplete
54 pgs stale
10 pgs stuck degraded
263 pgs stuck inactive
54 pgs stuck stale
289 pgs stuck unclean
10 pgs stuck undersized
10 pgs undersized
4 requests are blocked > 32 sec
recovery 27139/10407227 objects degraded (0.261%)
recovery 168597/10407227 objects misplaced (1.620%)
4 near full osd(s)
too many PGs per OSD (312 > max 300)
mds cluster is degraded
 monmap e6: 6 mons at 
{0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}

election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
 mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
 osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
  pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
32825 GB used, 11698 GB / 44524 GB avail
27139/10407227 objects degraded (0.261%)
168597/10407227 objects misplaced (1.620%)
2153 active+clean
 137 incomplete
 126 down+incomplete
  54 stale+active+clean
  15 active+remapped+backfill_toofull
  10 active+undersized+degraded+remapped+backfill_toofull
   1 active+remapped
[root@cnode0 ceph]#

I wasn't able to find any solution in the Internet and I worry I will 
make things even worse when continue to troubleshoot this on my own. 
I'm stuck. Could you please help?


Thanks.
Andrzej







___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data into a newer ceph instance

2015-08-26 Thread Chang, Fangzhe (Fangzhe)
Thanks, Luis.

The motivation for using the newer version is to keep up-to-date with Ceph 
development, since we suspect the old versioned radosgw could not be restarted 
possibly due to library mismatch.
Do you know whether the self-healing feature of ceph is applicable between 
different versions or not?

Fangzhe

From: Luis Periquito [mailto:periqu...@gmail.com]
Sent: Wednesday, August 26, 2015 10:11 AM
To: Chang, Fangzhe (Fangzhe)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Migrating data into a newer ceph instance

I would say the easiest way would be to leverage all the self-healing of ceph: 
add the new nodes to the old cluster, allow or force all the data to migrate 
between nodes, and then remove the old ones out.

Well to be fair you could probably just install radosgw on another node and use 
it as your gateway without the need to even create a new OSD node...

Or was there a reason to create a new cluster? I can tell you that one of the 
clusters I have has been around since bobtail, and now it's hammer...

On Wed, Aug 26, 2015 at 2:50 PM, Chang, Fangzhe (Fangzhe) 
fangzhe.ch...@alcatel-lucent.commailto:fangzhe.ch...@alcatel-lucent.com 
wrote:
Hi,

We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite some 
amount of data in it. We are only using ceph as an object store via radosgw. 
Last week the ceph-radosgw daemon suddenly refused to start (with logs only 
showing “initialization timeout” error on Centos 7).  This triggers me to 
install a newer instance --- Ceph/Radosgw version 0.94.2 (Hammer). The new 
instance has a different set of key rings by default. The next step is to have 
all the data migrated. Does anyone know how to get the existing data out of the 
old ceph  cluster (Giant) and into the new instance (Hammer)? Please note that 
in the old three-node cluster ceph osd is still running but radosgw is not. Any 
suggestion will be greatly appreciated.
Thanks.

Regards,

Fangzhe Chang




___
ceph-users mailing list
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph monitoring with graphite

2015-08-26 Thread Dan van der Ster
Hi Wido,

On Wed, Aug 26, 2015 at 10:36 AM, Wido den Hollander w...@42on.com wrote:
 I'm sending pool statistics to Graphite

We're doing the same -- stripping invalid chars as needed -- and I
would guess that lots of people have written similar json2graphite
converter scripts for Ceph monitoring in recent months.

It makes me wonder if it might be useful if Ceph had a --format mode
to output df/stats/perf commands directly in graphite compatible text.
Shouldn't be too difficult to write.
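
For reference, the glue we're talking about is only a handful of lines anyway.
A rough sketch (untested, assumes jq and a carbon plaintext listener; the
host/port are made up and the field names are as of hammer, so adjust):

ts=$(date +%s)
ceph df --format json |
  jq -r '.pools[] | "\(.name) \(.stats.bytes_used) \(.stats.objects)"' |
  while read -r pool bytes objects; do
      pool=$(echo "$pool" | sed 's/^\.//; s/\./_/g')   # Graphite splits on dots
      echo "ceph.pools.stats.$pool.bytes_used $bytes $ts"
      echo "ceph.pools.stats.$pool.objects $objects $ts"
  done | nc graphite.example.com 2003    # nc flags vary between netcat flavours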

Cheers, Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't mount Cephfs

2015-08-26 Thread Jan Schermer
Most of the data is still here, but you won't be able to just mount it if 
it's inconsistent.

I don't use CephFS so someone else could tell you if it's able to repair the 
filesystem with some parts missing.

You lost the part of the data whose copies were only on the one lost disk in one node 
and on either of the lost disks on the other node, since no other copy exists. How 
much data you lost I don't know exactly, but since you only have 16 OSDs I'm 
afraid it will be in the order of ~3%? How many files are intact is 
a different question - it could be that every file is missing 3% of its contents, 
which would make the loss total.

Guys? I have no idea how files map to pgs and objects in CephFS...

Jan


 On 26 Aug 2015, at 14:44, Andrzej Łukawski alukaw...@interia.pl wrote:
 
  Thank you for the answer. I lost 2 disks on the 1st node and 1 disk on the 2nd. I 
  understand it is not possible to recover the data even partially? 
  Unfortunately those disks are lost forever.
 
 Andrzej
 
 W dniu 2015-08-26 o 12:26, Jan Schermer pisze:
 If you lost 3 disks with size 2 and at least 2 of those disks were in 
  different hosts, that means you lost data with the default CRUSH.
 There's nothing you can do but either get those disks back in or recover 
 from backup.
 
 Jan
 
 On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl 
 mailto:alukaw...@interia.pl wrote:
 
 Hi,
 
 We have ceph cluster (Ceph version 0.94.2) which consists of four nodes 
 with four disks on each node. Ceph is configured to hold two replicas (size 
 2). We use this cluster for ceph filesystem. Few days ago we had power 
 outage after which I had to replace three of our cluster OSD disks. All OSD 
 disks are now online, but I'm unable to mount filesystem and constantly 
 receive 'mount error 5 = Input/output error'.  Ceph status shows many 
 'incomplete' pgs and that 'mds cluster is degraded'. According to 'ceph 
 health detail' mds is replaying journal. 
 
 [root@cnode0 ceph]# ceph -s
 cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
  health HEALTH_WARN
 25 pgs backfill_toofull
 10 pgs degraded
 126 pgs down
 263 pgs incomplete
 54 pgs stale
 10 pgs stuck degraded
 263 pgs stuck inactive
 54 pgs stuck stale
 289 pgs stuck unclean
 10 pgs stuck undersized
 10 pgs undersized
 4 requests are blocked > 32 sec
 recovery 27139/10407227 objects degraded (0.261%)
 recovery 168597/10407227 objects misplaced (1.620%)
 4 near full osd(s)
 too many PGs per OSD (312 > max 300)
 mds cluster is degraded
  monmap e6: 6 mons at 
 {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
 election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
  mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
  osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
   pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
 32825 GB used, 11698 GB / 44524 GB avail
 27139/10407227 objects degraded (0.261%)
 168597/10407227 objects misplaced (1.620%)
 2153 active+clean
  137 incomplete
  126 down+incomplete
   54 stale+active+clean
   15 active+remapped+backfill_toofull
   10 active+undersized+degraded+remapped+backfill_toofull
1 active+remapped
 [root@cnode0 ceph]#
 
 I wasn't able to find any solution in the Internet and I worry I will make 
 things even worse when continue to troubleshoot this on my own. I'm stuck. 
 Could you please help?
 
 Thanks.
 Andrzej
 
 
 
 
 
 
   
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Migrating data into a newer ceph instance

2015-08-26 Thread Chang, Fangzhe (Fangzhe)
Hi,

We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite some 
amount of data in it. We are only using ceph as an object store via radosgw. 
Last week the ceph-radosgw daemon suddenly refused to start (with logs only 
showing "initialization timeout" error on Centos 7).  This triggers me to 
install a newer instance --- Ceph/Radosgw version 0.94.2 (Hammer). The new 
instance has a different set of key rings by default. The next step is to have 
all the data migrated. Does anyone know how to get the existing data out of the 
old ceph  cluster (Giant) and into the new instance (Hammer)? Please note that 
in the old three-node cluster ceph osd is still running but radosgw is not. Any 
suggestion will be greatly appreciated.
Thanks.

Regards,

Fangzhe Chang



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-08-26 Thread 10 minus
Hi ,

We got a good deal on the 843T and we are using them in our Openstack setup... as
journals.
They have been running for the last six months... No issues.
When we compared them with Intel SSDs (I think it was the 3700), they were a shade
slower for our workload and considerably cheaper.
We did not run any synthetic benchmark since we had a specific use case.
The performance was better than our old setup so it was good enough.

hth



On Tue, Aug 25, 2015 at 12:07 PM, Andrija Panic andrija.pa...@gmail.com
wrote:

 We have some 850 pro 256gb ssds if anyone interested to buy:)

 And also there was new 850 pro firmware that broke peoples disk which was
 revoked later etc... I'm sticking with only vacuum cleaners from Samsung
 for now, maybe... :)
 On Aug 25, 2015 12:02 PM, Voloshanenko Igor igor.voloshane...@gmail.com
 wrote:

  To be honest, the Samsung 850 PRO is not a 24/7 series... it's something like a
  desktop+ series, but anyway - results from these drives are very, very bad in
  any scenario acceptable in real life...

  Possibly the 845 PRO is better, but we don't want to experiment anymore...
  So we chose the S3500 240G. Yes, it's cheaper than the S3700 (about 2x), and
  not so durable for writes, but we think it's better to replace 1 SSD per
  year than to pay double the price now.

 2015-08-25 12:59 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

  And should I mention that in another CEPH installation we had samsung
  850 pro 128GB and all 6 SSDs died within a 2 month period - they simply disappeared
  from the system, so not wear-out...

 Never again we buy Samsung :)
 On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 First read please:

 http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

  We are getting 200 IOPS in comparison to the Intel S3500's 18,000 IOPS - those
  are constant performance numbers, meaning avoiding the drive's cache and
  running for a longer period of time...
  Also, if checking with FIO you will get better latencies on the Intel S3500
  (the model tested in our case) along with 20x better IOPS results...

  We observed the original issue as high speed at the beginning of e.g. a
  file transfer inside a VM, which then halts to zero... We moved journals back
  to HDDs and performance was acceptable... now we are upgrading to the Intel
  S3500...

 Best
 any details on that ?

 On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
 andrija.pa...@gmail.com wrote:

  Make sure you test whatever you decide. We just learned this the
 hard way
  with samsung 850 pro, which is total crap, more than you could
 imagine...
 
  Andrija
  On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:
 
   I would recommend Samsung 845 DC PRO (not EVO, not just PRO).
   Very cheap, better than Intel 3610 for sure (and I think it beats
 even
   3700).
  
   Jan
  
On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de
 
   wrote:
   
     On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
Hi,
   
 most of the time I get the recommendation from resellers to
 go with
the intel s3700 for the journalling.
   
Check out the Intel s3610. 3 drive writes per day for 5 years.
 Plus, it
is cheaper than S3700.
   
Regards,
   
--ck
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  



 --
 Mariusz Gronczewski, Administrator

 Efigence S. A.
 ul. Wołoska 9a, 02-583 Warszawa
 T: [+48] 22 380 13 13
 F: [+48] 22 380 13 14
 E: mariusz.gronczew...@efigence.com
 mailto:mariusz.gronczew...@efigence.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data into a newer ceph instance

2015-08-26 Thread Luis Periquito
I would say the easiest way would be to leverage all the self-healing of
ceph: add the new nodes to the old cluster, allow or force all the data to
migrate between nodes, and then remove the old ones out.

Well to be fair you could probably just install radosgw on another node and
use it as your gateway without the need to even create a new OSD node...

Or was there a reason to create a new cluster? I can tell you that one of
the clusters I have has been around since bobtail, and now it's hammer...
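
For the "remove the old ones out" part, the usual drain-and-remove sequence per
old OSD looks something like this (a sketch only, N being the OSD id; wait for
recovery to finish before the destructive steps):

ceph osd crush reweight osd.N 0     # drain: data flows off this OSD
ceph -s                             # wait until backfill/recovery is done
ceph osd out N
stop ceph-osd id=N                  # or 'service ceph stop osd.N', depending on the init system
ceph osd crush remove osd.N
ceph auth del osd.N
ceph osd rm N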

On Wed, Aug 26, 2015 at 2:50 PM, Chang, Fangzhe (Fangzhe) 
fangzhe.ch...@alcatel-lucent.com wrote:

 Hi,



 We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite
 some amount of data in it. We are only using ceph as an object store via
  radosgw. Last week the ceph-radosgw daemon suddenly refused to start (with
 logs only showing “initialization timeout” error on Centos 7).  This
 triggers me to install a newer instance --- Ceph/Radosgw version 0.94.2
 (Hammer). The new instance has a different set of key rings by default. The
 next step is to have all the data migrated. Does anyone know how to get the
 existing data out of the old ceph  cluster (Giant) and into the new
 instance (Hammer)? Please note that in the old three-node cluster ceph osd
 is still running but radosgw is not. Any suggestion will be greatly
 appreciated.

 Thanks.



 Regards,



 Fangzhe Chang







 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph monitoring with graphite

2015-08-26 Thread Quentin Hartman
That would certainly be something we would use.

QH

On Wed, Aug 26, 2015 at 8:33 AM, Dan van der Ster d...@vanderster.com
wrote:

 Hi Wido,

 On Wed, Aug 26, 2015 at 10:36 AM, Wido den Hollander w...@42on.com
 wrote:
  I'm sending pool statistics to Graphite

 We're doing the same -- stripping invalid chars as needed -- and I
 would guess that lots of people have written similar json2graphite
  converter scripts for Ceph monitoring in recent months.

 It makes me wonder if it might be useful if Ceph had a --format mode
 to output df/stats/perf commands directly in graphite compatible text.
 Shouldn't be too difficult to write.

 Cheers, Dan
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data into a newer ceph instance

2015-08-26 Thread Luis Periquito
I tend to not do too much each time: either upgrade or migrate data. The
actual upgrade process is seamless... So you can just as easily upgrade the
current cluster to hammer, and add/remove nodes on the fly. All of this
is quite seamless and straightforward (other than the data migration
itself).

On Wed, Aug 26, 2015 at 3:17 PM, Chang, Fangzhe (Fangzhe) 
fangzhe.ch...@alcatel-lucent.com wrote:

 Thanks, Luis.



 The motivation for using the newer version is to keep up-to-date with Ceph
 development, since we suspect the old versioned radosgw could not be
 restarted possibly due to library mismatch.

 Do you know whether the self-healing feature of ceph is applicable between
 different versions or not?



 Fangzhe



 *From:* Luis Periquito [mailto:periqu...@gmail.com]
 *Sent:* Wednesday, August 26, 2015 10:11 AM
 *To:* Chang, Fangzhe (Fangzhe)
 *Cc:* ceph-users@lists.ceph.com
 *Subject:* Re: [ceph-users] Migrating data into a newer ceph instance



 I Would say the easiest way would be to leverage all the self-healing of
 ceph: add the new nodes to the old cluster, allow or force all the data to
 migrate between nodes, and then remove the old ones out.



 Well to be fair you could probably just install radosgw on another node
 and use it as your gateway without the need to even create a new OSD node...



 Or was there a reason to create a new cluster? I can tell you that one of
 the clusters I have has been around since bobtail, and now it's hammer...



 On Wed, Aug 26, 2015 at 2:50 PM, Chang, Fangzhe (Fangzhe) 
 fangzhe.ch...@alcatel-lucent.com wrote:

 Hi,



 We have been running Ceph/Radosgw version 0.80.7 (Giant) and stored quite
 some amount of data in it. We are only using ceph as an object store via
  radosgw. Last week the ceph-radosgw daemon suddenly refused to start (with
 logs only showing “initialization timeout” error on Centos 7).  This
 triggers me to install a newer instance --- Ceph/Radosgw version 0.94.2
 (Hammer). The new instance has a different set of key rings by default. The
 next step is to have all the data migrated. Does anyone know how to get the
 existing data out of the old ceph  cluster (Giant) and into the new
 instance (Hammer)? Please note that in the old three-node cluster ceph osd
 is still running but radosgw is not. Any suggestion will be greatly
 appreciated.

 Thanks.



 Regards,



 Fangzhe Chang








 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexpected AIO Error

2015-08-26 Thread Shinobu
Did you update:

http://tracker.ceph.com/issues/12100

Just a question.

 Shinobu

On Wed, Aug 26, 2015 at 8:09 PM, Pontus Lindgren pon...@oderland.se wrote:

 Hello,

 I am experiencing an issue where OSD Services fail due to an unexpected
  aio error. This has happened on two different OSD servers, killing two
  different OSD daemons.

 I am running Ceph Hammer on Debian Wheezy with a backported
 Kernel(3.16.0-0.bpo.4-amd64).

 Below is the log from one of the crashes.

 I am wondering if anyone else has experienced this issue and might be able
  to point out some troubleshooting steps? So far all I've found are similar
  issues on the ceph bug tracker. I have posted my case there as well.

  2015-08-16 08:11:54.227567 7f13d68de700  0 log_channel(cluster) log [WRN]
  : 3 slow requests, 3 included below; oldest blocked for > 30.685081 secs
  2015-08-16 08:11:54.227579 7f13d68de700  0 log_channel(cluster) log [WRN]
 : slow request 30.685081 seconds old, received at 2015-08-16
 08:11:23.542417: osd_op(client.1109461.0:219374023
 rbd_data.10e67e79e2a9e3.0001c201 [stat,set-alloc-hint object_size
 4194304 write_size 4194304,write 2592768~4096] 5.89587894 ack+ondisk+write
 e1804) currently waiting for subops from 1,30
  2015-08-16 08:11:54.227587 7f13d68de700  0 log_channel(cluster) log [WRN]
 : slow request 30.682262 seconds old, received at 2015-08-16
 08:11:23.545236: osd_repop(client.1109461.0:219374083 5.c63
 d6b85c63/rbd_data.10e67e79e2a9e3.0001a800/head//5 v 1804'121436)
 currently started
  2015-08-16 08:11:54.227592 7f13d68de700  0 log_channel(cluster) log [WRN]
 : slow request 30.641702 seconds old, received at 2015-08-16
 08:11:23.585797: osd_repop(client.1935041.0:1302764 5.82a
 4219482a/rbd_data.1d685c2eb141f2.3c5f/head//5 v 1804'265055)
 currently started
  2015-08-16 08:11:55.227784 7f13d68de700  0 log_channel(cluster) log [WRN]
  : 4 slow requests, 1 included below; oldest blocked for > 31.685317 secs
  2015-08-16 08:11:55.227808 7f13d68de700  0 log_channel(cluster) log [WRN]
 : slow request 30.788521 seconds old, received at 2015-08-16
 08:11:24.439213: osd_repop(client.1224667.0:34531998 5.abe
 2f457abe/rbd_data.12aacc79e2a9e3.1d9d/head//5 v 1804'27936)
 currently started
  2015-08-16 08:11:56.075649 7f13d3d89700 -1 journal aio to 7994220544~8192
 wrote 18446744073709551611
  2015-08-16 08:11:56.091460 7f13d3d89700 -1 os/FileJournal.cc: In function
 'void FileJournal::write_finish_thread_entry()' thread 7f13d3d89700 time
 2015-08-16 08:11:56.076462
  os/FileJournal.cc: 1426: FAILED assert(0 == "unexpected aio error")

  ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
 const*)+0x72) [0xcdb572]
  2: (FileJournal::write_finish_thread_entry()+0x847) [0xb9a437]
  3: (FileJournal::WriteFinisher::entry()+0xd) [0xa3befd]
  4: (()+0x6b50) [0x7f13de90ab50]
  5: (clone()+0x6d) [0x7f13dd32695d]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
 to interpret this.






 Pontus Lindgren
 System Engineer


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Email:
 shin...@linux.com
 ski...@redhat.com

 Life w/ Linux http://i-shinobu.hatenablog.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why are RGW pools all prefixed with a period (.)?

2015-08-26 Thread Gregory Farnum
On Wed, Aug 26, 2015 at 9:36 AM, Wido den Hollander w...@42on.com wrote:
 Hi,

 It's something which has been 'bugging' me for some time now. Why are
 RGW pools prefixed with a period?

 I tried setting the root pool to 'rgw.root', but RGW (0.94.1) refuses to
 start:

 ERROR: region root pool name must start with a period

 I'm sending pool statistics to Graphite and when sending a key like this
 you 'break' Graphite: ceph.pools.stats.pool_name.kb_read

 A pool like .rgw.root will break this since Graphite splits on periods.

 So is there any reason why this is? What's the reasoning behind it?

This might just be a leftover from when we were mapping buckets into
RADOS pools. Yehuda, is there some more current reason?
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't mount Cephfs

2015-08-26 Thread Gregory Farnum
There is a cephfs-journal-tool that I believe is present in hammer and
ought to let you get your MDS through replay. Depending on which PGs
were lost you will have holes and/or missing files, in addition to not
being able to find parts of the directory hierarchy (and maybe getting
crashes if you access them). You can explore the options there and if
the documentation is sparse, feel free to ask questions...
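
Very roughly, the sequence people have used looks like the below -- back the
journal up first and treat the reset as a last resort (check the tool's help
for the exact forms; this is from memory):

cephfs-journal-tool journal inspect                        # look for damage
cephfs-journal-tool journal export /root/mds-journal.bin   # backup before touching anything
cephfs-journal-tool event recover_dentries summary         # salvage what it can into the metadata pool
cephfs-journal-tool journal reset                          # last resort: discard the journal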
-Greg

On Wed, Aug 26, 2015 at 1:44 PM, Andrzej Łukawski alukaw...@interia.pl wrote:
 Thank you for the answer. I lost 2 disks on the 1st node and 1 disk on the 2nd. I
 understand it is not possible to recover the data even partially?
 Unfortunately those disks are lost forever.

 Andrzej

 W dniu 2015-08-26 o 12:26, Jan Schermer pisze:

 If you lost 3 disks with size 2 and at least 2 of those disks were in
 different hosts, that means you lost data with the default CRUSH.
 There's nothing you can do but either get those disks back in or recover
 from backup.

 Jan

 On 26 Aug 2015, at 12:18, Andrzej Łukawski alukaw...@interia.pl wrote:

 Hi,

 We have ceph cluster (Ceph version 0.94.2) which consists of four nodes with
 four disks on each node. Ceph is configured to hold two replicas (size 2).
 We use this cluster for ceph filesystem. Few days ago we had power outage
 after which I had to replace three of our cluster OSD disks. All OSD disks
 are now online, but I'm unable to mount filesystem and constantly receive
 'mount error 5 = Input/output error'.  Ceph status shows many 'incomplete'
 pgs and that 'mds cluster is degraded'. According to 'ceph health detail'
 mds is replaying journal.

 [root@cnode0 ceph]# ceph -s
 cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
  health HEALTH_WARN
 25 pgs backfill_toofull
 10 pgs degraded
 126 pgs down
 263 pgs incomplete
 54 pgs stale
 10 pgs stuck degraded
 263 pgs stuck inactive
 54 pgs stuck stale
 289 pgs stuck unclean
 10 pgs stuck undersized
 10 pgs undersized
 4 requests are blocked > 32 sec
 recovery 27139/10407227 objects degraded (0.261%)
 recovery 168597/10407227 objects misplaced (1.620%)
 4 near full osd(s)
 too many PGs per OSD (312 > max 300)
 mds cluster is degraded
  monmap e6: 6 mons at
 {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
 election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
  mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
  osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
   pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
 32825 GB used, 11698 GB / 44524 GB avail
 27139/10407227 objects degraded (0.261%)
 168597/10407227 objects misplaced (1.620%)
 2153 active+clean
  137 incomplete
  126 down+incomplete
   54 stale+active+clean
   15 active+remapped+backfill_toofull
   10 active+undersized+degraded+remapped+backfill_toofull
1 active+remapped
 [root@cnode0 ceph]#

 I wasn't able to find any solution in the Internet and I worry I will make
 things even worse when continue to troubleshoot this on my own. I'm stuck.
 Could you please help?

 Thanks.
 Andrzej







 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [ANN] ceph-deploy 1.5.28 released

2015-08-26 Thread Travis Rhoden
Hi everyone,

A new version of ceph-deploy has been released. Version 1.5.28
includes the following:

 - A fix for a regression introduced in 1.5.27 that prevented
importing GPG keys on CentOS 6 only.
 - Will prevent Ceph daemon deployment on nodes that don't have Ceph
installed on them.
 - Makes it possible to go from 1 monitor daemon to 2 without a 5
minute hang/delay.
 - More systemd enablement work.


Full changelog is at [1].

Updated packages have been uploaded to
{rpm,debian}-{firefly,hammer,testing} repos on ceph.com, and to PyPI.

Cheers,

 - Travis

[1] http://ceph.com/ceph-deploy/docs/changelog.html#id2
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rados: Undefined symbol error

2015-08-26 Thread Brad Hubbard
- Original Message -
 From: Aakanksha Pudipeddi-SSI aakanksha...@ssi.samsung.com
 To: Jason Dillaman dilla...@redhat.com
 Cc: ceph-us...@ceph.com
 Sent: Thursday, 27 August, 2015 6:22:45 AM
 Subject: Re: [ceph-users] Rados: Undefined symbol error
 
 Hello Jason,
 
 I checked the version of my built packages and they are all 9.0.2. I purged
 the cluster and uninstalled the packages and there seems to be nothing else
 - no older version. Could you elaborate on the fix for this issue?

Some thoughts...

# c++filt  _ZN5MutexC1ERKSsbbbP11CephContext
Mutex::Mutex(std::basic_string<char, std::char_traits<char>, 
std::allocator<char> > const&, bool, bool, bool, CephContext*)

Thats from common/Mutex.cc

# nm --dynamic `which rados` 2>&1 | grep Mutex
00504da0 T _ZN5Mutex4LockEb
00504f70 T _ZN5Mutex6UnlockEv
00504a50 T _ZN5MutexC1EPKcbbbP11CephContext
00504a50 T _ZN5MutexC2EPKcbbbP11CephContext
00504d10 T _ZN5MutexD1Ev
00504d10 T _ZN5MutexD2Ev

This shows my version is defined in the text section of the binary itself. What
do you get when you run the above command?

Like Jason says this is some sort of mis-match between your rados binary and
your installed libs.
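
It might also be worth checking which librados the binary actually resolves at
runtime, and whether that file exports the symbol (the library path below is
whatever ldd prints for you):

# ldd `which rados` | grep librados
# nm -D /path/printed/by/ldd/librados.so.2 | c++filt | grep 'Mutex::Mutex'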

HTH,
Brad

 
 Thanks,
 Aakanksha
 
 -Original Message-
 From: Jason Dillaman [mailto:dilla...@redhat.com]
 Sent: Friday, August 21, 2015 6:37 AM
 To: Aakanksha Pudipeddi-SSI
 Cc: ceph-us...@ceph.com
 Subject: Re: [ceph-users] Rados: Undefined symbol error
 
 It sounds like you have rados CLI tool from an earlier Ceph release (<
 Hammer) installed and it is attempting to use the librados shared library
 from a newer (>= Hammer) version of Ceph.
 
 Jason
 
 
 - Original Message -
 
  From: Aakanksha Pudipeddi-SSI aakanksha...@ssi.samsung.com
  To: ceph-us...@ceph.com
  Sent: Thursday, August 20, 2015 11:47:26 PM
  Subject: [ceph-users] Rados: Undefined symbol error
 
  Hello,
 
  I cloned the master branch of Ceph and after setting up the cluster,
  when I tried to use the rados commands, I got this error:
 
  rados: symbol lookup error: rados: undefined symbol:
  _ZN5MutexC1ERKSsbbbP11CephContext
 
  I saw a similar post here: http://tracker.ceph.com/issues/12563 but I
  am not clear on the solution for this problem. I am not performing an
  upgrade here but the error seems to be similar. Could anybody shed
  more light on the issue and how to solve it? Thanks a lot!
 
  Aakanksha
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW - multiple dns names

2015-08-26 Thread Robin H. Johnson
On Wed, Aug 26, 2015 at 11:52:02AM +0100, Luis Periquito wrote:
 On Mon, Feb 23, 2015 at 10:18 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com
 wrote:
 
 
 
  --
 
  *From: *Shinji Nakamoto shinji.nakam...@mgo.com
  *To: *ceph-us...@ceph.com
  *Sent: *Friday, February 20, 2015 3:58:39 PM
  *Subject: *[ceph-users] RadosGW - multiple dns names
 
  We have multiple interfaces on our Rados gateway node, each of which is
  assigned to one of our many VLANs with a unique IP address.
 
  Is it possible to set multiple DNS names for a single Rados GW, so it can
  handle the request to each of the VLAN specific IP address DNS names?
 
  Not yet, however, the upcoming hammer release will support that (hostnames
  will be configured as part of the region).
 
 
 I tested this using Hammer (0.94.2) and it doesn't seem to work. I'm just
 adding multiple rgw dns name lines to the configuration. Did it make
 Hammer, or am I doing it the wrong way? I couldn't find any docs either
 way...
http://ceph.com/docs/master/radosgw/config-ref/#get-a-region

Look at the hostname entry, which has a description of:
hostnames: A list of all the hostnames in the region. For example, you may use
multiple domain names to refer to the same region. Optional. The rgw dns name
setting will automatically be included in this list. You should restart the
radosgw daemon(s) after changing this setting.

Then you can stop using 'rgw dns name'.
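
Rough sequence (the hostnames and file name below are just examples, and I'm
going from memory on the radosgw-admin spelling, so check it against the docs):

radosgw-admin region get > region.json
# edit region.json so it carries e.g.
#   "hostnames": ["prd-apiceph001.example.com", "prd-backendceph001.example.com"],
radosgw-admin region set < region.json
radosgw-admin regionmap update
# ...then restart the radosgw daemon(s), as above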

What the functionality does NOT do is allow you to require that a specific hostname 
arrives
on a specific interface. All hostnames are valid for all interfaces/IPs. If you
want to restrict it, I'd suggest doing the validation in haproxy, in front of
civetweb.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench object not correct errors on v9.0.3

2015-08-26 Thread Deneau, Tom

 -Original Message-
 From: Dałek, Piotr [mailto:piotr.da...@ts.fujitsu.com]
 Sent: Wednesday, August 26, 2015 2:02 AM
 To: Sage Weil; Deneau, Tom
 Cc: ceph-de...@vger.kernel.org; ceph-us...@ceph.com
 Subject: RE: rados bench object not correct errors on v9.0.3
 
  -Original Message-
  From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
  ow...@vger.kernel.org] On Behalf Of Sage Weil
  Sent: Tuesday, August 25, 2015 7:43 PM
 
   I have built rpms from the tarball http://ceph.com/download/ceph-
  9.0.3.tar.bz2.
   Have done this for fedora 21 x86_64 and for aarch64.  On both
   platforms when I run a single node cluster with a few osds and run
   rados bench read tests (either seq or rand) I get occasional reports
   like
  
   benchmark_data_myhost_20729_object73 is not correct!
  
   I never saw these with similar rpm builds on these platforms from
   9.0.2
  sources.
  
   Also, if I go to an x86-64 system running Ubuntu trusty for which I
   am able to install prebuilt binary packages via
   ceph-deploy install --dev v9.0.3
  
   I do not see the errors there.
 
  Hrm.. haven't seen it on this end, but we're running/testing master
  and not
  9.0.2 specifically.  If you can reproduce this on master, that'd be very
 helpful!
 
  There have been some recent changes to rados bench... Piotr, does this
  seem like it might be caused by your changes?
 
 Yes. My PR #4690 (https://github.com/ceph/ceph/pull/4690) caused rados bench
 to be fast enough to sometimes run into a race condition between librados's AIO
 and objbencher processing. That was fixed in PR #5152
 (https://github.com/ceph/ceph/pull/5152), which didn't make it into 9.0.3.
 Tom, you can confirm this by inspecting the contents of the objects in question
 (their contents should be perfectly fine and in line with other objects).
 In the meantime you can either apply the patch from PR #5152 on your own or use
 --no-verify.
 
 With best regards / Pozdrawiam
 Piotr Dałek

Piotr --

Thank you.  Yes, when I looked at the contents of the objects they always
looked correct.  And yes a single object would sometimes report an error
and sometimes not.  So a race condition makes sense.

A couple of questions:

   * Why would I not see this behavior using the pre-built 9.0.3 binaries
 that get installed using ceph-deploy install --dev v9.0.3?  I would 
assume
 this is built from the same sources as the 9.0.3 tarball.

   * So I assume one should not compare pre 9.0.3 rados bench numbers with 
9.0.3 and after?
 The pull request https://github.com/ceph/ceph/pull/4690 did not mention the
 effect on final bandwidth numbers, did you notice what that effect was?

-- Tom

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Question regarding degraded PGs

2015-08-26 Thread Goncalo Borges

Hey guys...

1./ I have a simple question regarding the appearance of degraded PGs. 
First, for reference:


   a. I am working with 0.94.2

   b. I have 32 OSDs distributed in 4 servers, meaning that I have 8
   OSDs per server.

   c. Our cluster is set with 'osd pool default size = 3' and 'osd pool
   default min size = 2'


2./ I am testing the cluster in several disaster/catastrophe scenarios, 
and I've deliberately powered down a storage server, with its 8 OSDs. At 
this point, everything went fine: during the night, the cluster 
performed all the recovery I/O, and in the morning, I got a 'HEALTH_OK' 
cluster running on only 3 servers and 24 OSDs.


3./ I've now powered up the missing server, and as expected, the cluster 
enters 'HEALTH_WARN' and adjusts itself to the presence of one more 
server and 8 more populated OSDs.


4./ However, what I do not understand is why, during this process, 
some PGs are reported as degraded. Check the 'ceph -s' output 
below. As far as I understand, degraded PGs mean that ceph has not 
yet replicated some objects in the placement group the correct number of 
times. That should not be the case here because, since we started from a 
'HEALTH_OK' situation, all PGs were coherent. What happens under the 
covers when this new server (and its 8 populated OSDs) rejoins the 
cluster that triggers the appearance of degraded PGs?


# ceph -s
cluster eea8578f-b3ac-4dfb-a0c5-da40509f5cdc
 health HEALTH_WARN
115 pgs backfill
121 pgs backfilling
513 pgs degraded
31 pgs recovering
309 pgs recovery_wait
513 pgs stuck degraded
576 pgs stuck unclean
recovery 198838/8567132 objects degraded (2.321%)
recovery 3267325/8567132 objects misplaced (38.138%)
 monmap e1: 3 mons at 
{mon1=X.X.X.X:6789/0,mon2=X.X.X.X.34:6789/0,mon3=X.X.X.X:6789/0}

election epoch 24, quorum 0,1,2 mon1,mon3,mon2
 mdsmap e162: 1/1/1 up {0=rccephmds=up:active}, 1 up:standby-replay
 osdmap e4764: 32 osds: 32 up, 32 in; 555 remapped pgs
  pgmap v1159567: 2176 pgs, 2 pools, 6515 GB data, 2240 kobjects
22819 GB used, 66232 GB / 89051 GB avail
198838/8567132 objects degraded (2.321%)
3267325/8567132 objects misplaced (38.138%)
1600 active+clean
 292 active+recovery_wait+degraded+remapped
 113 active+degraded+remapped+backfilling
  60 active+degraded+remapped+wait_backfill
  55 active+remapped+wait_backfill
  27 active+recovering+degraded+remapped
  17 active+recovery_wait+degraded
   8 active+remapped+backfilling
   4 active+recovering+degraded
recovery io 521 MB/s, 170 objects/s

Cheers
Goncalo

--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Day Raleigh Cancelled

2015-08-26 Thread Matt Taylor
On that note in regards to watching /cephdays for details, the RSS feed 
404's!


http://ceph.com/cephdays/feed/

Regards,
Matt.

On 27/08/2015 02:52, Patrick McGarry wrote:

Yeah, we're still working on nailing down a venue for the Melbourne
event (but it looks like 05 Nov is probably the date). As soon as we
have a venue confirmed we'll put out a call for speakers and post the
details on the /cephdays/ page. Thanks!


On Tue, Aug 25, 2015 at 7:47 PM, Goncalo Borges
gonc...@physics.usyd.edu.au wrote:

Hey Patrick...

I am interested in the Melbourne one. Under
http://ceph.com/cephdays/
I do not see any reference to it.

Can you give me more details on that?

TIA
Goncalo

On 08/26/2015 07:23 AM, Patrick McGarry wrote:


Due to low registration this event is being pushed back to next year.
The Ceph Day events for Shanghai, Tokyo, and Melbourne should all
still be proceeding as planned, however. Feel free to contact me if
you have any questions about Ceph Days. Thanks.




--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] docker distribution

2015-08-26 Thread Lorieri
looks like it only works if an nginx is in front of radosgw; it
translates absolute URIs and maybe fixes other issues:

https://github.com/docker/distribution/pull/808#issuecomment-135286314
https://github.com/docker/distribution/pull/902

On Mon, Aug 17, 2015 at 1:37 PM, Lorieri lori...@gmail.com wrote:
 Hi,

 Docker changed the old docker-registry project to docker-distribution
 and its API to v2.
 It now uses librados instead of radosgw to save data.

 In some ceph installations it is easier to get access to radosgw than
 to the cluster, so I've made a pull request to add radosgw support; it
 would be great if you could test it.
 https://hub.docker.com/r/lorieri/docker-distribution-generic-s3/

 Note: if you already use the old docker-registry you must create
 another bucket and push the images again, the API changed to v2.

 There is a shellscript to help https://github.com/docker/migrator

 How I tested it:

 docker run -d -p 5000:5000 -e REGISTRY_STORAGE=s3 \
 -e REGISTRY_STORAGE_S3_REGION=generic \
 -e REGISTRY_STORAGE_S3_REGIONENDPOINT=http://myradosgw.mydomain.com \
 -e REGISTRY_STORAGE_S3_BUCKET=registry \
 -e REGISTRY_STORAGE_S3_ACCESSKEY=XXX \
 -e REGISTRY_STORAGE_S3_SECRETKEY=XXX \
 -e REGISTRY_STORAGE_S3_SECURE=false \
 -e REGISTRY_STORAGE_S3_ENCRYPT=false \
 -e REGISTRY_STORAGE_S3_REGIONSUPPORTSHEAD=false \
 lorieri/docker-distribution-generic-s3


 thanks,
 -lorieri
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com