Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-03-27 Thread Saverio Proto
 I will start now to push a lot of data into the cluster to see if the
 metadata grows a lot or stays constant.

 Is there a way to clean up old metadata?

I pushed a lot more data to the cluster. Then I let the cluster
sleep for the night.

This morning I found these values:

6841 MB data
25814 MB used

which is a bit more than 1 to 3.

It looks like the extra space is in these folders (for N from 1 to 36):

/var/lib/ceph/osd/ceph-N/current/meta/

These meta folders have a lot of data in them. I would really be happy
to have pointers to understand what is in there and how to clean that
up eventually.
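
A quick way to see what is actually taking the space (default paths assumed):

du -sh /var/lib/ceph/osd/ceph-*/current/meta/
ls -lhS /var/lib/ceph/osd/ceph-0/current/meta/ | head   # biggest items in one of them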

The problem is that googling for ceph meta or ceph metadata will
produce results for the Ceph MDS, which is completely unrelated :(

thanks

Saverio
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer release data and a Design question

2015-03-27 Thread Udo Lembke
Hi,

Am 26.03.2015 11:18, schrieb 10 minus:
 Hi ,
 
 I'm just starting on a small Ceph implementation and wanted to know the
 release date for Hammer.
 Will it coincide with the release of OpenStack?
 
 My Conf:  (using 10G and Jumboframes on Centos 7 / RHEL7 )
 
 3x Mons (VMs) :
 CPU - 2
 Memory - 4G
 Storage - 20 GB
 
 4x OSDs :
 CPU - Haswell Xeon
 Memory - 8 GB
 Sata - 3x 2TB (3 OSD per node)
 SSD - 2x 480 GB ( Journaling and if possible tiering)
 
 
 This is a test environment to see how all the components play. If all goes
 well,
 then we plan to increase the OSDs to 24 per node, the RAM to 32 GB, and move to
 dual-socket Haswell Xeons.
32 GB for 24 OSDs is much too little!! I have 32 GB for 12 OSDs - that's OK, but
64 GB would be better.
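(Rough arithmetic: 32 GB / 24 OSDs is only ~1.3 GB per OSD, while 32 GB / 12 OSDs
and 64 GB / 24 OSDs both give ~2.7 GB per OSD.)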
CPU depends on your model (cores, dual socket?).

Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: ceph-deploy : Certificate Error using wget on Debian

2015-03-27 Thread Vasilis Souleles

Hello,

I'm trying to create a 4-node Ceph Storage Cluster using ceph-deploy,
following the official guide:
http://docs.ceph.com/docs/master/start/quick-ceph-deploy/

I'm using debian wheezy 7 (x86_64) on all nodes and on each node,
`uname -a` produces: Linux nodeX 3.2.0-4-amd64 #1 SMP Debian
3.2.65-1+deb7u2 x86_64 GNU/Linux.

I'm having trouble getting ceph-deploy to run. The ceph.log reads:
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
install node0 node1 node2 node3
[ceph_deploy.install][DEBUG ] Installing stable version giant on
cluster ceph hosts node0 node1 node2 node3
[ceph_deploy.install][DEBUG ] Detecting platform for host node0 ...
[node0][DEBUG ] connection detected need for sudo
[node0][DEBUG ] connected to host: node0
[node0][DEBUG ] detect platform information from remote host
[node0][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: debian 7.8 wheezy
[node0][INFO  ] installing ceph on node0
[node0][INFO  ] Running command: sudo env
DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes
ca-certificates
[node0][DEBUG ] Reading package lists...
[node0][DEBUG ] Building dependency tree...
[node0][DEBUG ] Reading state information...
[node0][DEBUG ] ca-certificates is already the newest version.
[node0][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 0 not
upgraded.
[node0][INFO  ] Running command: sudo wget -O release.asc
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[node0][WARNING] --2015-03-27 13:24:50--
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[node0][WARNING] Resolving ceph.com (ceph.com)... 208.113.241.137,
2607:f298:4:147::b05:fe2a
[node0][WARNING] Connecting to ceph.com
(ceph.com)|208.113.241.137|:443... connected.
[node0][WARNING] ERROR: The certificate of `ceph.com' is not trusted.
[node0][WARNING] ERROR: The certificate of `ceph.com' hasn't got a
known issuer.
[node0][WARNING] command returned non-zero exit status: 5
[node0][INFO  ] Running command: sudo apt-key add release.asc
[node0][WARNING] gpg: no valid OpenPGP data found.
[node0][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-key
add release.asc

So the problem lies with 'wget'. According to this thread (
http://www.linuxquestions.org/questions/debian-26/wget-certificate-error-4175495817/
), there is a difference between the Ubuntu and the Debian versions of
'wget'; they seem to be compiled and linked against different libraries.

So on Debian the problem occurs, while on Ubuntu it does not. (I did
try to run `wget -O release.asc
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc` on
Ubuntu 14.04 and it finished successfully, while it fails on Debian.)

So, my question is, what is the proper way of dealing with this error?

As a workaround, I managed to get it to work (again, according to the
same thread linked before) with the following steps (a consolidated sketch
follows the list):
1) visit 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
using a browser, and export the certificate as ceph.pem
2) on every node run `# apt-get install openssl ca-certificates`
3) copy (scp) ceph.pem to /usr/share/ca-certificates/ceph.pem on every
node
4) on every node run `# echo ceph.pem >> /etc/ca-certificates.pem`
5) on every node run `# update-ca-certificates`
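
Strung together as a rough shell sketch (run from the admin node; node names are
placeholders, and it assumes ssh/sudo access to each node):

for node in node0 node1 node2 node3; do
    scp ceph.pem "$node":/tmp/ceph.pem
    ssh -t "$node" 'sudo apt-get install -y openssl ca-certificates &&
        sudo cp /tmp/ceph.pem /usr/share/ca-certificates/ceph.pem &&
        echo ceph.pem | sudo tee -a /etc/ca-certificates.pem > /dev/null &&
        sudo update-ca-certificates'
done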

After these, I was able to run `wget -O release.asc
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc` on
every node, successfully.

Additionally, ceph-deploy seems to work fine now.

Thank you,
Vasilis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-27 Thread Mikaël Cluseau

Hi,

On 03/18/2015 03:01 PM, Gregory Farnum wrote:

I think it tended to crash rather than
hang like this so I'm a bit surprised, but if this op is touching a
broken file or something that could explain it.


FWIW, the last time I had the issue (on a 3.10.9 kernel), btrfs was
freezing, waiting forever and even starting threads that raised the host's
load to numbers like 400. I can't say much more for now as I had to move
to XFS. I will set up a lab someday to try btrfs again, as I heard it has
been more stable since 3.16.1, but I think I won't enable compression.
BTW, it's really a filesystem I would like to use in production someday
(as many do, I think ;)).

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating objects from one pool to another?

2015-03-27 Thread Karan Singh

Hi Jean

You would probably need this

ceph osd pool create glance-images-bkp 128 128
rados cppool glance-images glance-images-bkp
ceph osd pool rename glance-images glance-images-old
ceph osd pool rename glance-images-bkp glance-images
ceph osd pool delete glance-images-old glance-images-old 
--yes-i-really-really-mean-it  ( once you are sure data is moved 100% )
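
One way to be sure the data is moved 100% before the delete (a quick sketch:
compare object counts/bytes between the two pools right after the cppool):

rados df | egrep 'glance-images|glance-images-bkp'
rados -p glance-images ls | wc -l
rados -p glance-images-bkp ls | wc -l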

I would suggest stopping the OpenStack services that are using the original pool,
then copying the data, renaming the pools, and finally starting the OpenStack
services again and checking that everything is there.

I have done this once with success.



Karan Singh 
Systems Specialist , Storage Platforms
CSC - IT Center for Science,
Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
mobile: +358 503 812758
tel. +358 9 4572001
fax +358 9 4572302
http://www.csc.fi/


 On 27 Mar 2015, at 00:01, Gregory Farnum g...@gregs42.com wrote:
 
 On Thu, Mar 26, 2015 at 2:53 PM, Steffen W Sørensen ste...@me.com wrote:
 
 On 26/03/2015, at 21.07, J-P Methot jpmet...@gtcomm.net wrote:
 
 That's a great idea. I know I can setup cinder (the openstack volume 
 manager) as a multi-backend manager and migrate from one backend to the 
 other, each backend linking to different pools of the same ceph cluster. 
 What bugs me though is that I'm pretty sure the image store, glance, 
 wouldn't let me do that. Additionally, since the compute component also has 
 its own ceph pool, I'm pretty sure it won't let me migrate the data through 
 openstack.
 Hm wouldn’t it be possible to do something similar ala:
 
 # list objects from the source pool
 rados -p pool-with-too-many-pgs ls | while read obj; do
   # export $obj to a local file
   rados -p pool-with-too-many-pgs get "$obj" "$obj"
   # import $obj from the local file into the new pool
   rados -p better-sized-pool put "$obj" "$obj"
 done
 
 You would also have issues with snapshots if you do this on an RBD
 pool. That's unfortunately not feasible.
 -Greg
 
 
 
 Possibly split/partition the list of objects into multiple concurrent loops,
 possibly from multiple boxes, as seems fit for the resources at hand: CPU,
 memory, network, ceph perf.
 
 /Steffen
 
 
 
 
 On 3/26/2015 3:54 PM, Steffen W Sørensen wrote:
 On 26/03/2015, at 20.38, J-P Methot jpmet...@gtcomm.net wrote:
 
 Lately I've been going back to work on one of my first ceph setups and now
 I see that I have created way too many placement groups for the pools on
 that setup (about 10 000 too many). I believe this may impact
 performance negatively, as the performance on this ceph cluster is
 abysmal. Since it is not possible to reduce the number of PGs in a pool,
 I was thinking of creating new pools with a smaller number of PGs, moving
 the data from the old pools to the new pools and then deleting the old
 pools.
 
 I haven't seen any command to copy objects from one pool to another.
 Would that be possible? I'm using ceph for block storage with OpenStack,
 so surely there must be a way to move block devices from one pool to
 another, right?
 What I did at one point was go one layer higher in my storage
 abstraction: I created new Ceph pools and used those as new storage
 resources/pools in my VM env. (Proxmox) on top of Ceph RBD, and then did a
 live migration of the virtual disks there; I assume you could do the same in
 OpenStack.
 
 My 0.02$
 
 /Steffen
 
 
 --
 ==
 Jean-Philippe Méthot
 Administrateur système / System administrator
 GloboTech Communications
 Phone: 1-514-907-0050
 Toll Free: 1-(888)-GTCOMM1
 Fax: 1-(514)-907-0750
 jpmet...@gtcomm.net
 http://www.gtcomm.net
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: ceph-deploy : Certificate Error using wget on Debian

2015-03-27 Thread Brian Rak
It looks like ceph.com is having some major issues with their git 
repository right now.. https://ceph.com/git/ gives a 500 error


On 3/27/2015 8:11 AM, Vasilis Souleles wrote:

I'm trying to create a 4-node Ceph Storage Cluster using ceph-deploy,
following the official guide:
http://docs.ceph.com/docs/master/start/quick-ceph-deploy/

[snip]


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Snapshots and fstrim with cache tiers ?

2015-03-27 Thread Frédéric Nass


Hello,

The snapshot with a cache tier part was answered by Greg Farnum 
(https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18329.html).


What about fstrim with a cache tier? It doesn't seem to work.

Also, is there a background task that recovers freed blocks?

Best regards,

Frédéric.


On 25/03/2015 11:14, Frédéric Nass wrote:


Hello,


I have a few questions regarding snapshots and fstrim with cache tiers.


In the cache tier and erasure coding FAQ related to ICE 1.2 (based
on Firefly), Inktank says "Snapshots are not supported in conjunction
with cache tiers."


What are the risks of using snapshots with cache tiers? Would this
"better not use it" recommendation still be true with Giant or Hammer?



Regarding the fstrim command, it doesn't seem to work with cache
tiers. The freed-up blocks don't come back to the ceph cluster.
Can someone confirm this? Is there something we can do to get those
freed-up blocks back in the cluster?



Also, can we run an fstrim task from the cluster side? That is,
without having to map and mount each rbd image or rely on the client
to operate this task?



Best regards,


--

Frédéric Nass

Sous-direction Infrastructures
Direction du Numérique
Université de Lorraine

email : frederic.n...@univ-lorraine.fr
Tél : +33 3 83 68 53 83


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Frédéric Nass

Sous direction des Infrastructures,
Direction du Numérique,
Université de Lorraine.

Tél : 03.83.68.53.83

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph -s slow return result

2015-03-27 Thread Kobi Laredo
What's the current health of the cluster?
It may help to compact the monitors' LevelDB store if they have grown in
size
http://www.sebastien-han.fr/blog/2014/10/27/ceph-mon-store-taking-up-a-lot-of-space/
Depending on the size of the mon's store, it may take some time to
compact; make sure to do only one at a time.
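
Roughly like this, one monitor at a time ('mon.a' is just an example id):

du -sh /var/lib/ceph/mon/      # store size before
ceph tell mon.a compact
du -sh /var/lib/ceph/mon/      # and after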

Kobi Laredo
Cloud Systems Engineer | (408) 409-KOBI

On Fri, Mar 27, 2015 at 10:31 AM, Chu Duc Minh chu.ducm...@gmail.com
wrote:

 All my monitors running.
 But i deleting pool .rgw.buckets, now having 13 million objects (just test
 data).
 The reason that i must delete this pool is my cluster become unstable, and
 sometimes an OSD down, PG peering, incomplete,...
 Therefore i must delete this pool to re-stablize my cluster.  (radosgw is
 too slow for delete objects when one of my bucket reachs few million
 objects).

 Regards,


 On Sat, Mar 28, 2015 at 12:23 AM, Gregory Farnum g...@gregs42.com wrote:

 Are all your monitors running? Usually a temporary hang means that the
 Ceph client tries to reach a monitor that isn't up, then times out and
 contacts a different one.

 I have also seen it just be slow if the monitors are processing so many
 updates that they're behind, but that's usually on a very unhappy cluster.
 -Greg
 On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com
 wrote:

 On my CEPH cluster, ceph -s return result quite slow.
 Sometimes it return result immediately, sometimes i hang few seconds
 before return result.

 Do you think this problem (ceph -s slow return) only relate to
 ceph-mon(s) process? or maybe it relate to ceph-osd(s) too?
 (i deleting a big bucket, .rgw.buckets, and ceph-osd(s) disk util quite
 high)

 Regards,
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] monitor 0.87.1 crashes

2015-03-27 Thread samuel
Hi all,

In a fully functional ceph installation, today we suffered a problem with the ceph
monitors, which started crashing with the following error:
include/interval_set.h: 340: FAILED assert(0)

Is there any related bug?

Thanks a lot in advance,
Samuel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Slow writes with 1MB files

2015-03-27 Thread Mark Nelson
Specifically related to BTRFS, if you have random IO to existing objects 
it will cause terrible fragmentation due to COW.  BTRFS is often faster 
than XFS initially but after it starts fragmenting can become much 
slower for sequential reads.  You may want to try XFS again and see if 
you can improve the read performance (increasing read ahead both on the 
cephfs client and on the underlying OSD block devices to something like 
4MB might help).
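
For example (device name, mount point and the exact sizes are only placeholders):

# OSD block device readahead, in KB (4096 KB = 4 MB)
echo 4096 > /sys/block/sdb/queue/read_ahead_kb
# kernel CephFS client: mount with a larger readahead, in bytes
# (plus whatever monitor address and auth options you normally use)
mount -t ceph mon1:6789:/ /mnt/cephfs -o rasize=4194304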


Mark

On 03/27/2015 11:47 AM, Barclay Jameson wrote:

Opps I should have said that I am not just writing the data but copying it :

time cp Small1/* Small2/*

Thanks,

BJ

On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
almightybe...@gmail.com wrote:

I did a Ceph cluster install 2 weeks ago where I was getting great
performance (~= PanFS) where I could write 100,000 1MB files in 61
Mins (Took PanFS 59 Mins). I thought I could increase the performance
by adding a better MDS server so I redid the entire build.

Now it takes 4 times as long to write the same data as it did before.
The only thing that changed was the MDS server. (I even tried moving
the MDS back on the old slower node and the performance was the same.)

The first install was on CentOS 7. I tried going down to CentOS 6.6
and it's the same results.
I use the same scripts to install the OSDs (which I created because I
can never get ceph-deploy to behave correctly. Although, I did use
ceph-deploy to create the MDS and MON and initial cluster creation.)

I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read
with rados bench -p cephfs_data 500 write --no-cleanup  rados bench
-p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read)

Could anybody think of a reason as to why I am now getting a huge regression.

Hardware Setup:
[OSDs]
64 GB 2133 MHz
Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
40Gb Mellanox NIC

[MDS/MON new]
128 GB 2133 MHz
Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
40Gb Mellanox NIC

[MDS/MON old]
32 GB 800 MHz
Dual Proc E5472  @ 3.00GHz (8 Cores)
10Gb Intel NIC

--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph -s slow return result

2015-03-27 Thread Gregory Farnum
Are all your monitors running? Usually a temporary hang means that the Ceph
client tries to reach a monitor that isn't up, then times out and contacts
a different one.

I have also seen it just be slow if the monitors are processing so many
updates that they're behind, but that's usually on a very unhappy cluster.
-Greg
On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com wrote:

 On my CEPH cluster, ceph -s return result quite slow.
 Sometimes it return result immediately, sometimes i hang few seconds
 before return result.

 Do you think this problem (ceph -s slow return) only relate to ceph-mon(s)
 process? or maybe it relate to ceph-osd(s) too?
 (i deleting a big bucket, .rgw.buckets, and ceph-osd(s) disk util quite
 high)

 Regards,
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph -s slow return result

2015-03-27 Thread Chu Duc Minh
 All my monitors are running.
 But I am deleting the pool .rgw.buckets, which has 13 million objects (just test
 data).
 The reason that I must delete this pool is that my cluster became unstable, and
 sometimes an OSD goes down, PGs get stuck peering, incomplete, ...
 Therefore I must delete this pool to re-stabilize my cluster.  (radosgw is
 too slow at deleting objects when one of my buckets reaches a few million
 objects.)

Regards,


On Sat, Mar 28, 2015 at 12:23 AM, Gregory Farnum g...@gregs42.com wrote:

 Are all your monitors running? Usually a temporary hang means that the
 Ceph client tries to reach a monitor that isn't up, then times out and
 contacts a different one.

 I have also seen it just be slow if the monitors are processing so many
 updates that they're behind, but that's usually on a very unhappy cluster.
 -Greg
 On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com
 wrote:

 On my CEPH cluster, ceph -s return result quite slow.
 Sometimes it return result immediately, sometimes i hang few seconds
 before return result.

 Do you think this problem (ceph -s slow return) only relate to
 ceph-mon(s) process? or maybe it relate to ceph-osd(s) too?
 (i deleting a big bucket, .rgw.buckets, and ceph-osd(s) disk util quite
 high)

 Regards,
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ERROR: missing keyring, cannot use cephx for authentication

2015-03-27 Thread Jesus Chavez (jeschave)
the thing is that the devices are not mounting after reboot…

Any ideas?



Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433


Cisco.com <http://www.cisco.com/>










On Mar 23, 2015, at 3:37 PM, Thomas Foster 
thomas.foste...@gmail.commailto:thomas.foste...@gmail.com wrote:

check your server where that osd is located and see if you have created the 
directory correctly.  If you didn't create it correctly you would get that 
error message.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph -s slow return result

2015-03-27 Thread Chu Duc Minh
On my Ceph cluster, ceph -s returns its result quite slowly.
Sometimes it returns the result immediately, sometimes it hangs a few seconds
before returning the result.

Do you think this problem (ceph -s returning slowly) relates only to the ceph-mon(s)
process? Or maybe it relates to the ceph-osd(s) too?
(I am deleting a big bucket, .rgw.buckets, and the ceph-osd(s) disk util is quite
high.)

Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS Slow writes with 1MB files

2015-03-27 Thread Barclay Jameson
I did a Ceph cluster install 2 weeks ago where I was getting great
performance (~= PanFS) where I could write 100,000 1MB files in 61
Mins (Took PanFS 59 Mins). I thought I could increase the performance
by adding a better MDS server so I redid the entire build.

Now it takes 4 times as long to write the same data as it did before.
The only thing that changed was the MDS server. (I even tried moving
the MDS back on the old slower node and the performance was the same.)

The first install was on CentOS 7. I tried going down to CentOS 6.6
and it's the same results.
I use the same scripts to install the OSDs (which I created because I
can never get ceph-deploy to behave correctly. Although, I did use
ceph-deploy to create the MDS and MON and initial cluster creation.)

I use btrfs on the OSDs as I can get 734 MB/s write and 1100 MB/s read
with rados bench -p cephfs_data 500 write --no-cleanup && rados bench
-p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read).

Could anybody think of a reason why I am now getting such a huge regression?

Hardware Setup:
[OSDs]
64 GB 2133 MHz
Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
40Gb Mellanox NIC

[MDS/MON new]
128 GB 2133 MHz
Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
40Gb Mellanox NIC

[MDS/MON old]
32 GB 800 MHz
Dual Proc E5472  @ 3.00GHz (8 Cores)
10Gb Intel NIC
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Slow writes with 1MB files

2015-03-27 Thread Gregory Farnum
So this is exactly the same test you ran previously, but now it's on
faster hardware and the test is slower?

Do you have more data in the test cluster? One obvious possibility is
that previously you were working entirely in the MDS' cache, but now
you've got more dentries and so it's kicking data out to RADOS and
then reading it back in.

If you've got the memory (you appear to) you can pump up the mds
cache size config option quite dramatically from its default 10.
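
For what it's worth, a minimal ceph.conf sketch (the value below is only an
illustration; the option counts cached inodes/dentries, so size it to the RAM
on the MDS host):

[mds]
    # example value only -- raise/lower to match available memory
    mds cache size = 1000000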

Other things to check are that you've got an appropriately-sized
metadata pool, that you've not got clients competing against each
other inappropriately, etc.
-Greg

On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
almightybe...@gmail.com wrote:
 Opps I should have said that I am not just writing the data but copying it :

 time cp Small1/* Small2/*

 Thanks,

 BJ

 On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 I did a Ceph cluster install 2 weeks ago where I was getting great
 performance (~= PanFS) where I could write 100,000 1MB files in 61
 Mins (Took PanFS 59 Mins). I thought I could increase the performance
 by adding a better MDS server so I redid the entire build.

 Now it takes 4 times as long to write the same data as it did before.
 The only thing that changed was the MDS server. (I even tried moving
 the MDS back on the old slower node and the performance was the same.)

 The first install was on CentOS 7. I tried going down to CentOS 6.6
 and it's the same results.
 I use the same scripts to install the OSDs (which I created because I
 can never get ceph-deploy to behave correctly. Although, I did use
 ceph-deploy to create the MDS and MON and initial cluster creation.)

 I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read
 with rados bench -p cephfs_data 500 write --no-cleanup  rados bench
 -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read)

 Could anybody think of a reason as to why I am now getting a huge regression.

 Hardware Setup:
 [OSDs]
 64 GB 2133 MHz
 Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
 40Gb Mellanox NIC

 [MDS/MON new]
 128 GB 2133 MHz
 Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
 40Gb Mellanox NIC

 [MDS/MON old]
 32 GB 800 MHz
 Dual Proc E5472  @ 3.00GHz (8 Cores)
 10Gb Intel NIC
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Snapshots and fstrim with cache tiers ?

2015-03-27 Thread Gregory Farnum
On Wed, Mar 25, 2015 at 3:14 AM, Frédéric Nass
frederic.n...@univ-lorraine.fr wrote:
 Hello,


 I have a few questions regarding snapshots and fstrim with cache tiers.


 In the cache tier and erasure coding FAQ related to ICE 1.2 (based on
 Firefly), Inktank says "Snapshots are not supported in conjunction with
 cache tiers."

 What are the risks of using snapshots with cache tiers? Would this "better
 not use it" recommendation still be true with Giant or Hammer?


 Regarding the fstrim command, it doesn't seem to work with cache tiers. The
 freed up blocks don't get back in the ceph cluster.
 Can someone confirm this ? Is there something we can do to get those freed
 up blocks back in the cluster ?

It does work, but there are two effects you're missing here:
1) The object can be deleted in the cache tier, but it won't get
deleted from the backing pool until it gets flushed out of the cache
pool. Depending on your workload this can take a while.
2) On an erasure-coded pool, the OSD makes sure it can roll back a
certain number of operations per PG. In the case of deletions, this
means keeping the object data around for a while. This can also take a
while if you're not doing many operations. This has been discussed on
the list before; I think you'll want to look for a thread about
rollback and pg log size.
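
As an aside, if you want to see the space come back sooner you can flush/evict
the cache tier by hand; a sketch, assuming the cache pool is named 'hot-pool':

rados -p hot-pool cache-flush-evict-all
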
-Greg



 Also, can we run an fstrim task from the cluster side ? That is, without
 having to map and mount each rbd image or rely on the client to operate this
 task ?


 Best regards,


 --

 Frédéric Nass

 Sous-direction Infrastructures
 Direction du Numérique
 Université de Lorraine

 email : frederic.n...@univ-lorraine.fr
 Tél : +33 3 83 68 53 83

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor 0.87.1 crashes

2015-03-27 Thread Samuel Just

You'll want to at least include the backtrace.
-Sam

On 03/27/2015 10:55 AM, samuel wrote:

Hi all,

In a fully functional ceph installation today we suffer a problem with 
ceph monitors, that started crashing with following error:

include/interval_set.h: 340: FAILED assert(0)

Is there any related bug?

Thanks a lot in advance,
Samuel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor 0.87.1 crashes

2015-03-27 Thread samuel
Here it goes (in case further information is needed, just ask and I
would gladly offer it):

-5 2015-03-27 19:06:01.168361 7f94b4184700  5 mon.mon01@0(leader).osd
e37404 send_incremental [37403..37404] to client.1419434 10.10.200.3:0/280
8592243
-4 2015-03-27 19:06:01.168427 7f94b4184700  1 -- 10.10.200.20:6789/0
-- client.1419434 10.10.200.3:0/2808592243 -- osd_map(37403..37404 src has
 36883..37404) v3 -- ?+0 0x3c59d40
-3 2015-03-27 19:06:01.168451 7f94b4184700  1 -- 10.10.200.20:6789/0
-- 10.10.200.3:0/2808592243 -- mon_subscribe_ack(300s) v1 -- ?+0 0x3fa4d00
 con 0x3c7e460
-2 2015-03-27 19:06:01.168465 7f94b4184700  1 -- 10.10.200.20:6789/0
== client.1419434 10.10.200.3:0/2808592243 4  pool_op(delete unmanaged
 snap pool 6 auid 0 tid 2617 name  v0) v4  65+0+0 (423335705 0 0)
0x3c33600 con 0x3c7e460
-1 2015-03-27 19:06:01.168475 7f94b4184700  5
mon.mon01@0(leader).paxos(paxos
active c 16805455..16806016) is_readable = 1 - now=2015-03-27 19:0
6:01.168476 lease_expire=0.00 has v0 lc 16806016
 0 2015-03-27 19:06:01.170738 7f94b4184700 -1
./include/interval_set.h: In function 'void interval_set<T>::insert(T, T)
[with T = snapid_t]' thread 7f94b4184700 time 2015-03-27 19:06:01.168499
./include/interval_set.h: 340: FAILED assert(0)

 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7d7825]
 2: /usr/bin/ceph-mon() [0x88def5]
 3: (pg_pool_t::remove_unmanaged_snap(snapid_t)+0x43) [0x886e53]
 4: (OSDMonitor::prepare_pool_op(MPoolOp*)+0xac5) [0x628d65]
 5: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x23b) [0x63b4eb]
 6: (PaxosService::dispatch(PaxosServiceMessage*)+0xd0f) [0x5f923f]
 7: (Monitor::dispatch(MonSession*, Message*, bool)+0x2a3) [0x5c0cf3]
 8: (Monitor::_ms_dispatch(Message*)+0x1cd) [0x5c178d]
 9: (Monitor::ms_dispatch(Message*)+0x23) [0x5e2443]
 10: (DispatchQueue::entry()+0x62a) [0x9194da]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7bc0cd]
 12: (()+0x7df3) [0x7f94bcfb0df3]
 13: (clone()+0x6d) [0x7f94bba931ad]


On 27 March 2015 at 19:04, Samuel Just sj...@redhat.com wrote:

  You'll want to at least include the backtrace.
 -Sam


 On 03/27/2015 10:55 AM, samuel wrote:

  Hi all,

  In a fully functional ceph installation today we suffer a problem with
 ceph monitors, that started crashing with following error:
 include/interval_set.h: 340: FAILED assert(0)

  Is there any related bug?

  Thanks a lot in advance,
 Samuel



 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor 0.87.1 crashes

2015-03-27 Thread samuel
apologies for the noise. Host  10.10.200.3 had some issues that made
monitors to crash.

Thanks a lot for your help,
Samuel

On 27 March 2015 at 19:09, samuel sam...@gmail.com wrote:

 [snip]



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] adding a new pool causes old pool warning pool x has too few pgs

2015-03-27 Thread Chad William Seys
Weird: after a few hours, the health check comes back OK without changing the
number of PGs for any pools!

 Hi All,

 To a healthy cluster I recently added two pools to ceph, 1 replicated and
 1 ecpool. Then I made the replicated pool into a cache for the ecpool.

 Afterwards the ceph health check started complaining about a preexisting pool
 having too few pgs. Previous to adding the new pools there was no warning.

 Why does adding new pools cause an old pool to have too few pgs?

 Thanks!
 Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Slow writes with 1MB files

2015-03-27 Thread Barclay Jameson
Yes it's the exact same hardware except for the MDS server (although I
tried using the MDS on the old node).
I have not tried moving the MON back to the old node.

My default cache size is mds cache size = 1000
The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks.
I created 2048 for data and metadata:
ceph osd pool create cephfs_data 2048 2048
ceph osd pool create cephfs_metadata 2048 2048


To your point on clients competing against each other... how would I check that?

Thanks for the input!


On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote:
 So this is exactly the same test you ran previously, but now it's on
 faster hardware and the test is slower?

 Do you have more data in the test cluster? One obvious possibility is
 that previously you were working entirely in the MDS' cache, but now
 you've got more dentries and so it's kicking data out to RADOS and
 then reading it back in.

 If you've got the memory (you appear to) you can pump up the mds
 cache size config option quite dramatically from it's default 10.

 Other things to check are that you've got an appropriately-sized
 metadata pool, that you've not got clients competing against each
 other inappropriately, etc.
 -Greg

 On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 Opps I should have said that I am not just writing the data but copying it :

 time cp Small1/* Small2/*

 Thanks,

 BJ

 On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 I did a Ceph cluster install 2 weeks ago where I was getting great
 performance (~= PanFS) where I could write 100,000 1MB files in 61
 Mins (Took PanFS 59 Mins). I thought I could increase the performance
 by adding a better MDS server so I redid the entire build.

 Now it takes 4 times as long to write the same data as it did before.
 The only thing that changed was the MDS server. (I even tried moving
 the MDS back on the old slower node and the performance was the same.)

 The first install was on CentOS 7. I tried going down to CentOS 6.6
 and it's the same results.
 I use the same scripts to install the OSDs (which I created because I
 can never get ceph-deploy to behave correctly. Although, I did use
 ceph-deploy to create the MDS and MON and initial cluster creation.)

 I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read
 with rados bench -p cephfs_data 500 write --no-cleanup  rados bench
 -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read)

 Could anybody think of a reason as to why I am now getting a huge 
 regression.

 Hardware Setup:
 [OSDs]
 64 GB 2133 MHz
 Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
 40Gb Mellanox NIC

 [MDS/MON new]
 128 GB 2133 MHz
 Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
 40Gb Mellanox NIC

 [MDS/MON old]
 32 GB 800 MHz
 Dual Proc E5472  @ 3.00GHz (8 Cores)
 10Gb Intel NIC
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Slow writes with 1MB files

2015-03-27 Thread Gregory Farnum
On Fri, Mar 27, 2015 at 2:46 PM, Barclay Jameson
almightybe...@gmail.com wrote:
 Yes it's the exact same hardware except for the MDS server (although I
 tried using the MDS on the old node).
 I have not tried moving the MON back to the old node.

 My default cache size is mds cache size = 1000
 The OSDs (3 of them) have 16 Disks with 4 SSD Journal Disks.
 I created 2048 for data and metadata:
 ceph osd pool create cephfs_data 2048 2048
 ceph osd pool create cephfs_metadata 2048 2048


 To your point on clients competing against each other... how would I check 
 that?

Do you have multiple clients mounted? Are they both accessing files in
the directory(ies) you're testing? Were they accessing the same
pattern of files for the old cluster?

If you happen to be running a hammer rc or something pretty new you
can use the MDS admin socket to explore a bit what client sessions
there are and what they have permissions on and check; otherwise
you'll have to figure it out from the client side.
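
For the admin socket route, a sketch of what that looks like on the MDS host
('mds.a' is just an example daemon name, and it assumes the 'session ls' admin
socket command is available in your build):

ceph daemon mds.a session ls
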
-Greg


 Thanks for the input!


 On Fri, Mar 27, 2015 at 3:04 PM, Gregory Farnum g...@gregs42.com wrote:
 So this is exactly the same test you ran previously, but now it's on
 faster hardware and the test is slower?

 Do you have more data in the test cluster? One obvious possibility is
 that previously you were working entirely in the MDS' cache, but now
 you've got more dentries and so it's kicking data out to RADOS and
 then reading it back in.

 If you've got the memory (you appear to) you can pump up the mds
 cache size config option quite dramatically from it's default 10.

 Other things to check are that you've got an appropriately-sized
 metadata pool, that you've not got clients competing against each
 other inappropriately, etc.
 -Greg

 On Fri, Mar 27, 2015 at 9:47 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 Opps I should have said that I am not just writing the data but copying it :

 time cp Small1/* Small2/*

 Thanks,

 BJ

 On Fri, Mar 27, 2015 at 11:40 AM, Barclay Jameson
 almightybe...@gmail.com wrote:
 I did a Ceph cluster install 2 weeks ago where I was getting great
 performance (~= PanFS) where I could write 100,000 1MB files in 61
 Mins (Took PanFS 59 Mins). I thought I could increase the performance
 by adding a better MDS server so I redid the entire build.

 Now it takes 4 times as long to write the same data as it did before.
 The only thing that changed was the MDS server. (I even tried moving
 the MDS back on the old slower node and the performance was the same.)

 The first install was on CentOS 7. I tried going down to CentOS 6.6
 and it's the same results.
 I use the same scripts to install the OSDs (which I created because I
 can never get ceph-deploy to behave correctly. Although, I did use
 ceph-deploy to create the MDS and MON and initial cluster creation.)

 I use btrfs on the OSDS as I can get 734 MB/s write and 1100 MB/s read
 with rados bench -p cephfs_data 500 write --no-cleanup  rados bench
 -p cephfs_data 500 seq (xfs was 734 MB/s write but only 200 MB/s read)

 Could anybody think of a reason as to why I am now getting a huge 
 regression.

 Hardware Setup:
 [OSDs]
 64 GB 2133 MHz
 Dual Proc E5-2630 v3 @ 2.40GHz (16 Cores)
 40Gb Mellanox NIC

 [MDS/MON new]
 128 GB 2133 MHz
 Dual Proc E5-2650 v3 @ 2.30GHz (20 Cores)
 40Gb Mellanox NIC

 [MDS/MON old]
 32 GB 800 MHz
 Dual Proc E5472  @ 3.00GHz (8 Cores)
 10Gb Intel NIC
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.93 fresh cluster won't create PGs

2015-03-27 Thread Sage Weil
On Fri, 27 Mar 2015, Robert LeBlanc wrote:
 I've built Ceph clusters a few times now and I'm completely baffled
 about what we are seeing. We had a majority of the nodes on a new
 cluster go down yesterday and we got PGs stuck peering. We checked
 logs, firewalls, file descriptors, etc and nothing is pointing to what
 the problem is. We thought we could work around the problem by
 deleting all the pools and recreating them, but still most of the PGs
 were in a creating+peering state. Rebooting OSDs, reformatting them,
 adjusting the CRUSH, etc all proved fruitless. I took min_size and
 size to 1, tried scrubbing, deep-scrubbing the PGs and OSDs. Nothing
 seems to get the cluster to progress.
 
 As a last ditch effort, we wiped the whole cluster, regenerated UUID,
 keys, etc and pushed it all through puppet again. After creating the
 OSDs there are PGs stuck. Here is some info:
 
 [ulhglive-root@mon1 ~]# ceph status
 cluster fa158fa8-3e5d-47b1-a7bc-98a41f510ac0
  health HEALTH_WARN
 1214 pgs peering
 1216 pgs stuck inactive
 1216 pgs stuck unclean
  monmap e2: 3 mons at
 {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0}
 election epoch 6, quorum 0,1,2 mon1,mon2,mon3
  osdmap e161: 130 osds: 130 up, 130 in
   pgmap v468: 2048 pgs, 2 pools, 0 bytes data, 0 objects
 5514 MB used, 472 TB / 472 TB avail
  965 peering
  832 active+clean
  249 creating+peering
2 activating

Usually when we've seen something like this it has been something annoying
with the environment, like a broken network that causes the tcp streams to
freeze once they start sending significant traffic (e.g., affecting the
connections that transport data but not the ones that handle heartbeats).
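
One quick check along those lines (just an example, and only relevant if jumbo
frames are configured anywhere on the path): send full-size, unfragmentable pings
between the OSD hosts. Small pings passing while these fail is the classic sign
of an MTU mismatch.

# 8972 = 9000-byte MTU minus 20 bytes IP and 8 bytes ICMP headers
ping -c 3 -M do -s 8972 osd-host-2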

As you're rebuilding, perhaps the issues start once you hit a particular 
rack or host?

 [ulhglive-root@mon1 ~]# ceph health detail | head -n 15
 HEALTH_WARN 1214 pgs peering; 1216 pgs stuck inactive; 1216 pgs stuck unclean
 pg 2.17f is stuck inactive since forever, current state
 creating+peering, last acting [39,42,77]
 pg 2.17e is stuck inactive since forever, current state
 creating+peering, last acting [125,3,110]
 pg 2.179 is stuck inactive since forever, current state peering, last acting 
 [0]
 pg 2.178 is stuck inactive since forever, current state
 creating+peering, last acting [99,120,54]
 pg 2.17b is stuck inactive since forever, current state peering, last acting 
 [0]
 pg 2.17a is stuck inactive since forever, current state
 creating+peering, last acting [91,96,122]
 pg 2.175 is stuck inactive since forever, current state
 creating+peering, last acting [55,127,2]
 pg 2.174 is stuck inactive since forever, current state peering, last acting 
 [0]
 pg 2.176 is stuck inactive since forever, current state
 creating+peering, last acting [13,70,8]
 pg 2.172 is stuck inactive since forever, current state peering, last acting 
 [0]
 pg 2.16c is stuck inactive for 1344.369455, current state peering,
 last acting [99,104,85]
 pg 2.16e is stuck inactive since forever, current state peering, last acting 
 [0]
 pg 2.169 is stuck inactive since forever, current state
 creating+peering, last acting [125,24,65]
 pg 2.16a is stuck inactive since forever, current state peering, last acting 
 [0]
 Traceback (most recent call last):
   File /bin/ceph, line 896, in module
 retval = main()
   File /bin/ceph, line 883, in main
 sys.stdout.write(prefix + outbuf + suffix)
 IOError: [Errno 32] Broken pipe
 [ulhglive-root@mon1 ~]# ceph pg dump_stuck | head -n 15
 ok
 pg_stat state   up  up_primary  acting  acting_primary
 2.17f   creating+peering[39,42,77]  39  [39,42,77]  39
 2.17e   creating+peering[125,3,110] 125 [125,3,110] 125
 2.179   peering [0] 0   [0] 0
 2.178   creating+peering[99,120,54] 99  [99,120,54] 99
 2.17b   peering [0] 0   [0] 0
 2.17a   creating+peering[91,96,122] 91  [91,96,122] 91
 2.175   creating+peering[55,127,2]  55  [55,127,2]  55
 2.174   peering [0] 0   [0] 0
 2.176   creating+peering[13,70,8]   13  [13,70,8]   13
 2.172   peering [0] 0   [0] 0
 2.16c   peering [99,104,85] 99  [99,104,85] 99
 2.16e   peering [0] 0   [0] 0
 2.169   creating+peering[125,24,65] 125 [125,24,65] 125
 2.16a   peering [0] 0   [0] 0
 
 Focusing on 2.17f on OSD 39, I set debugging to 20/20 and am attaching
 the logs. I've looked through the logs with 20/20 before we toasted
 the cluster and I couldn't find anything standing out. I have another
 cluster that is also exhibiting this problem which I'd prefer not to
 lose the data on. If anything stands out, please let me know. We are
 going to wipe this cluster again and take more manual steps.
 
 

[ceph-users] 0.93 fresh cluster won't create PGs

2015-03-27 Thread Robert LeBlanc
I've built Ceph clusters a few times now and I'm completely baffled
about what we are seeing. We had a majority of the nodes on a new
cluster go down yesterday and we got PGs stuck peering. We checked
logs, firewalls, file descriptors, etc and nothing is pointing to what
the problem is. We thought we could work around the problem by
deleting all the pools and recreating them, but still most of the PGs
were in a creating+peering state. Rebooting OSDs, reformatting them,
adjusting the CRUSH, etc all proved fruitless. I took min_size and
size to 1, tried scrubbing, deep-scrubbing the PGs and OSDs. Nothing
seems to get the cluster to progress.

As a last ditch effort, we wiped the whole cluster, regenerated UUID,
keys, etc and pushed it all through puppet again. After creating the
OSDs there are PGs stuck. Here is some info:

[ulhglive-root@mon1 ~]# ceph status
cluster fa158fa8-3e5d-47b1-a7bc-98a41f510ac0
 health HEALTH_WARN
1214 pgs peering
1216 pgs stuck inactive
1216 pgs stuck unclean
 monmap e2: 3 mons at
{mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0}
election epoch 6, quorum 0,1,2 mon1,mon2,mon3
 osdmap e161: 130 osds: 130 up, 130 in
  pgmap v468: 2048 pgs, 2 pools, 0 bytes data, 0 objects
5514 MB used, 472 TB / 472 TB avail
 965 peering
 832 active+clean
 249 creating+peering
   2 activating
[ulhglive-root@mon1 ~]# ceph health detail | head -n 15
HEALTH_WARN 1214 pgs peering; 1216 pgs stuck inactive; 1216 pgs stuck unclean
pg 2.17f is stuck inactive since forever, current state
creating+peering, last acting [39,42,77]
pg 2.17e is stuck inactive since forever, current state
creating+peering, last acting [125,3,110]
pg 2.179 is stuck inactive since forever, current state peering, last acting [0]
pg 2.178 is stuck inactive since forever, current state
creating+peering, last acting [99,120,54]
pg 2.17b is stuck inactive since forever, current state peering, last acting [0]
pg 2.17a is stuck inactive since forever, current state
creating+peering, last acting [91,96,122]
pg 2.175 is stuck inactive since forever, current state
creating+peering, last acting [55,127,2]
pg 2.174 is stuck inactive since forever, current state peering, last acting [0]
pg 2.176 is stuck inactive since forever, current state
creating+peering, last acting [13,70,8]
pg 2.172 is stuck inactive since forever, current state peering, last acting [0]
pg 2.16c is stuck inactive for 1344.369455, current state peering,
last acting [99,104,85]
pg 2.16e is stuck inactive since forever, current state peering, last acting [0]
pg 2.169 is stuck inactive since forever, current state
creating+peering, last acting [125,24,65]
pg 2.16a is stuck inactive since forever, current state peering, last acting [0]
Traceback (most recent call last):
  File /bin/ceph, line 896, in module
retval = main()
  File /bin/ceph, line 883, in main
sys.stdout.write(prefix + outbuf + suffix)
IOError: [Errno 32] Broken pipe
[ulhglive-root@mon1 ~]# ceph pg dump_stuck | head -n 15
ok
pg_stat state   up  up_primary  acting  acting_primary
2.17f   creating+peering[39,42,77]  39  [39,42,77]  39
2.17e   creating+peering[125,3,110] 125 [125,3,110] 125
2.179   peering [0] 0   [0] 0
2.178   creating+peering[99,120,54] 99  [99,120,54] 99
2.17b   peering [0] 0   [0] 0
2.17a   creating+peering[91,96,122] 91  [91,96,122] 91
2.175   creating+peering[55,127,2]  55  [55,127,2]  55
2.174   peering [0] 0   [0] 0
2.176   creating+peering[13,70,8]   13  [13,70,8]   13
2.172   peering [0] 0   [0] 0
2.16c   peering [99,104,85] 99  [99,104,85] 99
2.16e   peering [0] 0   [0] 0
2.169   creating+peering[125,24,65] 125 [125,24,65] 125
2.16a   peering [0] 0   [0] 0

Focusing on 2.17f on OSD 39, I set debugging to 20/20 and am attaching
the logs. I've looked through the logs with 20/20 before we toasted
the cluster and I couldn't find anything standing out. I have another
cluster that is also exhibiting this problem which I'd prefer not to
lose the data on. If anything stands out, please let me know. We are
going to wipe this cluster again and take more manual steps.
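
(The debug bump and the PG query I used were roughly the following; the
exact debug subsystems I raised are from memory:)

[ulhglive-root@mon1 ~]# ceph tell osd.39 injectargs '--debug-osd 20/20 --debug-ms 1'
[ulhglive-root@mon1 ~]# ceph pg 2.17f query > pg-2.17f.query.json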

ceph-osd.39.log.xz -
https://owncloud.leblancnet.us/owncloud/public.php?service=files&t=b120a67cc6111ffcba54d2e4cc8a62b5
map.xz - 
https://owncloud.leblancnet.us/owncloud/public.php?service=files&t=df1eecf7d307225b7d43b5c9474561d0


After redoing the cluster again, we started slowly. We added one OSD,
dropped the pools to min_size=1 and size=1, and the cluster became
healthy. We added a second OSD and changed the CRUSH rule to OSD (rough
commands below) and it became healthy again. We changed size=3 and
min_size=2. We had puppet add 10 OSDs on one host, and 
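
(A rough sketch of that CRUSH change; the rule name is just an example and
the ruleset id needs to be checked against the dump:)

[ulhglive-root@mon1 ~]# ceph osd crush rule create-simple replicated-osd default osd
[ulhglive-root@mon1 ~]# ceph osd crush rule dump
[ulhglive-root@mon1 ~]# ceph osd pool set <poolname> crush_ruleset <ruleset-id>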

Re: [ceph-users] ceph -s slow return result

2015-03-27 Thread Chu Duc Minh
@Kobi Laredo: thank you! It's exactly my problem.
# du -sh /var/lib/ceph/mon/
2.6G    /var/lib/ceph/mon/
# ceph tell mon.a compact
compacted leveldb in 10.197506
# du -sh /var/lib/ceph/mon/
461M    /var/lib/ceph/mon/
Now my ceph -s returns a result immediately.

Maybe the monitors' LevelDB store grew so big because I pushed 13 million
files into a bucket (over radosgw).
When a bucket holds an extremely large number of files, can the state of the
ceph cluster become unstable? (I'm running Giant)
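
In case it's useful to others: rather than deleting objects one by one
through radosgw, I am dropping the whole pool and recreating it. A rough
sketch (the pg counts are only an example, and I have not double-checked
every radosgw implication):

# ceph osd pool delete .rgw.buckets .rgw.buckets --yes-i-really-really-mean-it
# ceph osd pool create .rgw.buckets 128 128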

Regards,

On Sat, Mar 28, 2015 at 12:57 AM, Kobi Laredo kobi.lar...@dreamhost.com
wrote:

 What's the current health of the cluster?
 It may help to compact the monitors' LevelDB store if they have grown in
 size
 http://www.sebastien-han.fr/blog/2014/10/27/ceph-mon-store-taking-up-a-lot-of-space/
 Depending on the size of the mon's store, it may take some time to
 compact; make sure to do only one at a time.
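
 If you want this to happen automatically, my understanding is that a
 ceph.conf setting along these lines makes the monitor compact its store on
 every restart (worth verifying for your release):

 [mon]
 mon compact on start = true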

 Kobi Laredo
 Cloud Systems Engineer | (408) 409-KOBI

 On Fri, Mar 27, 2015 at 10:31 AM, Chu Duc Minh chu.ducm...@gmail.com
 wrote:

 All my monitors are running.
 But I am deleting the pool .rgw.buckets, which currently holds 13 million
 objects (just test data).
 The reason I must delete this pool is that my cluster became unstable, with
 OSDs sometimes going down and PGs peering, incomplete, ...
 Therefore I must delete this pool to re-stabilize my cluster. (radosgw is
 too slow at deleting objects once one of my buckets reaches a few million
 objects.)

 Regards,


 On Sat, Mar 28, 2015 at 12:23 AM, Gregory Farnum g...@gregs42.com
 wrote:

 Are all your monitors running? Usually a temporary hang means that the
 Ceph client tries to reach a monitor that isn't up, then times out and
 contacts a different one.

 I have also seen it just be slow if the monitors are processing so many
 updates that they're behind, but that's usually on a very unhappy cluster.
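
  One way to narrow it down is to point the client at each monitor
  explicitly and compare the response times, something like the following
  (the address is just a placeholder):

  ceph -m <mon-ip>:6789 -s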
 -Greg
 On Fri, Mar 27, 2015 at 8:50 AM Chu Duc Minh chu.ducm...@gmail.com
 wrote:

  On my Ceph cluster, ceph -s returns results quite slowly.
  Sometimes it returns a result immediately, sometimes it hangs for a few
  seconds before returning a result.

  Do you think this problem (ceph -s returning slowly) relates only to the
  ceph-mon(s) process, or might it relate to the ceph-osd(s) too?
  (I am deleting a big bucket, .rgw.buckets, and the ceph-osd(s) disk
  utilization is quite high.)

 Regards,
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.93 fresh cluster won't create PGs

2015-03-27 Thread Robert LeBlanc
Thanks, we'll give the gitbuilder packages a shot and report back.

Robert LeBlanc

Sent from a mobile device, please excuse any typos.
On Mar 27, 2015 10:03 PM, Sage Weil s...@newdream.net wrote:

 On Fri, 27 Mar 2015, Robert LeBlanc wrote:
  I've built Ceph clusters a few times now and I'm completely baffled
  about what we are seeing. We had a majority of the nodes on a new
  cluster go down yesterday and we got PGs stuck peering. We checked
  logs, firewalls, file descriptors, etc and nothing is pointing to what
  the problem is. We thought we could work around the problem by
  deleting all the pools and recreating them, but still most of the PGs
  were in a creating+peering state. Rebooting OSDs, reformatting them,
  adjusting the CRUSH, etc all proved fruitless. I took min_size and
  size to 1, tried scrubbing, deep-scrubbing the PGs and OSDs. Nothing
  seems to get the cluster to progress.
 
  As a last ditch effort, we wiped the whole cluster, regenerated UUID,
  keys, etc and pushed it all through puppet again. After creating the
  OSDs there are PGs stuck. Here is some info:
 
  [ulhglive-root@mon1 ~]# ceph status
  cluster fa158fa8-3e5d-47b1-a7bc-98a41f510ac0
   health HEALTH_WARN
  1214 pgs peering
  1216 pgs stuck inactive
  1216 pgs stuck unclean
   monmap e2: 3 mons at
  {mon1=
 10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0}
  election epoch 6, quorum 0,1,2 mon1,mon2,mon3
   osdmap e161: 130 osds: 130 up, 130 in
pgmap v468: 2048 pgs, 2 pools, 0 bytes data, 0 objects
  5514 MB used, 472 TB / 472 TB avail
   965 peering
   832 active+clean
   249 creating+peering
 2 activating

 Usually when we've seen something like this it has been something annoying
 in the environment, like a broken network that causes the TCP streams to
 freeze once they start sending significant traffic (e.g., affecting the
 connections that transport data but not the ones that handle heartbeats).

 As you're rebuilding, perhaps the issues start once you hit a particular
 rack or host?
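
 One quick thing to rule out is an MTU/jumbo-frame mismatch between hosts;
 a do-not-fragment ping at full frame size usually shows it (the host name
 here is a placeholder):

 ping -M do -s 8972 -c 3 <other-osd-host>   # 9000 MTU path (8972 + 28 bytes of headers)
 ping -M do -s 1472 -c 3 <other-osd-host>   # same check for a standard 1500 MTU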

  [ulhglive-root@mon1 ~]# ceph health detail | head -n 15
  HEALTH_WARN 1214 pgs peering; 1216 pgs stuck inactive; 1216 pgs stuck
 unclean
  pg 2.17f is stuck inactive since forever, current state
  creating+peering, last acting [39,42,77]
  pg 2.17e is stuck inactive since forever, current state
  creating+peering, last acting [125,3,110]
  pg 2.179 is stuck inactive since forever, current state peering, last
 acting [0]
  pg 2.178 is stuck inactive since forever, current state
  creating+peering, last acting [99,120,54]
  pg 2.17b is stuck inactive since forever, current state peering, last
 acting [0]
  pg 2.17a is stuck inactive since forever, current state
  creating+peering, last acting [91,96,122]
  pg 2.175 is stuck inactive since forever, current state
  creating+peering, last acting [55,127,2]
  pg 2.174 is stuck inactive since forever, current state peering, last
 acting [0]
  pg 2.176 is stuck inactive since forever, current state
  creating+peering, last acting [13,70,8]
  pg 2.172 is stuck inactive since forever, current state peering, last
 acting [0]
  pg 2.16c is stuck inactive for 1344.369455, current state peering,
  last acting [99,104,85]
  pg 2.16e is stuck inactive since forever, current state peering, last
 acting [0]
  pg 2.169 is stuck inactive since forever, current state
  creating+peering, last acting [125,24,65]
  pg 2.16a is stuck inactive since forever, current state peering, last
 acting [0]
  Traceback (most recent call last):
    File "/bin/ceph", line 896, in <module>
      retval = main()
    File "/bin/ceph", line 883, in main
      sys.stdout.write(prefix + outbuf + suffix)
  IOError: [Errno 32] Broken pipe
  [ulhglive-root@mon1 ~]# ceph pg dump_stuck | head -n 15
  ok
  pg_stat state   up  up_primary  acting  acting_primary
  2.17f   creating+peering[39,42,77]  39  [39,42,77]
 39
  2.17e   creating+peering[125,3,110] 125 [125,3,110]
  125
  2.179   peering [0] 0   [0] 0
  2.178   creating+peering[99,120,54] 99  [99,120,54]
  99
  2.17b   peering [0] 0   [0] 0
  2.17a   creating+peering[91,96,122] 91  [91,96,122]
  91
  2.175   creating+peering[55,127,2]  55  [55,127,2]
 55
  2.174   peering [0] 0   [0] 0
  2.176   creating+peering[13,70,8]   13  [13,70,8]
  13
  2.172   peering [0] 0   [0] 0
  2.16c   peering [99,104,85] 99  [99,104,85] 99
  2.16e   peering [0] 0   [0] 0
  2.169   creating+peering[125,24,65] 125 [125,24,65]
  125
  2.16a   peering [0] 0   [0] 0
 
  Focusing on 2.17f on OSD 39, I set debugging to 20/20 and am attaching
  the logs. I've looked through the logs with 20/20 before we toasted
  the cluster and I