[ceph-users] 1 mons down, ceph-create-keys

2014-02-15 Thread Vadim Vatlin
Hello
Could you help me please?

ceph status
cluster 4da1f6d8-ca10-4bfa-bff7-c3c1cdb3f888
 health HEALTH_WARN 229 pgs peering; 102 pgs stuck inactive; 236 pgs stuck 
unclean; 1 mons down, quorum 0,1 st1,st2
 monmap e3: 3 mons at 
{st1=109.233.57.226:6789/0,st2=91.224.140.229:6789/0,st3=176.9.250.166:6789/0}, 
election epoch 72432, quorum 0,1 st1,st2
 osdmap e714: 3 osds: 3 up, 3 in
  pgmap v1824: 292 pgs, 4 pools, 135 bytes data, 2 objects
137 MB used, 284 GB / 284 GB avail
   7 active
  56 active+clean
 188 peering
  41 remapped+peering

I tried to restart the st3 monitor:
 service ceph -a restart mon.st3

ps aux | grep ceph
root  9642  1.7 19.8 785988 202260 ?   Ssl 12:16   0:11 
/usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf
root 21375  5.0  3.5 212996 35852 pts/0Sl   12:27   0:00 
/usr/bin/ceph-mon -i st3 --pid-file /var/run/ceph/mon.st3.pid -c 
/etc/ceph/ceph.conf
root 21393  0.5  0.5  51308  6060 pts/0S12:27   0:00 python 
/usr/sbin/ceph-create-keys -i st3

The ceph-create-keys process is stuck and never finishes.


Re: [ceph-users] 1 mons down, ceph-create-keys

2014-02-15 Thread Udo Lembke
Hi,
perhaps your filesystem is too full?

df -k
du -hs /var/lib/ceph/mon/ceph-st3/store.db


What output/error message do you get if you start the mon in the foreground?

ceph-mon -i st3 -d -c /etc/ceph/ceph.conf
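
You can also ask the monitor directly for its state; assuming the default
admin socket path, something like:

ceph --admin-daemon /var/run/ceph/ceph-mon.st3.asok mon_status

(As far as I know, ceph-create-keys keeps waiting until the monitor has joined
quorum, so the mon_status output should show why it never finishes.)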

Udo

On 15.02.2014 09:30, Vadim Vatlin wrote:
 Hello
 Could you help me please 

 ceph status
 cluster 4da1f6d8-ca10-4bfa-bff7-c3c1cdb3f888
  health HEALTH_WARN 229 pgs peering; 102 pgs stuck inactive; 236 pgs 
 stuck unclean; 1 mons down, quorum 0,1 st1,st2
  monmap e3: 3 mons at 
 {st1=109.233.57.226:6789/0,st2=91.224.140.229:6789/0,st3=176.9.250.166:6789/0},
  election epoch 72432, quorum 0,1 st1,st2
  osdmap e714: 3 osds: 3 up, 3 in
   pgmap v1824: 292 pgs, 4 pools, 135 bytes data, 2 objects
 137 MB used, 284 GB / 284 GB avail
7 active
   56 active+clean
  188 peering
   41 remapped+peering

 I try to restart st3 monitor 
  service ceph -a restart mon.st3

 ps aux | grep ceph
 root  9642  1.7 19.8 785988 202260 ?   Ssl 12:16   0:11 
 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c 
 /etc/ceph/ceph.conf
 root 21375  5.0  3.5 212996 35852 pts/0Sl   12:27   0:00 
 /usr/bin/ceph-mon -i st3 --pid-file /var/run/ceph/mon.st3.pid -c 
 /etc/ceph/ceph.conf
 root 21393  0.5  0.5  51308  6060 pts/0S12:27   0:00 python 
 /usr/sbin/ceph-create-keys -i st3

 Process ceph-create-keys - stuck and have never finished.



[ceph-users] Problem starting RADOS Gateway

2014-02-15 Thread Georgios Dimitrakakis

Dear all,

I am following this guide http://ceph.com/docs/master/radosgw/config/
to set up Object Storage on CentOS 6.5.


My problem is that when I try to start the service as indicated here:
http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway

I get nothing:

# service ceph-radosgw start
Starting radosgw instance(s)...

and if I check whether the service is running, obviously it is not!

# service ceph-radosgw status
/usr/bin/radosgw is not running.


If I try to start it manually without using the service command I get 
the following:


# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 24619
2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't 
support curl_multi_wait()
2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone / region 
transfer performance may be affected

2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): ERROR: 
missing keyring, cannot use cephx for authentication
2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin 
initialization error (2) No such file or directory
2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown 
:/1024619

2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 mark_down_all
2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: waiting 
for dispatch queue
2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 reaper_entry 
start

2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714764 7fb65ba64820 10 -- :/1024619 wait: dispatch 
queue is stopped
2014-02-15 16:03:38.714786 7fb64b5fe700 10 -- :/1024619 reaper_entry 
done
2014-02-15 16:03:38.714819 7fb65ba64820 10 -- :/1024619 wait: closing 
pipes

2014-02-15 16:03:38.714826 7fb65ba64820 10 -- :/1024619 reaper
2014-02-15 16:03:38.714828 7fb65ba64820 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714830 7fb65ba64820 10 -- :/1024619 wait: waiting 
for pipes  to close

2014-02-15 16:03:38.714832 7fb65ba64820 10 -- :/1024619 wait: done.
2014-02-15 16:03:38.714833 7fb65ba64820  1 -- :/1024619 shutdown 
complete.
2014-02-15 16:03:38.714916 7fb65ba64820 -1 Couldn't init storage 
provider (RADOS)


Obviously the problem is a missing keyring, but which one, and how can
I solve this? Furthermore, why is this happening since I am following
the guide to the letter? Is something missing?


Best,

G.



Re: [ceph-users] Problem starting RADOS Gateway

2014-02-15 Thread Udo Lembke
Hi,
does ceph -s also get stuck on a missing keyring?

Do you have a keyring like:
cat /etc/ceph/keyring
[client.admin]
key = AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

Or do you have another keyring defined in ceph.conf?
(global section: keyring = /etc/ceph/keyring)

The key is stored in Ceph; see
ceph auth get-key client.admin
AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

or ceph auth list for all keys.
Key generation is done with get-or-create-key, like this (in this case
for bootstrap-osd):
ceph auth get-or-create-key client.bootstrap-osd mon 'allow profile bootstrap-osd'
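
If the keyring file itself is missing, it can be rewritten from the key stored
in the cluster; a sketch, assuming client.admin exists and a monitor is still
reachable:

ceph auth get client.admin -o /etc/ceph/keyring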

Udo

On 15.02.2014 15:35, Georgios Dimitrakakis wrote:
 Dear all,

 I am following this guide http://ceph.com/docs/master/radosgw/config/
 to setup Object Storage on CentOS 6.5.

 My problem is that when I try to start the service as indicated here:
 http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway


 I get nothing

 # service ceph-radosgw start
 Starting radosgw instance(s)...

 and if I check if the service is running obviously it is not!

 # service ceph-radosgw status
 /usr/bin/radosgw is not running.


 If I try to start it manually without using the service command I get
 the following:

 # /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
 2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2
 (a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 24619
 2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't
 support curl_multi_wait()
 2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone /
 region transfer performance may be affected
 2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
 2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
 2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): ERROR:
 missing keyring, cannot use cephx for authentication
 2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin
 initialization error (2) No such file or directory
 2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown
 :/1024619
 2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 mark_down_all
 2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: waiting
 for dispatch queue
 2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 reaper_entry
 start
 2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
 2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
 2014-02-15 16:03:38.714764 7fb65ba64820 10 -- :/1024619 wait: dispatch
 queue is stopped
 2014-02-15 16:03:38.714786 7fb64b5fe700 10 -- :/1024619 reaper_entry done
 2014-02-15 16:03:38.714819 7fb65ba64820 10 -- :/1024619 wait: closing
 pipes
 2014-02-15 16:03:38.714826 7fb65ba64820 10 -- :/1024619 reaper
 2014-02-15 16:03:38.714828 7fb65ba64820 10 -- :/1024619 reaper done
 2014-02-15 16:03:38.714830 7fb65ba64820 10 -- :/1024619 wait: waiting
 for pipes  to close
 2014-02-15 16:03:38.714832 7fb65ba64820 10 -- :/1024619 wait: done.
 2014-02-15 16:03:38.714833 7fb65ba64820  1 -- :/1024619 shutdown
 complete.
 2014-02-15 16:03:38.714916 7fb65ba64820 -1 Couldn't init storage
 provider (RADOS)

 Obviously the problem is some missing keyring but which one and how
 can I solve this problem? Furthermore, why this is happening since I
 am following the guide to the letter?? Is something missing??

 Best,

 G.




Re: [ceph-users] Problem starting RADOS Gateway

2014-02-15 Thread Georgios Dimitrakakis

1) ceph -s is working as expected

# ceph -s
cluster c465bdb2-e0a5-49c8-8305-efb4234ac88a
 health HEALTH_OK
 monmap e1: 1 mons at {master=192.168.0.10:6789/0}, election epoch 
1, quorum 0 master

 mdsmap e111: 1/1/1 up {0=master=up:active}
 osdmap e114: 2 osds: 2 up, 2 in
  pgmap v414: 1200 pgs, 14 pools, 10596 bytes data, 67 objects
500 GB used, 1134 GB / 1722 GB avail
1200 active+clean


2) In /etc/ceph I have the following files

# ls -l
total 20
-rw-r--r-- 1 root root  64 Feb 14 17:10 ceph.client.admin.keyring
-rw-r--r-- 1 root root 401 Feb 15 16:57 ceph.conf
-rw-r--r-- 1 root root 196 Feb 14 20:26 ceph.log
-rw-r--r-- 1 root root 120 Feb 15 11:08 keyring.radosgw.gateway
-rwxr-xr-x 1 root root  92 Dec 21 00:47 rbdmap

3) ceph.conf content is the following

# cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.0.10
mon_initial_members = master
fsid = c465bdb2-e0a5-49c8-8305-efb4234ac88a

[client.radosgw.gateway]
host = master
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log


4) And all the keys that exist are the following:

# ceph auth list
installed auth entries:

mds.master
key: xx==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx
osd.0
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: xx==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: xx==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
key: AQBWLf5SGBAyBRAAzLwi5OXsAuR5vdo8hs+2zw==
caps: [mon] allow profile bootstrap-osd
client.radosgw.gateway
key: xx==
caps: [mon] allow rw
caps: [osd] allow rwx



I still don't get what is wrong...
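
(A check I could still do, if I have the flags right, would be to start the
gateway with its identity and keyring given explicitly, using the files listed
above:

/usr/bin/radosgw -d -c /etc/ceph/ceph.conf -n client.radosgw.gateway -k /etc/ceph/keyring.radosgw.gateway --debug_ms 10)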

G.

On Sat, 15 Feb 2014 16:27:41 +0100, Udo Lembke wrote:

Hi,
does ceph -s also stuck on missing keyring?

Do you have an keyring like:
cat /etc/ceph/keyring
[client.admin]
key = AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

Or do you have anothe defined keyring in ceph.conf?
global-section - keyring = /etc/ceph/keyring

The key is in ceph - see
ceph auth get-key client.admin
AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

or ceph auth list for all keys.
Key-genaration is doing by get-or-create key like this (but in this 
case

for bootstap-osd):
ceph auth get-or-create-key client.bootstrap-osd mon allow profile
bootstrap-osd

Udo

On 15.02.2014 15:35, Georgios Dimitrakakis wrote:

Dear all,

I am following this guide 
http://ceph.com/docs/master/radosgw/config/

to setup Object Storage on CentOS 6.5.

My problem is that when I try to start the service as indicated 
here:


http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway


I get nothing

# service ceph-radosgw start
Starting radosgw instance(s)...

and if I check if the service is running obviously it is not!

# service ceph-radosgw status
/usr/bin/radosgw is not running.


If I try to start it manually without using the service command I 
get

the following:

# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 
24619

2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't
support curl_multi_wait()
2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone /
region transfer performance may be affected
2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): 
ERROR:

missing keyring, cannot use cephx for authentication
2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin
initialization error (2) No such file or directory
2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown
:/1024619
2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 
mark_down_all
2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: 
waiting

for dispatch queue
2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 reaper_entry
start
2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714764 7fb65ba64820 10 -- :/1024619 wait: 
dispatch

queue is stopped
2014-02-15 16:03:38.714786 7fb64b5fe700 10 -- :/1024619 reaper_entry 
done
2014-02-15 16:03:38.714819 7fb65ba64820 10 -- :/1024619 wait: 

[ceph-users] Block Devices and OpenStack

2014-02-15 Thread Ashish Chandra
Hi Cephers,

I am trying to configure ceph rbd as backend for cinder and glance by
following the steps mentioned in:

http://ceph.com/docs/master/rbd/rbd-openstack/

Before I start, all OpenStack services are running normally and the Ceph
cluster health shows HEALTH_OK.

But once I am done with all the steps and restart the OpenStack services,
cinder-volume fails to start and throws an error.

2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Traceback (most
recent call last):
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
/opt/stack/cinder/cinder/volume/drivers/rbd.py, line 262, in
check_for_setup_error
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd with
RADOSClient(self):
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
/opt/stack/cinder/cinder/volume/drivers/rbd.py, line 234, in __init__
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd self.cluster,
self.ioctx = driver._connect_to_rados(pool)
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
/opt/stack/cinder/cinder/volume/drivers/rbd.py, line 282, in
_connect_to_rados
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd client.connect()
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
/usr/lib/python2.7/dist-packages/rados.py, line 185, in connect
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd raise
make_ex(ret, error calling connect)
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Error: error
calling connect: error code 95
2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd
2014-02-16 00:01:42.591 ERROR cinder.volume.manager
[req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Error encountered
during initialization of driver: RBDDriver
2014-02-16 00:01:42.592 ERROR cinder.volume.manager
[req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Bad or unexpected
response from the storage volume backend API: error connecting to ceph
cluster
2014-02-16 00:01:42.592 TRACE cinder.volume.manager Traceback (most recent
call last):
2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File
/opt/stack/cinder/cinder/volume/manager.py, line 190, in init_host
2014-02-16 00:01:42.592 TRACE cinder.volume.manager
self.driver.check_for_setup_error()
2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File
/opt/stack/cinder/cinder/volume/drivers/rbd.py, line 267, in
check_for_setup_error
2014-02-16 00:01:42.592 TRACE cinder.volume.manager raise
exception.VolumeBackendAPIException(data=msg)
2014-02-16 00:01:42.592 TRACE cinder.volume.manager
VolumeBackendAPIException: Bad or unexpected response from the storage
volume backend API: error connecting to ceph cluster


Here is the content of my /etc/ceph in openstack node:

ashish@ubuntu:/etc/ceph$ ls -lrt
total 16
-rw-r--r-- 1 cinder cinder 229 Feb 15 23:45 ceph.conf
-rw-r--r-- 1 glance glance  65 Feb 15 23:46 ceph.client.glance.keyring
-rw-r--r-- 1 cinder cinder  65 Feb 15 23:47 ceph.client.cinder.keyring
-rw-r--r-- 1 cinder cinder  72 Feb 15 23:47
ceph.client.cinder-backup.keyring

I am really stuck and have tried a lot. What could I possibly be doing wrong?


HELP.


Thanks and Regards
Ashish Chandra


[ceph-users] Poor performance with 2 million files flat

2014-02-15 Thread Samuel Terburg - Panther-IT BV

I have a performance problem I would like advice on.

I have the following sub-optimal setup:
* 2 Servers (WTFM008 WTFM009)
  * HP Proliant DL180
* SmartArray G6 P410 raid-controller
* 4x 500GB RAID5   (seq writes = 230MB/s)
* CentOS 6.5 x86_64
* 2.000.000 files (ms-word), with no directory structure
* Ceph
  * ceph-deploy mon create WTFM008 WTFM009
  * ceph-deploy mds create WTFM008 WTFM009
  * ceph-deploy osd activate WTFM008:/var/lib/ceph/osd/ceph-0 WTFM009:/var/lib/ceph/osd/ceph-1
    (osd is using root fs)
  * ceph-fuse /mnt/ceph

I am currently trying to copy 2 million MS Word documents into Ceph.
When I started it was doing about 10 files per second.
Now, 1 week later, it has done about 500,000 files and has slowed down
to 1 file per 10 seconds.


How can I improve this terrible performance?
* The hardware is a fixed configuration; I cannot add (SSD) disks or
change the RAID setup.

* I could not find the cephfs kernel module, so I had to use ceph-fuse.
* I could have started with a degraded setup (1 OSD) for the initial load;
  would that have helped the performance? (Ceph not having to do the
  distribution part)

* There is no load on the systems at all (not CPU, not memory, not disk I/O)

Below is my crush map.


Regards,


Samuel Terburg
Panther-IT BV



# begin crush map

# devices
device 0 osd.0
device 1 osd.1

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host WTFM008 {
   id -2# do not change unnecessarily
   # weight 1.340
   alg straw
   hash 0   # rjenkins1
   item osd.0 weight 1.340
}
host WTFM009 {
   id -3# do not change unnecessarily
   # weight 1.340
   alg straw
   hash 0   # rjenkins1
   item osd.1 weight 1.340
}
root default {
   id -1# do not change unnecessarily
   # weight 2.680
   alg straw
   hash 0   # rjenkins1
   item WTFM008 weight 1.340
   item WTFM009 weight 1.340
}

# rules
rule data {
   ruleset 0
   type replicated
   min_size 1
   max_size 10
   step take default
   step chooseleaf firstn 0 type host
   step emit
}
rule metadata {
   ruleset 1
   type replicated
   min_size 1
   max_size 10
   step take default
   step chooseleaf firstn 0 type host
   step emit
}
rule rbd {
   ruleset 2
   type replicated
   min_size 1
   max_size 10
   step take default
   step chooseleaf firstn 0 type host
   step emit
}

# end crush map



# ceph -w
cluster 4f7bcb26-0cee-4472-abca-c200a999b686
 health HEALTH_OK
 monmap e1: 2 mons at 
{WTFM008=192.168.0.1:6789/0,WTFM009=192.168.0.2:6789/0}, election epoch 
4, quorum 0,1 WTFM008,WTFM009

 mdsmap e5: 1/1/1 up {0=WTFM008=up:active}, 1 up:standby
 osdmap e14: 2 osds: 2 up, 2 in
  pgmap v151668: 192 pgs, 3 pools, 31616 MB data, 956 kobjects
913 GB used, 1686 GB / 2738 GB avail
 192 active+clean
  client io 40892 kB/s rd, 7370 B/s wr, 1 op/s





Re: [ceph-users] Poor performance with 2 million files flat

2014-02-15 Thread Sage Weil
On Sat, 15 Feb 2014, Samuel Terburg - Panther-IT BV wrote:
 I have a performance problem i would like advise.
 
 I have the following sub-optimal setup:
 * 2 Servers (WTFM008 WTFM009)
   * HP Proliant DL180
     * SmartArray G6 P410 raid-controller
     * 4x 500GB RAID5   (seq writes = 230MB/s)
 * CentOS 6.5 x86_64
 * 2.000.000 files (ms-word), with no directory structure
 * Ceph
   * ceph-deploy mon create WTFM008 WTFM009
   * ceph-deploy mds create WTFM008 WTFM009
   * ceph-deploy osd activate WTFM008:/var/lib/ceph/osd/ceph-0
 WTFM009:/var/lib/ceph/osd/ceph-1
     (osd is using root fs)
   * ceph-fuse /mnt/ceph
 
 I am currently trying to copy 2 million ms-word documents into ceph.
 When i started it was doing about 10 files per second.
 Now, 1 week later, it has done about 500.000 files and has slowed down to 1
 file per 10 seconds.
 
 How can i improve this terrible performance?

You probably need to add

mds frag = true

in the [mds] section
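
i.e. roughly this in ceph.conf (illustrative snippet):

[mds]
    mds frag = true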

sage



 * The hardware is a fixed configuration, i cannot add (SSD) disks or change
 RAID.
 * I could not find the cephfs kernel module so i had to use cephfs-fuse.
 * I could have started with a degraded setup (1 OSD) for the initial load,
   would that have helped in the performance? (Ceph not having to do the
 distribution part)
 * There is nu load on the systems at all (not cpu, not mem, not disk i/o)
 
 Below is my crush map.
 
 
 Regards,
 
 
 Samuel Terburg
 Panther-IT BV
 
 
 
 # begin crush map
 
 # devices
 device 0 osd.0
 device 1 osd.1
 
 # types
 type 0 osd
 type 1 host
 type 2 rack
 type 3 row
 type 4 room
 type 5 datacenter
 type 6 root
 
 # buckets
 host WTFM008 {
    id -2    # do not change unnecessarily
    # weight 1.340
    alg straw
    hash 0   # rjenkins1
    item osd.0 weight 1.340
 }
 host WTFM009 {
    id -3    # do not change unnecessarily
    # weight 1.340
    alg straw
    hash 0   # rjenkins1
    item osd.1 weight 1.340
 }
 root default {
    id -1    # do not change unnecessarily
    # weight 2.680
    alg straw
    hash 0   # rjenkins1
    item WTFM008 weight 1.340
    item WTFM009 weight 1.340
 }
 
 # rules
 rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
 }
 rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
 }
 rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
 }
 
 # end crush map
 
 
 
 # ceph -w
     cluster 4f7bcb26-0cee-4472-abca-c200a999b686
  health HEALTH_OK
  monmap e1: 2 mons at
 {WTFM008=192.168.0.1:6789/0,WTFM009=192.168.0.2:6789/0}, election epoch 4,
 quorum 0,1 WTFM008,WTFM009
  mdsmap e5: 1/1/1 up {0=WTFM008=up:active}, 1 up:standby
  osdmap e14: 2 osds: 2 up, 2 in
   pgmap v151668: 192 pgs, 3 pools, 31616 MB data, 956 kobjects
     913 GB used, 1686 GB / 2738 GB avail
  192 active+clean
   client io 40892 kB/s rd, 7370 B/s wr, 1 op/s
 
 
 
 


[ceph-users] slow requests from rados bench with small writes

2014-02-15 Thread Dan van der Ster
Dear Ceph experts,

We've found that a single client running rados bench can drive other
users, e.g. RBD users, into slow requests.

Starting with a cluster that is not particularly busy, e.g. :

2014-02-15 23:14:33.714085 mon.0 xx:6789/0 725224 : [INF] pgmap
v6561996: 27952 pgs: 27952 active+clean; 66303 GB data, 224 TB used,
2850 TB / 3075 TB avail; 4880KB/s rd, 28632KB/s wr, 271op/s

We then start a rados bench writing many small objects:
   rados bench -p test 60 write -t 500 -b 1024 --no-cleanup

which gives these results (note the 60s max latency!!):

Total time run: 86.351424
Total writes made:  91425
Write size: 1024
Bandwidth (MB/sec): 1.034
Stddev Bandwidth:   1.26486
Max bandwidth (MB/sec): 7.14941
Min bandwidth (MB/sec): 0
Average Latency:0.464847
Stddev Latency: 3.04961
Max latency:66.4363
Min latency:0.003188

30 seconds into this bench we start seeing slow requests, not only
from bench writes but also from some poor RBD clients, e.g.:

2014-02-15 23:16:02.820507 osd.483 xx:6804/46799 2201 : [WRN] slow
request 30.195634 seconds old, received at 2014-02-15 23:15:32.624641:
osd_sub_op(client.18535427.0:3922272 4.d42
4eb00d42/rbd_data.11371325138b774.6577/head//4 [] v
42083'71453 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent

During a longer, many-hour instance of this small write test, some of
these RBD slow writes became very user visible, with disk flushes
being blocked long enough (120s) for the VM kernels to start
complaining.

A rados bench from a 10Gig-e client writing 4MB objects doesn't have
the same long tail of latency, namely:

# rados bench -p test 60 write -t 500 --no-cleanup
...
Total time run: 62.811466
Total writes made:  8553
Write size: 4194304
Bandwidth (MB/sec): 544.678

Stddev Bandwidth:   173.163
Max bandwidth (MB/sec): 1000
Min bandwidth (MB/sec): 0
Average Latency:3.50719
Stddev Latency: 0.309876
Max latency:8.04493
Min latency:0.166138

and there are zero slow requests, at least during this 60s duration.

While the vast majority of small writes complete with a reasonable
sub-second latency, what is causing the very long tail (60-120s!) seen
by a few writes? Can someone advise us where to look in the perf dump,
etc., to find which resource/queue is being exhausted during these
tests?
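
(By perf dump I mean the admin-socket counters, e.g., assuming the default
socket path on one of the OSDs involved:

ceph --admin-daemon /var/run/ceph/ceph-osd.483.asok perf dump

but it is not obvious which of those counters or queues to watch.)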

Oh yeah, we're running latest dumpling stable, 0.67.5, on the servers.

Best Regards, Thanks in advance!
Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


[ceph-users] change order of an rbd image ?

2014-02-15 Thread Daniel Schwager
Hi,

I created a 1TB rbd image formatted with VMFS (VMware) for an ESX server, but
with a wrong order (25 instead of 22 ...). The rbd man page tells me that for
export/import/cp, rbd will use the order of the source image.

Is there a way to change the order of an rbd image by doing some conversion?
OK, one idea could be to 'dd' the 1TB mapped rbd device to some mounted
filesystem and back, but is this the only way?
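
Roughly, the device-to-device variant I have in mind would be (just a sketch;
image names are made up, and it assumes the udev rule that creates
/dev/rbd/<pool>/<image> links):

rbd create --order 22 --size 1048576 rbd/vmfs-new
rbd map rbd/vmfs-new
dd if=/dev/rbd/rbd/vmfs-old of=/dev/rbd/rbd/vmfs-new bs=4M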
  
best regards
Danny




Re: [ceph-users] Block Devices and OpenStack

2014-02-15 Thread Ashish Chandra
Hi Jean,

Here is the output for ceph auth list for client.cinder

client.cinder
key: AQCKaP9ScNgiMBAAwWjFnyL69rBfMzQRSHOfoQ==
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx
pool=volumes, allow rx pool=images


Here is the output of ceph -s:

ashish@ceph-client:~$ ceph -s
cluster afa13fcd-f662-4778-8389-85047645d034
 health HEALTH_OK
 monmap e1: 1 mons at {ceph-node1=10.0.1.11:6789/0}, election epoch 1,
quorum 0 ceph-node1
 osdmap e37: 3 osds: 3 up, 3 in
  pgmap v84: 576 pgs, 6 pools, 0 bytes data, 0 objects
106 MB used, 9076 MB / 9182 MB avail
 576 active+clean

I created all the keyrings and copied them as suggested by the guide.
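
(The cinder.conf settings the guide asks for should be roughly of this form;
the values below are placeholders, not my actual config:

volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = <the libvirt secret uuid>)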






On Sun, Feb 16, 2014 at 3:08 AM, Jean-Charles LOPEZ jc.lo...@inktank.comwrote:

 Hi,

 what do you get when you run a 'ceph auth list' command for the user name
 (client.cinder) you created for cinder? Are the caps and the key for this
 user correct? No typo in the hostname in the cinder.conf file (host=)? Did
 you copy the keyring to the node running cinder (can't really say from
 your output, and there is no ceph -s output to check the monitor names)?

 It could just be a typo in the ceph auth get-or-create command that's
 causing it.
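
 i.e. the caps should come out matching what you pasted, something like
 (sketch, with the pool names from your output):

 ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'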

 Rgds
 JC



 On Feb 15, 2014, at 10:35, Ashish Chandra mail.ashishchan...@gmail.com
 wrote:

 Hi Cephers,

 I am trying to configure ceph rbd as backend for cinder and glance by
 following the steps mentioned in:

 http://ceph.com/docs/master/rbd/rbd-openstack/

 Before I start all openstack services are running normally and ceph
 cluster health shows HEALTH_OK

 But once I am done with all steps and restart openstack services,
 cinder-volume fails to start and throws an error.

 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Traceback (most
 recent call last):
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
 /opt/stack/cinder/cinder/volume/drivers/rbd.py, line 262, in
 check_for_setup_error
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd with
 RADOSClient(self):
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
 /opt/stack/cinder/cinder/volume/drivers/rbd.py, line 234, in __init__
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd self.cluster,
 self.ioctx = driver._connect_to_rados(pool)
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
 /opt/stack/cinder/cinder/volume/drivers/rbd.py, line 282, in
 _connect_to_rados
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd
 client.connect()
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File
 /usr/lib/python2.7/dist-packages/rados.py, line 185, in connect
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd raise
 make_ex(ret, error calling connect)
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Error: error
 calling connect: error code 95
 2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd
 2014-02-16 00:01:42.591 ERROR cinder.volume.manager
 [req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Error encountered
 during initialization of driver: RBDDriver
 2014-02-16 00:01:42.592 ERROR cinder.volume.manager
 [req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Bad or unexpected
 response from the storage volume backend API: error connecting to ceph
 cluster
 2014-02-16 00:01:42.592 TRACE cinder.volume.manager Traceback (most recent
 call last):
 2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File
 /opt/stack/cinder/cinder/volume/manager.py, line 190, in init_host
 2014-02-16 00:01:42.592 TRACE cinder.volume.manager
 self.driver.check_for_setup_error()
 2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File
 /opt/stack/cinder/cinder/volume/drivers/rbd.py, line 267, in
 check_for_setup_error
 2014-02-16 00:01:42.592 TRACE cinder.volume.manager raise
 exception.VolumeBackendAPIException(data=msg)
 2014-02-16 00:01:42.592 TRACE cinder.volume.manager
 VolumeBackendAPIException: Bad or unexpected response from the storage
 volume backend API: error connecting to ceph cluster


 Here is the content of my /etc/ceph in openstack node:

 ashish@ubuntu:/etc/ceph$ ls -lrt
 total 16
 -rw-r--r-- 1 cinder cinder 229 Feb 15 23:45 ceph.conf
 -rw-r--r-- 1 glance glance  65 Feb 15 23:46 ceph.client.glance.keyring
 -rw-r--r-- 1 cinder cinder  65 Feb 15 23:47 ceph.client.cinder.keyring
 -rw-r--r-- 1 cinder cinder  72 Feb 15 23:47
 ceph.client.cinder-backup.keyring

 I am really stuck and tried a lot. What Could possibly I be doing wrong.


 HELP.


 Thanks and Regards
 Ashish Chandra






Re: [ceph-users] keyring generation

2014-02-15 Thread Kei.masumoto

(2014/02/16 3:06), Kei.masumoto wrote:

(2014/02/11 23:02), Alfredo Deza wrote:
On Tue, Feb 11, 2014 at 7:57 AM, Kei.masumoto 
kei.masum...@gmail.com wrote:

(2014/02/10 23:33), Alfredo Deza wrote:

On Sat, Feb 8, 2014 at 7:56 AM, Kei.masumoto kei.masum...@gmail.com
wrote:

(2014/02/05 23:49), Alfredo Deza wrote:
On Mon, Feb 3, 2014 at 11:28 AM, Kei.masumoto 
kei.masum...@gmail.com

wrote:

Hi Alfredo,

Thanks for your reply!

I think I pasted all logs from ceph.log,  but anyway, I re-executed
ceph-deploy mon create-initial again
Does that make sense? It seems like stack strace are added...

Those seem bad enough. There is a ticket open for these type of
tracebacks that should be gone with the
up coming release of ceph-deploy.

Your monitor does seem like in a good state. Have you checked the
monitor logs to see if they are complaining
about something?

I would also raise the log level in ceph.conf for the monitors
specifically to:

 debug mon = 10


Thanks for your reply.
I did debug mon = 10, but I could not find any error logs in
/var/log/ceph/ceph-mon.ceph1.log.
So I tried to let ceph-create-keys generate logs to files, and 
inserted

log
by myself for debugging purpose.
Then I found get_key() in ceph-create-keys complains like below. ( 
first

line is inserted by me)


INFO:ceph-create-keys: ceph --cluster=ceph --name=mon.
--keyring=/var/lib/ceph/mon/ceph-ceph1/keyring auth get-or-create
client.admin mon allow * osd allow * mds allow
INFO:ceph-create-keys:Talking to monitor...
INFO:ceph-create-keys:Cannot get or create admin key, permission 
denied
How did you started ceph-create-keys? with root? or with the ceph 
user?


That process is usually fired up by the init script that is usually
called by root.
I just use  ceph-deploy --overwrite-conf mon create-initial by 
user ceph.

Before doing that, start ceph-all  stop ceph-all by root.
That is odd, if this is a new cluster why are you starting and 
stopping ?


It seems that you are at a point where you have tried a few things and
the cluster
setup might not be in a good state.

Can you try setting it up from scratch and make sure you keep logs and
output? If you can
replicate your issues consistently (I have tried and cannot) then it
might indicate an issue
and all the logs and how you got there would be super useful


I tried from scratch and logs are attached.
Currently, 4 hosts exists in my test environment, ceph5(remote-host), 
ceph4(mon), ceph3(osd), ceph2(osd). I try to follow the instruction, 
although hostname is little different.
http://ceph.com/docs/master/start/quick-start-preflight/#ceph-node-setup 

Please see at the end of cons...@ceph5.log. After exec ceph-deploy 
mon create-initial, I got same error.

Although I will check little more detail, I appreciate any hints.


I understand my problem now. According to the instructions below,
http://ceph.com/docs/master/start/quick-start-preflight/#ceph-node-setup
when I set a public network in ceph.conf, mon_host has to be inside the
subnet given by public network.

I didn't realize such a precondition existed; I have to learn more.
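
In other words, something like this in ceph.conf (the addresses here just
follow the 192.168.40.x network from my syslog output below):

[global]
public_network = 192.168.40.0/24
mon_host = 192.168.40.136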

After I changed /usr/bin/ceph like below,
conf_defaults = {
-'log_to_stderr':'true',
-'err_to_stderr':'true',
+   'log_to_syslog':'true',
'log_flush_on_exit':'true',
}

I found logs like this in /var/log/syslog:
2014-02-15 23:31:50.417381 7f22f8626700  0 -- :/1009957 >> 192.168.40.136:6789/0
pipe(0x7f22e8019850 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f22e8000c00).fault


The listened IP address is different from what netstat shows.
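
(i.e. comparing the syslog line above against something like:

netstat -lntp | grep ceph-mon)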

Thanks for your help so far.










BTW, I inserted debugging logs to /usr/bin/ceph, and found the below 
log.



INFO:debug:Exception error calling connect
INFO:debug:Exception error calling connect TimedOut

Those logs are generated by cluster_handle.connect().  e.g.
rados.Rados.connect().


try:
if childargs and childargs[0] == 'ping':
return ping_monitor(cluster_handle, childargs[1])
cluster_handle.connect(timeout=timeout)

Any hints where to check? 6789 is listened by mon.




So I tried,

root@ceph1:~/my-cluster# chown -R ceph:ceph /var/lib/ceph/mon/
root@ceph1:~/my-cluster# start ceph-all  stop ceph-all
ceph-all start/running
ceph-all stop/waiting

Then I re-tried:
ceph@ceph1:~/my-cluster$ ceph-deploy --overwrite-conf mon 
create-initial
After that, I found some files are still owned b root. Is this a 
correct

behavior?

root@ceph1:~/my-cluster# ls -l /var/lib/ceph/mon/ceph-ceph1/store.db
total 1184
-rw-r--r-- 1 ceph ceph 1081168 Feb  8 02:25 000133.sst
-rw-r--r-- 1 ceph ceph   25530 Feb  8 02:38 000135.sst
-rw-r--r-- 1 ceph ceph   25530 Feb  8 02:38 000138.sst
-rw-r--r-- 1 root root   25530 Feb  8 02:44 000141.sst
-rw-r--r-- 1 root root   65536 Feb  8 02:44 000142.log
-rw-r--r-- 1 root root  16 Feb  8 02:44 CURRENT
-rw-r--r-- 1 ceph ceph   0 Jan 26 05:50 LOCK
-rw-r--r-- 1 ceph ceph 315 Jan 26 06:28 LOG
-rw-r--r-- 1 ceph ceph  57 Jan 26 05:50 LOG.old
-rw-r--r-- 1 root root   65536 Feb  8 02:44