[ceph-users] can not add osd

2014-12-15 Thread yang . bin18
hi

When I execute "ceph-deploy osd prepare node3:/dev/sdb", an error like this 
always comes out:

[node3][WARNIN] INFO:ceph-disk:Running command: /bin/umount -- 
/var/lib/ceph/tmp/mnt.u2KXW3
[node3][WARNIN] umount: /var/lib/ceph/tmp/mnt.u2KXW3: target is busy.

Then when I execute "/bin/umount -- /var/lib/ceph/tmp/mnt.u2KXW3" by hand, the result is ok.
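
(A minimal sketch of how one might see what is still holding that temporary mount before retrying - the mount point is taken from the log above, and fuser/lsof being installed on node3 is an assumption:)

fuser -vm /var/lib/ceph/tmp/mnt.u2KXW3    # list processes using the mount
lsof +D /var/lib/ceph/tmp/mnt.u2KXW3      # alternative view of open files there
umount /var/lib/ceph/tmp/mnt.u2KXW3       # once nothing holds it any more
ceph-deploy osd prepare node3:/dev/sdb    # then retry the prepare step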

ZTE Information Security Notice: The information contained in this mail (and 
any attachment transmitted herewith) is privileged and confidential and is 
intended for the exclusive use of the addressee(s).  If you are not an intended 
recipient, any disclosure, reproduction, distribution or other dissemination or 
use of the information contained is strictly prohibited.  If you have received 
this mail in error, please delete it and notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Daniel Schwager
Hallo Mike,

> There is also another way:
> * for CONF 2,3 replace the 200GB SSDs with 800GB ones and add another 1-2 SSDs to
> each node.
> * make a tier-1 read-write cache on the SSDs
> * you can also put the journal partitions on them if you wish - then data
> will move from SSD to SSD before it lands on the HDDs
> * on the HDDs you can make an erasure pool or a replica pool

Do you have some experience (performance?) with SSDs as a tier-1 cache? Maybe 
some small benchmarks? From the mailing list, my impression is that SSD tiering 
is not widely used in production.
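
(For reference, the kind of tier-1 cache Mike describes is usually wired up roughly like this - the pool names below are placeholders, and this is only a sketch to benchmark against, not a tuned configuration:)

ceph osd tier add satapool ssdcache
ceph osd tier cache-mode ssdcache writeback
ceph osd tier set-overlay satapool ssdcache
rados -p satapool bench 60 write --no-cleanup   # writes land on the SSD tier first
rados -p satapool bench 60 seq                  # sequential reads back through the cache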

regards
Danny




smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Placing Different Pools on Different OSDS

2014-12-15 Thread Yujian Peng
Hi,

I want to test the ceph cache tier. The test cluster has three machines, each
has an ssd and a sata disk. I've created a crush rule ssd_ruleset to place ssdpool
on the ssd osds, but I cannot assign pgs to the ssds.


root@ceph10:~# ceph osd crush rule list
[
"replicated_ruleset",
"ssd_ruleset"]
root@ceph10:~# ceph status
cluster fa5427de-b0d7-466a-b7cf-90e47eac1642
 health HEALTH_OK
 monmap e2: 2 mons at
{mona=192.168.2.10:6789/0,monb=192.168.2.11:6789/0}, election epoch 4,
quorum 0,1 mona,monb
 osdmap e70: 6 osds: 6 up, 6 in
  pgmap v234: 192 pgs, 3 pools, 0 bytes data, 0 objects
251 MB used, 5853 GB / 5853 GB avail
 192 active+clean
root@ceph10:~# ceph osd pool create ssdpool 128 128 replicated ssd_ruleset
pool 'ssdpool' created
root@ceph10:~# ceph osd pool get ssdpool crush_ruleset
crush_ruleset: 0
root@ceph10:~# ceph osd pool set ssdpool crush_ruleset 1
set pool 8 crush_ruleset to 1
root@ceph10:~# ceph status
cluster fa5427de-b0d7-466a-b7cf-90e47eac1642
 health HEALTH_OK
 monmap e2: 2 mons at
{mona=192.168.2.10:6789/0,monb=192.168.2.11:6789/0}, election epoch 4,
quorum 0,1 mona,monb
 osdmap e73: 6 osds: 6 up, 6 in
  pgmap v245: 320 pgs, 4 pools, 0 bytes data, 0 objects
4857 MB used, 5849 GB / 5853 GB avail
 320 active+clean
root@ceph10:/var/log/ceph# rbd list -p ssdpool
^C
root@ceph10:/var/log/ceph# rbd create test --pool ssdpool --size 1024
--image-format 2
^C

The command "rbd list -p ssdpool" and "rbd create test --pool ssdpool --size
1024 --image-format 2" hung.

"ceph pg dump" showed me that no pgs are on ssd osds. Why ?
Why did "ceph osd pool create ssdpool 128 128 replicated ssd_ruleset" create
ssdpool with crush_ruleset 0?
How to set ssd_ruleset when create a pool?


I used the ceph command to control crush map.

ceph osd crush add-bucket ssd root
ceph osd crush add-bucket ssd10 host
ceph osd crush add-bucket ssd11 host
ceph osd crush add-bucket ssd12 host
ceph osd crush move ssd10 root=ssd
ceph osd crush move ssd11 root=ssd
ceph osd crush move ssd12 root=ssd
ceph osd crush rule create-simple ssd_ruleset ssd root

ceph osd crush add-bucket sata10 host
ceph osd crush add-bucket sata11 host
ceph osd crush add-bucket sata12 host
ceph osd crush move sata10 root=default
ceph osd crush move sata11 root=default
ceph osd crush move sata12 root=default

Here is my crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host sata10 {
id -6   # do not change unnecessarily
# weight 1.900
alg straw
hash 0  # rjenkins1
item osd.3 weight 1.900
}
host sata11 {
id -7   # do not change unnecessarily
# weight 1.900
alg straw
hash 0  # rjenkins1
item osd.4 weight 1.900
}
host sata12 {
id -8   # do not change unnecessarily
# weight 1.900
alg straw
hash 0  # rjenkins1
item osd.5 weight 1.900
}
root default {
id -1   # do not change unnecessarily
# weight 5.700
alg straw
hash 0  # rjenkins1
item sata10 weight 1.900
item sata11 weight 1.900
item sata12 weight 1.900
}
host ssd10 {
id -2   # do not change unnecessarily
# weight 1.900
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.900
}
host ssd11 {
id -4   # do not change unnecessarily
# weight 1.900
alg straw
hash 0  # rjenkins1
item osd.1 weight 1.900
}
host ssd12 {
id -5   # do not change unnecessarily
# weight 1.900
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.900
}
root ssd {
id -3   # do not change unnecessarily
# weight 5.700
alg straw
hash 0  # rjenkins1
item ssd10 weight 1.900
item ssd11 weight 1.900
item ssd12 weight 1.900
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ssd_ruleset {
ruleset 1
type replicated
min_size 1
max_size 10
step take ssd
step chooseleaf firstn 0 type root
step emit
}

# end crush map
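
(A hedged aside: one way to check which OSDs a rule actually maps to, assuming the map above is saved to crushmap.txt, is to compile it and run the built-in tester - empty mappings in the output usually mean the rule cannot find enough OSDs:)

crushtool -c crushmap.txt -o crushmap.bin
crushtool -i crushmap.bin --test --rule 1 --num-rep 2 --show-mappings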

Thanks a lot!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Mike
On 15.12.2014 23:45, Sebastien Han wrote:
> Salut,
> 
> The general recommended ratio (for me at least) is 3 journals per SSD. Using 
> 200GB Intel DC S3700 is great.
> If you’re going with a low perf scenario I don’t think you should bother 
> buying SSD, just remove them from the picture and do 12 SATA 7.2K 4TB.
> 
> For medium and medium ++ perf using a ratio 1:11 is way too high, the SSD will 
> definitely be the bottleneck here.
> Please also note that (bandwidth wise) with 22 drives you’re already hitting 
> the theoretical limit of a 10Gbps network. (~50MB/s * 22 ~= 1.1GB/s).
> You can theoretically up that value with LACP (depending on the 
> xmit_hash_policy you’re using of course).
> 
> Btw what’s the network? (since I’m only assuming here).
> 
> 
>> On 15 Dec 2014, at 20:44, Florent MONTHEL  wrote:
>>
>> Hi,
>>
>> I’m buying several servers to test CEPH and I would like to configure 
>> journal on SSD drives (maybe it’s not necessary for all use cases)
>> Could you help me to identify number of SSD I need (SSD are very expensive 
>> and GB price business case killer… ) ? I don’t want to experience SSD 
>> bottleneck (some abacus ?).
>> I think I will be with below CONF 2 & 3
>>
>>
>> CONF 1 DELL 730XC "Low Perf":
>> 10 SATA 7.2K 3.5  4TB + 2 SSD 2.5 » 200GB "intensive write"
>>
>> CONF 2 DELL 730XC « Medium Perf" :
>> 22 SATA 7.2K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
>>
>> CONF 3 DELL 730XC « Medium Perf ++" :
>> 22 SAS 10K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
>>
>> Thanks
>>
>> Florent Monthel
>>

There is also another way:
* for CONF 2,3 replace the 200GB SSDs with 800GB ones and add another 1-2 SSDs to
each node.
* make a tier-1 read-write cache on the SSDs
* you can also put the journal partitions on them if you wish - then data
will move from SSD to SSD before it lands on the HDDs
* on the HDDs you can make an erasure pool or a replica pool

With 10Gbit Ethernet and 4 SSDs that are also used for journals, the bottleneck
may well be the NIC rather than the SSDs, and that is easy to fix later by
upgrading the NIC.

In my opinion, the backend network must be equivalent to or faster than the
frontend one, because the time spent rebalancing the cluster is very important
and must be kept very low, aiming at zero.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to download files from ceph radosgw node using openstack juno swift client.

2014-12-15 Thread pushpesh sharma
Vivek,

The problem is that the swift client is only downloading a chunk of the object,
not the whole object, so the etag mismatches. Could you paste the value of
'rgw_max_chunk_size'? Please be sure you set this to a sane value (<4MB; at
least for the Giant release this works below that value).
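
(Assuming an admin socket is enabled for the gateway, something along these lines should show the running value - the socket path below is a guess and depends on your configuration:)

ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok config show | grep rgw_max_chunk_size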



On Tue, Dec 16, 2014 at 12:26 PM, Vivek Varghese Cherian
 wrote:
> Hi,
>
> I am integrating ceph firefly radosgw with openstack juno keystone, the
> operating system used on the ceph
> nodes and on the openstack node is Ubuntu 14.04.
>
> I am able to create containers and upload files using the swift client to
> ceph.
>
> But when I try to download files, I am getting the following error,
>
> ppmuser@ppm-dc-c3sv3-ju:~$ swift --verbose --debug -V 1 -A
> http://10.x.x.126/auth -U swift:swift -K  download demo-container1
> file_12345.txt
>
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): 10.x.x.126
> DEBUG:urllib3.connectionpool:Setting read timeout to None
> DEBUG:urllib3.connectionpool:"GET /auth HTTP/1.1" 204 0
> DEBUG:swiftclient:REQ: curl -i http://10.x.x.126/auth -X GET
> DEBUG:swiftclient:RESP STATUS: 204 No Content
> DEBUG:swiftclient:RESP HEADERS: [('x-auth-token',
> 'AUTH_rgwtk0b0073776966743a73776966740484a024256146dc5719915468deb431889a0ff3c6514ac8ae2388abdfbaac6d262c3e3e'),
> ('x-storage-token',
> 'AUTH_rgwtk0b0073776966743a73776966740484a024256146dc5719915468deb431889a0ff3c6514ac8ae2388abdfbaac6d262c3e3e'),
> ('date', 'Tue, 16 Dec 2014 05:49:11 GMT'), ('x-storage-url',
> 'http://10.x.x.126/swift/v1'), ('server', 'Apache/2.4.7 (Ubuntu)'),
> ('content-type', 'application/json')]
> INFO:urllib3.connectionpool:Starting new HTTP connection (1): 10.x.x.126
> DEBUG:urllib3.connectionpool:Setting read timeout to None
> DEBUG:urllib3.connectionpool:"GET /swift/v1/demo-container1/file_12345.txt
> HTTP/1.1" 200 14
> DEBUG:swiftclient:REQ: curl -i
> http://10.x.x.126/swift/v1/demo-container1/file_12345.txt -X GET -H
> "X-Auth-Token:
> AUTH_rgwtk0b0073776966743a73776966740484a024256146dc5719915468deb431889a0ff3c6514ac8ae2388abdfbaac6d262c3e3e"
> DEBUG:swiftclient:RESP STATUS: 200 OK
> DEBUG:swiftclient:RESP HEADERS: [('content-length', '14'), ('accept-ranges',
> 'bytes'), ('server', 'Apache/2.4.7 (Ubuntu)'), ('last-modified', 'Mon, 15
> Dec 2014 13:35:37 GMT'), ('etag', '94b371af646d4c53d8f3f8ff7be74dbb'),
> ('date', 'Tue, 16 Dec 2014 05:49:11 GMT'), ('x-object-meta-mtime',
> '1418650375.550963')]
>
> Error downloading object 'demo-container1/file_12345.txt': 'Error
> downloading file_12345.txt: read_length != content_length, 0 != 14'
>
> ppmuser@ppm-dc-c3sv3-ju:~
>
>
> The apache2 error and access logs on the radosgw ceph node is as follows,
>
> root@ppm-c240-ceph3:~# tail -f /var/log/apache2/error.log
>
> [Tue Dec 16 01:10:20.735061 2014] [fastcgi:error] [pid 31426:tid
> 140231210624768] [client 10.x.x.175:55902] FastCGI: comm with server
> "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Tue Dec 16 01:10:20.735134 2014] [fastcgi:error] [pid 31426:tid
> 140231210624768] [client 10.x.x.175:55902] FastCGI: incomplete headers (0
> bytes) received from server "/var/www/s3gw.fcgi"
> [Tue Dec 16 01:10:20.735161 2014] [http:error] [pid 31426:tid
> 140231210624768] [client 10.x.x.175:55902] not unsetting Content-Length on
> HEAD response (rgw changes)\n
> [Tue Dec 16 01:10:20.857215 2014] [:warn] [pid 31426:tid 140231202232064]
> FastCGI: 10.x.x.175 HEAD
> http://ppm-c240-ceph3.cisco.com/swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93
> auth
> [Tue Dec 16 01:10:50.887381 2014] [fastcgi:error] [pid 31426:tid
> 140231202232064] [client 10.x.x.175:55910] FastCGI: comm with server
> "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Tue Dec 16 01:10:50.887454 2014] [fastcgi:error] [pid 31426:tid
> 140231202232064] [client 10.x.x.175:55910] FastCGI: incomplete headers (0
> bytes) received from server "/var/www/s3gw.fcgi"
> [Tue Dec 16 01:10:50.887481 2014] [http:error] [pid 31426:tid
> 140231202232064] [client 10.x.x.175:55910] not unsetting Content-Length on
> HEAD response (rgw changes)\n
> [Tue Dec 16 01:10:50.898281 2014] [:warn] [pid 31426:tid 140231193839360]
> FastCGI: 10.x.x.175 GET
> http://ppm-c240-ceph3.cisco.com/swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93
> auth
> [Tue Dec 16 01:11:20.928476 2014] [fastcgi:error] [pid 31426:tid
> 140231193839360] [client 10.x.x.175:55911] FastCGI: comm with server
> "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
> [Tue Dec 16 01:11:20.928556 2014] [fastcgi:error] [pid 31426:tid
> 140231193839360] [client 10.x.x.175:55911] FastCGI: incomplete headers (0
> bytes) received from server "/var/www/s3gw.fcgi"
>
> root@ppm-c240-ceph3:~# tail -f /var/log/apache2/access.log
>
> 10.81.83.175 - - [16/Dec/2014:01:00:20 -0500] "HEAD
> /swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93 HTTP/1.1" 500 189 "-"
> "python-swiftclient-2.3.0"
> 10.81.83.175 - - [16/Dec/2014:01:00:50 -0500] "GET
> /swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93?format=json HTTP/1.1" 500
> 719 "-" "pyt

[ceph-users] Unable to download files from ceph radosgw node using openstack juno swift client.

2014-12-15 Thread Vivek Varghese Cherian
Hi,

I am integrating ceph firefly radosgw with openstack juno keystone. The
operating system used on the ceph nodes and on the openstack node is Ubuntu
14.04.

I am able to create containers and upload files using the swift client to
ceph.

But when I try to download files, I am getting the following error,

ppmuser@ppm-dc-c3sv3-ju:~$ swift --verbose --debug -V 1 -A
http://10.x.x.126/auth -U swift:swift -K  download
demo-container1 file_12345.txt

INFO:urllib3.connectionpool:Starting new HTTP connection (1): 10.x.x.126
DEBUG:urllib3.connectionpool:Setting read timeout to None
DEBUG:urllib3.connectionpool:"GET /auth HTTP/1.1" 204 0
DEBUG:swiftclient:REQ: curl -i http://10.x.x.126/auth -X GET
DEBUG:swiftclient:RESP STATUS: 204 No Content
DEBUG:swiftclient:RESP HEADERS: [('x-auth-token',
'AUTH_rgwtk0b0073776966743a73776966740484a024256146dc5719915468deb431889a0ff3c6514ac8ae2388abdfbaac6d262c3e3e'),
('x-storage-token',
'AUTH_rgwtk0b0073776966743a73776966740484a024256146dc5719915468deb431889a0ff3c6514ac8ae2388abdfbaac6d262c3e3e'),
('date', 'Tue, 16 Dec 2014 05:49:11 GMT'), ('x-storage-url', '
http://10.x.x.126/swift/v1'), ('server', 'Apache/2.4.7 (Ubuntu)'),
('content-type', 'application/json')]
INFO:urllib3.connectionpool:Starting new HTTP connection (1): 10.x.x.126
DEBUG:urllib3.connectionpool:Setting read timeout to None
DEBUG:urllib3.connectionpool:"GET /swift/v1/demo-container1/file_12345.txt
HTTP/1.1" 200 14
DEBUG:swiftclient:REQ: curl -i
http://10.x.x.126/swift/v1/demo-container1/file_12345.txt -X GET -H
"X-Auth-Token:
AUTH_rgwtk0b0073776966743a73776966740484a024256146dc5719915468deb431889a0ff3c6514ac8ae2388abdfbaac6d262c3e3e"
DEBUG:swiftclient:RESP STATUS: 200 OK
DEBUG:swiftclient:RESP HEADERS: [('content-length', '14'),
('accept-ranges', 'bytes'), ('server', 'Apache/2.4.7 (Ubuntu)'),
('last-modified', 'Mon, 15 Dec 2014 13:35:37 GMT'), ('etag',
'94b371af646d4c53d8f3f8ff7be74dbb'), ('date', 'Tue, 16 Dec 2014 05:49:11
GMT'), ('x-object-meta-mtime', '1418650375.550963')]

Error downloading object 'demo-container1/file_12345.txt': 'Error
downloading file_12345.txt: read_length != content_length, 0 != 14'

ppmuser@ppm-dc-c3sv3-ju:~


The apache2 error and access logs on the radosgw ceph node are as follows:

root@ppm-c240-ceph3:~# tail -f /var/log/apache2/error.log

[Tue Dec 16 01:10:20.735061 2014] [fastcgi:error] [pid 31426:tid
140231210624768] [client 10.x.x.175:55902] FastCGI: comm with server
"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Tue Dec 16 01:10:20.735134 2014] [fastcgi:error] [pid 31426:tid
140231210624768] [client 10.x.x.175:55902] FastCGI: incomplete headers (0
bytes) received from server "/var/www/s3gw.fcgi"
[Tue Dec 16 01:10:20.735161 2014] [http:error] [pid 31426:tid
140231210624768] [client 10.x.x.175:55902] not unsetting Content-Length on
HEAD response (rgw changes)\n
[Tue Dec 16 01:10:20.857215 2014] [:warn] [pid 31426:tid 140231202232064]
FastCGI: 10.x.x.175 HEAD
http://ppm-c240-ceph3.cisco.com/swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93
auth
[Tue Dec 16 01:10:50.887381 2014] [fastcgi:error] [pid 31426:tid
140231202232064] [client 10.x.x.175:55910] FastCGI: comm with server
"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Tue Dec 16 01:10:50.887454 2014] [fastcgi:error] [pid 31426:tid
140231202232064] [client 10.x.x.175:55910] FastCGI: incomplete headers (0
bytes) received from server "/var/www/s3gw.fcgi"
[Tue Dec 16 01:10:50.887481 2014] [http:error] [pid 31426:tid
140231202232064] [client 10.x.x.175:55910] not unsetting Content-Length on
HEAD response (rgw changes)\n
[Tue Dec 16 01:10:50.898281 2014] [:warn] [pid 31426:tid 140231193839360]
FastCGI: 10.x.x.175 GET
http://ppm-c240-ceph3.cisco.com/swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93
auth
[Tue Dec 16 01:11:20.928476 2014] [fastcgi:error] [pid 31426:tid
140231193839360] [client 10.x.x.175:55911] FastCGI: comm with server
"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[Tue Dec 16 01:11:20.928556 2014] [fastcgi:error] [pid 31426:tid
140231193839360] [client 10.x.x.175:55911] FastCGI: incomplete headers (0
bytes) received from server "/var/www/s3gw.fcgi"

root@ppm-c240-ceph3:~# tail -f /var/log/apache2/access.log

10.81.83.175 - - [16/Dec/2014:01:00:20 -0500] "HEAD
/swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93 HTTP/1.1" 500 189 "-"
"python-swiftclient-2.3.0"
10.81.83.175 - - [16/Dec/2014:01:00:50 -0500] "GET
/swift/v1/AUTH_25bb0caaff834efdafa1c1fcbb6aaf93?format=json HTTP/1.1" 500
719 "-" "python-swiftclient-2.3.0"
10.81.83.175 - - [16/Dec/2014:01:08:22 -0500] "GET /auth HTTP/1.1" 204 431
"-" "python-swiftclient-2.3.0"
10.81.83.175 - - [16/Dec/2014:01:08:22 -0500] "GET
/swift/v1/demo-container1?format=json HTTP/1.1" 200 411 "-"
"python-swiftclient-2.3.0"
10.81.83.175 - - [16/Dec/2014:01:08:22 -0500] "GET
/swift/v1/demo-container1?format=json&marker=file_12345.txt HTTP/1.1" 200
174 "-" "python-swiftclient-2.3.0"
10.81.83.175 - - [16/Dec/2014:01:08:50 -0500] "HEAD
/swift/v1/AUTH_25bb0caaff834efdafa1

Re: [ceph-users] Test 6

2014-12-15 Thread Leen de Braal
If you are trying to see whether your mails come through, don't check from your
own gmail account: gmail removes mails that you have sent yourself. You can
check the archives to see.

And your mails did come through to the list.


> --
> Lindsay
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
L. de Braal
BraHa Systems
NL - Terneuzen
T +31 115 649333

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
Hi Udo,

Thanks! Creating the MDS did not add a data and metadata pool for me but I
was able to simply create them myself.

The tutorials also suggest you make new pools, cephfs_data and
cephfs_metadata - would simply using data and metadata work better?
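
(For reference, a minimal sketch of creating the pools and tying them to a filesystem, assuming a Giant-era release where "ceph fs new" is available - the pool names and PG counts are placeholders, and any pool names can be passed to the command:)

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new cephfs cephfs_metadata cephfs_data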

- B

On Mon, Dec 15, 2014, 10:37 PM Udo Lembke  wrote:

>  Hi,
> see here:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg15546.html
>
> Udo
>
>
> On 16.12.2014 05:39, Benjamin wrote:
>
> I increased the OSDs to 10.5GB each and now I have a different issue...
>
> cephy@ceph-admin0:~/ceph-cluster$ echo {Test-data} > testfile.txt
> cephy@ceph-admin0:~/ceph-cluster$ rados put test-object-1 testfile.txt
> --pool=data
> error opening pool data: (2) No such file or directory
> cephy@ceph-admin0:~/ceph-cluster$ ceph osd lspools
> 0 rbd,
>
>  Here's ceph -w:
> cephy@ceph-admin0:~/ceph-cluster$ ceph -w
> cluster b3e15af-SNIP
>  health HEALTH_WARN mon.ceph0 low disk space; mon.ceph1 low disk
> space; mon.ceph2 low disk space; clock skew detected on mon.ceph0,
> mon.ceph1, mon.ceph2
>  monmap e3: 4 mons at {ceph-admin0=
> 10.0.1.10:6789/0,ceph0=10.0.1.11:6789/0,ceph1=10.0.1.12:6789/0,ceph2=10.0.1.13:6789/0},
> election epoch 10, quorum 0,1,2,3 ceph-admin0,ceph0,ceph1,ceph2
>  osdmap e17: 3 osds: 3 up, 3 in
>   pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
> 19781 MB used, 7050 MB / 28339 MB avail
>   64 active+clean
>
>  Any other commands to run that would be helpful? Is it safe to simply
> manually create the "data" and "metadata" pools myself?
>
> On Mon, Dec 15, 2014 at 5:07 PM, Benjamin  wrote:
>>
>> Aha, excellent suggestion! I'll try that as soon as I get back, thank you.
>> - B
>>  On Dec 15, 2014 5:06 PM, "Craig Lewis" 
>> wrote:
>>
>>>
>>> On Sun, Dec 14, 2014 at 6:31 PM, Benjamin  wrote:

 The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
 disk. They have between 10% and 30% disk utilization but common between all
 of them is that they *have free disk space* meaning I have no idea
 what the heck is causing Ceph to complain.

>>>
>>> Each OSD is 8GB?  You need to make them at least 10 GB.
>>>
>>>  Ceph weights each disk as it's size in TiB, and it truncates to two
>>> decimal places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to
>>> 10 GiB, and it'll get a weight of 0.01.
>>>
>>>  You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.
>>>
>>>  If that doesn't fix the problem, go ahead and post the things Udo
>>> mentioned.
>>>
>>
>
> ___
> ceph-users mailing 
> listceph-us...@lists.ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Udo Lembke
Hi,
see here:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg15546.html

Udo

On 16.12.2014 05:39, Benjamin wrote:
> I increased the OSDs to 10.5GB each and now I have a different issue...
>
> cephy@ceph-admin0:~/ceph-cluster$ echo {Test-data} > testfile.txt
> cephy@ceph-admin0:~/ceph-cluster$ rados put test-object-1 testfile.txt
> --pool=data
> error opening pool data: (2) No such file or directory
> cephy@ceph-admin0:~/ceph-cluster$ ceph osd lspools
> 0 rbd,
>
> Here's ceph -w:
> cephy@ceph-admin0:~/ceph-cluster$ ceph -w
> cluster b3e15af-SNIP
>  health HEALTH_WARN mon.ceph0 low disk space; mon.ceph1 low disk
> space; mon.ceph2 low disk space; clock skew detected on mon.ceph0,
> mon.ceph1, mon.ceph2
>  monmap e3: 4 mons at
> {ceph-admin0=10.0.1.10:6789/0,ceph0=10.0.1.11:6789/0,ceph1=10.0.1.12:6789/0,ceph2=10.0.1.13:6789/0
> },
> election epoch 10, quorum 0,1,2,3 ceph-admin0,ceph0,ceph1,ceph2
>  osdmap e17: 3 osds: 3 up, 3 in
>   pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
> 19781 MB used, 7050 MB / 28339 MB avail
>   64 active+clean
>
> Any other commands to run that would be helpful? Is it safe to simply
> manually create the "data" and "metadata" pools myself?
>
> On Mon, Dec 15, 2014 at 5:07 PM, Benjamin  > wrote:
>
> Aha, excellent suggestion! I'll try that as soon as I get back,
> thank you.
> - B
>
> On Dec 15, 2014 5:06 PM, "Craig Lewis"  > wrote:
>
>
> On Sun, Dec 14, 2014 at 6:31 PM, Benjamin  > wrote:
>
> The machines each have Ubuntu 14.04 64-bit, with 1GB of
> RAM and 8GB of disk. They have between 10% and 30% disk
> utilization but common between all of them is that they
> *have free disk space* meaning I have no idea what the
> heck is causing Ceph to complain.
>
>
> Each OSD is 8GB?  You need to make them at least 10 GB.
>
> Ceph weights each disk as it's size in TiB, and it truncates
> to two decimal places.  So your 8 GiB disks have a weight of
> 0.00.  Bump it up to 10 GiB, and it'll get a weight of 0.01.
>
> You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.
>
> If that doesn't fix the problem, go ahead and post the things
> Udo mentioned.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD and HA KVM anybody?

2014-12-15 Thread Josef Johansson
Hi,

> On 16 Dec 2014, at 05:00, Christian Balzer  wrote:
> 
> 
> Hello,
> 
> On Mon, 15 Dec 2014 09:23:23 +0100 Josef Johansson wrote:
> 
>> Hi Christian,
>> 
>> We’re using Proxmox that has support for HA, they do it per-vm.
>> We’re doing it manually right now though, because we like it :). 
>> 
>> When I looked at it I couldn’t see a way of just allowing a set of hosts
>> in the HA (i.e. not the storage nodes), but that’s probably easy to
>> solve.
>> 
> 
> Ah, Proxmox. I test drove this about a year ago and while it has some nice
> features the "black box" approach of taking over bare metal hardware and
> the ancient kernel doesn't mesh with other needs I have here.
The ancient kernel is not needed if you’re running just KVM. They are working 
on a 3.10 kernel if I’m correct, though.
As it’s Debian 7 underneath now, just put in a backported kernel and you’re 
good to go. 3.14 was bad but 3.15 should be ok.
And it has Ceph support nowadays :)

Cheers,
Josef
> 
> Thanks for reminding me, though.
No problemo :)
> 
> Christian
> 
>> Cheers,
>> Josef
>> 
>>> On 15 Dec 2014, at 04:10, Christian Balzer  wrote:
>>> 
>>> 
>>> Hello,
>>> 
>>> What are people here using to provide HA KVMs (and with that I mean
>>> automatic, fast VM failover in case of host node failure) in with RBD
>>> images?
>>> 
>>> Openstack and ganeti have decent Ceph/RBD support, but no HA (plans
>>> aplenty though).
>>> 
>>> I have plenty of experience with Pacemaker (DRBD backed) but there is
>>> only an unofficial RBD resource agent for it, which also only supports
>>> kernel based RBD. 
>>> And while Pacemaker works great, it scales like leaden porcupines,
>>> things degrade rapidly after 20 or so instances.
>>> 
>>> So what are other people here using to keep their KVM based VMs up and
>>> running all the time?
>>> 
>>> Regards,
>>> 
>>> Christian
>>> -- 
>>> Christian BalzerNetwork/Systems Engineer
>>> ch...@gol.com   Global OnLine Japan/Fusion Communications
>>> http://www.gol.com/
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 
> 
> 
> -- 
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Fusion 
> Communications
> http://www.gol.com/ 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
I increased the OSDs to 10.5GB each and now I have a different issue...

cephy@ceph-admin0:~/ceph-cluster$ echo {Test-data} > testfile.txt
cephy@ceph-admin0:~/ceph-cluster$ rados put test-object-1 testfile.txt
--pool=data
error opening pool data: (2) No such file or directory
cephy@ceph-admin0:~/ceph-cluster$ ceph osd lspools
0 rbd,

Here's ceph -w:
cephy@ceph-admin0:~/ceph-cluster$ ceph -w
cluster b3e15af-SNIP
 health HEALTH_WARN mon.ceph0 low disk space; mon.ceph1 low disk space;
mon.ceph2 low disk space; clock skew detected on mon.ceph0, mon.ceph1,
mon.ceph2
 monmap e3: 4 mons at {ceph-admin0=
10.0.1.10:6789/0,ceph0=10.0.1.11:6789/0,ceph1=10.0.1.12:6789/0,ceph2=10.0.1.13:6789/0},
election epoch 10, quorum 0,1,2,3 ceph-admin0,ceph0,ceph1,ceph2
 osdmap e17: 3 osds: 3 up, 3 in
  pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
19781 MB used, 7050 MB / 28339 MB avail
  64 active+clean

Any other commands to run that would be helpful? Is it safe to simply
manually create the "data" and "metadata" pools myself?
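
(If it helps, a minimal sketch of creating the missing pool by hand and retrying - the PG count of 64 is just a placeholder for a small test cluster:)

ceph osd pool create data 64 64
ceph osd pool create metadata 64 64
rados put test-object-1 testfile.txt --pool=data
rados -p data ls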

On Mon, Dec 15, 2014 at 5:07 PM, Benjamin  wrote:
>
> Aha, excellent suggestion! I'll try that as soon as I get back, thank you.
> - B
> On Dec 15, 2014 5:06 PM, "Craig Lewis"  wrote:
>
>>
>> On Sun, Dec 14, 2014 at 6:31 PM, Benjamin  wrote:
>>>
>>> The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
>>> disk. They have between 10% and 30% disk utilization but common between all
>>> of them is that they *have free disk space* meaning I have no idea what
>>> the heck is causing Ceph to complain.
>>>
>>
>> Each OSD is 8GB?  You need to make them at least 10 GB.
>>
>> Ceph weights each disk as it's size in TiB, and it truncates to two
>> decimal places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to
>> 10 GiB, and it'll get a weight of 0.01.
>>
>> You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.
>>
>> If that doesn't fix the problem, go ahead and post the things Udo
>> mentioned.
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Christian Balzer

Hello,

On Mon, 15 Dec 2014 22:43:14 +0100 Florent MONTHEL wrote:

> Thanks all
> 
> I will probably have 2x10gb : 1x10gb for client and 1x10gb for cluster
> but I take in charge your recommendation Sebastien
> 
> The 200GB SSD will probably give me around 500MB/s sequential bandwidth.
Intel DC S3700 200GB are 365MB/s write.

> So with only 2 SSD I can  overload 1x 10gb network.
> 
Unless you have an unlimited budget (which you obviously don't), you have
to balance cost and performance. 
Performance however comes in 2 main flavors here, IOPS and
throughput/bandwidth.

In normal operation, your storage nodes will run out of IOPS long before
they hit the bandwidth limits of your network or storage (both journal SSDs
and HDDs).

During a recovery or other data migration (new OSD) process the bandwidth
becomes a lot more relevant, especially if the cluster isn't otherwise
busy at that time.
Your configuration 1 won't be able to continuously scribble data to the
HDDs faster than the 730MB/s of the 2 Intel SSDs, so it's fine as long as
you keep in mind that one dead (or otherwise unavailable) SSD will take
out 5 OSDs.

Your configurations 2 and 3 will likely benefit from 4 SSDs, not just to
keep the failure domain at sane levels, but also because at least #3
should be able to write faster to the HDDs than 730MB/s. 

And let me chime in here with the "Intel DC S3700 SSDs for journals" crowd.
For a cluster here I wound up using 4 100GB ones (200MB/s write) and 8
HDDs, as that was still very affordable while reducing the failure domain
of one SSD to 2 OSDs.

Research the ML archives, but for your #2 and #3 you will also want PLENTY
of CPU power and RAM (page cache avoids a lot of disk seeks and speeds up
things massively on the read side).

Lastly, I have a SSD-less test cluster and can just nod emphatically to
what Craig wrote.

Christian

> Hum I will take care of osd density
> 
> Sent from my iPhone
> 
> > On 15 déc. 2014, at 21:45, Sebastien Han 
> > wrote:
> > 
> > Salut,
> > 
> > The general recommended ratio (for me at least) is 3 journals per SSD.
> > Using 200GB Intel DC S3700 is great. If you’re going with a low perf
> > scenario I don’t think you should bother buying SSD, just remove them
> > from the picture and do 12 SATA 7.2K 4TB.
> > 
> > For medium and medium ++ perf using a ratio 1:11 is way too high, the
> > SSD will definitely be the bottleneck here. Please also note that
> > (bandwidth wise) with 22 drives you’re already hitting the theoretical
> > limit of a 10Gbps network. (~50MB/s * 22 ~= 1.1GB/s). You can
> > theoretically up that value with LACP (depending on the
> > xmit_hash_policy you’re using of course).
> > 
> > Btw what’s the network? (since I’m only assuming here).
> > 
> > 
> >> On 15 Dec 2014, at 20:44, Florent MONTHEL 
> >> wrote:
> >> 
> >> Hi,
> >> 
> >> I’m buying several servers to test CEPH and I would like to configure
> >> journal on SSD drives (maybe it’s not necessary for all use cases)
> >> Could you help me to identify number of SSD I need (SSD are very
> >> expensive and GB price business case killer… ) ? I don’t want to
> >> experience SSD bottleneck (some abacus ?). I think I will be with
> >> below CONF 2 & 3
> >> 
> >> 
> >> CONF 1 DELL 730XC "Low Perf":
> >> 10 SATA 7.2K 3.5  4TB + 2 SSD 2.5 » 200GB "intensive write"
> >> 
> >> CONF 2 DELL 730XC « Medium Perf" :
> >> 22 SATA 7.2K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
> >> 
> >> CONF 3 DELL 730XC « Medium Perf ++" :
> >> 22 SAS 10K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
> >> 
> >> Thanks
> >> 
> >> Florent Monthel
> >> 
> >> 
> >> 
> >> 
> >> 
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > 
> > Cheers.
> > 
> > Sébastien Han
> > Cloud Architect
> > 
> > "Always give 100%. Unless you're giving blood."
> > 
> > Phone: +33 (0)1 49 70 99 72
> > Mail: sebastien@enovance.com
> > Address : 11 bis, rue Roquépine - 75008 Paris
> > Web : www.enovance.com - Twitter : @enovance
> > 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running ceph in Deis/Docker

2014-12-15 Thread Christian Balzer

Hello,

your subject is misleading, as this is not really related to Deis/Docker.

Find the very recent "Is mon initial members used after the first quorum?"
thread in this ML.

In short, list all your 3 mons in the initial members section.
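
(A minimal sketch of what that could look like, using the monitor names and addresses from the ceph.conf quoted below - adjust to your setup:)

[global]
mon initial members = deis-1, deis-2, deis-3
mon host = 10.132.183.191, 10.132.183.192, 10.132.183.190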

And yes, rebooting things all at the same time can be "fun"; I managed to
get into a similar situation to yours once (though that cluster really
had only one mon) and it took a hard reset to fix things eventually.

Christian

On Tue, 16 Dec 2014 08:52:15 +0800 Jimmy Chu wrote:

> Hi,
> 
> I installed ceph on 3 nodes, having one monitor, and one OSD running on 
> each node. After rebooting them all at once (I see this may be a bad 
> move now), the ceph monitors refuse to connect to each other.
> 
> When I run:
> 
> ceph mon getmap -o /etc/ceph/monmap
> 
> or even
> 
> ceph -s
> 
> It only shows the following:
> 
> Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700  
> 0 -- :/121 >> 10.132.183.191:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 
> pgs=0 cs=0 l=1
> c=0x7f5ce4029930).fault
> Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700  
> 0 -- :/121 >> 10.132.183.192:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 
> pgs=0 cs=0 l=1
> c=0x7f5ce4029930).fault
> Dec 14 16:38:50 deis-1 sh[933]: 2014-12-14 08:38:50.267398 7f5cec71f700  
> 0 -- :/121 >> 10.132.183.190:6789/0 pipe(0x7f5cd40030e0 sd=4 :0 s=1 
> pgs=0 cs=0 l=1
> c=0x7f5cd4003370).fault
> ...keep repeating...
> 
> So, there is no quorum formed, and ceph admin socket file is not there 
> for connection. What should be my next step to recover the storage?
> 
> This is *my /etc/ceph/ceph.conf file*:
> [global]
> fsid = cc368515-9dc6-48e2-9526-58ac4cbb3ec9
> mon initial members = deis-3
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> osd pool default size = 3
> osd pool default min_size = 1
> osd pool default pg_num = 128
> osd pool default pgp_num = 128
> osd recovery delay start = 15
> log file = /dev/stdout
> 
> [mon.deis-3]
> host = deis-3
> mon addr = 10.132.183.190:6789
> 
> [mon.deis-1]
> host = deis-1
> mon addr = 10.132.183.191:6789
> 
> [mon.deis-2]
> host = deis-2
> mon addr = 10.132.183.192:6789
> 
> [client.radosgw.gateway]
> host = deis-store-gateway
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /dev/stdout
> 
> Thank you.
> 
> - Jimmy Chu


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD and HA KVM anybody?

2014-12-15 Thread Christian Balzer

Hello,

On Mon, 15 Dec 2014 09:23:23 +0100 Josef Johansson wrote:

> Hi Christian,
> 
> We’re using Proxmox that has support for HA, they do it per-vm.
> We’re doing it manually right now though, because we like it :). 
> 
> When I looked at it I couldn’t see a way of just allowing a set of hosts
> in the HA (i.e. not the storage nodes), but that’s probably easy to
> solve.
>

Ah, Proxmox. I test drove this about a year ago and while it has some nice
features the "black box" approach of taking over bare metal hardware and
the ancient kernel doesn't mesh with other needs I have here.

Thanks for reminding me, though.

Christian
 
> Cheers,
> Josef
> 
> > On 15 Dec 2014, at 04:10, Christian Balzer  wrote:
> > 
> > 
> > Hello,
> > 
> > What are people here using to provide HA KVMs (and with that I mean
> > automatic, fast VM failover in case of host node failure) in with RBD
> > images?
> > 
> > Openstack and ganeti have decent Ceph/RBD support, but no HA (plans
> > aplenty though).
> > 
> > I have plenty of experience with Pacemaker (DRBD backed) but there is
> > only an unofficial RBD resource agent for it, which also only supports
> > kernel based RBD. 
> > And while Pacemaker works great, it scales like leaden porcupines,
> > things degrade rapidly after 20 or so instances.
> > 
> > So what are other people here using to keep their KVM based VMs up and
> > running all the time?
> > 
> > Regards,
> > 
> > Christian
> > -- 
> > Christian BalzerNetwork/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Test 3

2014-12-15 Thread Lindsay Mathieson
Last one, sorry
-- 
Lindsay Mathieson | Senior Developer 
Softlog Australia 
43 Kedron Park Road, Wooloowin, QLD, 4030
[T] +61 7 3632 8804 | [F] +61 1800-818-914| [W] softlog.com.au


DISCLAIMER: This Email and any attachments are a confidential communication 
intended exclusively for the recipient. If you are not the intended recipient 
you must not disclose or use any of the contents of this Email. Should you 
receive this Email in error, contact us immediately by return Email and delete 
this Email and any attachments. If you are the intended recipient of this 
Email and propose to rely on its contents you should contact the writer to 
confirm the same. Copyright and privilege relating to the contents of this 
Email and any attachments are reserved. It is the recipient’s responsibility 
to scan all attachments for viruses prior to use. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd snapshot slow restore

2014-12-15 Thread Lindsay Mathieson
I'm finding snapshot restores to be very slow. With a small vm, I can
take a snapshot within seconds, but restores can take over 15
minutes, sometimes nearly an hour, depending on how I have tweaked
ceph.

The same vm as a QCOW2 image on NFS or native disk can be restored in
under 30 seconds.

Is this normal? is ceph just really slow at restoring rbd snapshots,
or have I really borked my setup? :)
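
(For context, the operations in question - image and snapshot names are placeholders. Rolling back copies snapshot data back over the whole image, so the time grows with the image size, which is at least part of why it feels so much slower than taking the snapshot:)

rbd snap create rbd/vm-100-disk-1@before-change
rbd snap rollback rbd/vm-100-disk-1@before-change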

Very basic setup:
- 3 Monitors
- 2 OSD's, ZFS on  (WD 3TB Red). Not fast disks
- 2 10GB SSD Journals


-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Test 6

2014-12-15 Thread Lindsay Mathieson
-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
Aha, excellent suggestion! I'll try that as soon as I get back, thank you.
- B
On Dec 15, 2014 5:06 PM, "Craig Lewis"  wrote:

>
> On Sun, Dec 14, 2014 at 6:31 PM, Benjamin  wrote:
>>
>> The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
>> disk. They have between 10% and 30% disk utilization but common between all
>> of them is that they *have free disk space* meaning I have no idea what
>> the heck is causing Ceph to complain.
>>
>
> Each OSD is 8GB?  You need to make them at least 10 GB.
>
> Ceph weights each disk as it's size in TiB, and it truncates to two
> decimal places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to
> 10 GiB, and it'll get a weight of 0.01.
>
> You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.
>
> If that doesn't fix the problem, go ahead and post the things Udo
> mentioned.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Craig Lewis
On Sun, Dec 14, 2014 at 6:31 PM, Benjamin  wrote:
>
> The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
> disk. They have between 10% and 30% disk utilization but common between all
> of them is that they *have free disk space* meaning I have no idea what
> the heck is causing Ceph to complain.
>

Each OSD is 8GB?  You need to make them at least 10 GB.

Ceph weights each disk as its size in TiB, and it truncates to two decimal
places.  So your 8 GiB disks have a weight of 0.00.  Bump it up to 10 GiB,
and it'll get a weight of 0.01.

You should have 3 OSDs, one for each of ceph0,ceph1,ceph2.

If that doesn't fix the problem, go ahead and post the things Udo mentioned.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Craig Lewis
I was going with a low perf scenario, and I still ended up adding SSDs.
Everything was fine in my 3 node cluster, until I wanted to add more nodes.


Admittedly, I was a bit aggressive with the expansion.  I added a whole
node at once, rather than one or two disks at a time.  Still, I wasn't
expecting the average RadosGW latency to go from 0.1 seconds to 10
seconds.  With the SSDs, I can do the same thing, and latency only goes up
to 1 second.

I'll be adding the Intel DC S3700's to all my nodes.


On Mon, Dec 15, 2014 at 12:45 PM, Sebastien Han 
wrote:

> If you’re going with a low perf scenario I don’t think you should bother
> buying SSD, just remove them from the picture and do 12 SATA 7.2K 4TB.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dual RADOSGW Network

2014-12-15 Thread Craig Lewis
That shouldn't be a problem.  Just have Apache bind to all interfaces
instead of the external IP.
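
(A minimal sketch of what that looks like in the radosgw vhost - the file name is only an example; the key part is binding to * rather than one specific address:)

# e.g. /etc/apache2/sites-available/rgw.conf
<VirtualHost *:80>
    ServerName xxx.example.com
    # ... existing FastCGI / rewrite rules unchanged ...
</VirtualHost>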

In my case, I only have Apache bound to the internal interface.  My load
balancer has an external and internal IP, and I'm able to talk to it on
both interfaces.

On Mon, Dec 15, 2014 at 2:00 PM, Georgios Dimitrakakis  wrote:
>
> Hi all!
>
> I have a single CEPH node which has two network interfaces.
>
> One is configured to be accessed directly by the internet (153.*) and the
> other one is configured on an internal LAN (192.*)
>
> For the moment radosgw is listening on the external (internet) interface.
>
> Can I configure radosgw to be accessed by both interfaces? What I would
> like to do is to save bandwidth and time for the machines on the internal
> network and use the internal net for all rados communications.
>
>
> Any ideas?
>
>
> Best regards,
>
>
> George
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Running ceph in Deis/Docker

2014-12-15 Thread Jimmy Chu

Hi,

I installed ceph on 3 nodes, having one monitor, and one OSD running on 
each node. After rebooting them all at once (I see this may be a bad 
move now), the ceph monitors refuse to connect to each other.


When I run:

ceph mon getmap -o /etc/ceph/monmap

or even

ceph -s

It only shows the following:

Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700  
0 -- :/121 >> 10.132.183.191:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 
pgs=0 cs=0 l=1

c=0x7f5ce4029930).fault
Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700  
0 -- :/121 >> 10.132.183.192:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 
pgs=0 cs=0 l=1

c=0x7f5ce4029930).fault
Dec 14 16:38:50 deis-1 sh[933]: 2014-12-14 08:38:50.267398 7f5cec71f700  
0 -- :/121 >> 10.132.183.190:6789/0 pipe(0x7f5cd40030e0 sd=4 :0 s=1 
pgs=0 cs=0 l=1

c=0x7f5cd4003370).fault
...keep repeating...

So, there is no quorum formed, and ceph admin socket file is not there 
for connection. What should be my next step to recover the storage?


This is *my /etc/ceph/ceph.conf file*:
[global]
fsid = cc368515-9dc6-48e2-9526-58ac4cbb3ec9
mon initial members = deis-3
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min_size = 1
osd pool default pg_num = 128
osd pool default pgp_num = 128
osd recovery delay start = 15
log file = /dev/stdout

[mon.deis-3]
host = deis-3
mon addr = 10.132.183.190:6789

[mon.deis-1]
host = deis-1
mon addr = 10.132.183.191:6789

[mon.deis-2]
host = deis-2
mon addr = 10.132.183.192:6789

[client.radosgw.gateway]
host = deis-store-gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /dev/stdout

Thank you.

- Jimmy Chu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Very last test

2014-12-15 Thread Lindsay Mathieson

-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Test 3

2014-12-15 Thread Lindsay Mathieson
Last one, sorry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Test 2 - plain, unsigned

2014-12-15 Thread Lindsay Mathieson
Test Msg, at request of list owner
-- 
Lindsay
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Test 1 - html, signed

2014-12-15 Thread Lindsay Mathieson
Test Msg, at request of list owner
-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tgt / rbd performance

2014-12-15 Thread Mike Christie
On 12/13/2014 09:39 AM, Jake Young wrote:
> On Friday, December 12, 2014, Mike Christie  > wrote:
> 
> On 12/11/2014 11:39 AM, ano nym wrote:
> >
> > there is a ceph pool on a hp dl360g5 with 25 sas 10k (sda-sdy) on a
> > msa70 which gives me about 600 MB/s continous write speed with rados
> > write bench. tgt on the server with rbd backend uses this pool.
> mounting
> > local(host) with iscsiadm, sdz is the virtual iscsi device. As you can
> > see, sdz max out with 100%util at ~55MB/s when writing to it.
> >
> > I know that tgt-rbd is more a proof-of-concept then production-ready.
> >
> > Anyway, is someone using it and/or are there any hints to speed it up?
> >
> 
> Increasing the tgt nr_threads setting helps. Try 64 or 128.
> 
> 
> Do you just add this to the targets.conf?
> 
> nr_threads 128
> 
>

I was just starting tgtd manually with the --nr_iothreads setting (see
tgtd --help to see if your version supports it). I do not know if the
targets.conf parser supports that setting.
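
(In other words, something along the lines of the following - the exact option spelling may differ between tgt versions, so check tgtd --help as noted above:)

tgtd --nr_iothreads=128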
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hi.. s3cmd unable to create buckets

2014-12-15 Thread Ruchika Kharwar
I found something interesting.
On the S3 client in the .s3cfg I made these changes:


host_base = 100.100.0.20    i.e. the IP address of the radosgw server
host_base = cephadmin.com

In the /etc/dnsmasq.conf on the same client I added these lines

address=/cephadmin.com/100.100.0.20
listen-address=127.0.0.1

nslookup cephadmin.com

Server: 127.0.0.1
Address: 127.0.0.1#53

Name: cephadmin.com
Address: 10.10.0.200


On the machine having the radosgw, /etc/ceph/ceph.conf contains these lines

rgw dns name = cephadmin.com
rgw resolve name = cephadmin.com


And s3cmd ls and s3cmd mb s3://bucketX WORKS.

BUT if someone could help me understand why replacing this in the .s3cfg
does not allow s3cmd ls to work (but bucket creation continues to work) I
will be very grateful.

Why can I not use the below line and expect s3cmd ls to work?
host_base = cephadmin.com
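
(Not an authoritative answer, but for comparison, a sketch of a fully name-based setup using the names from this thread - the point being that both the bare domain and its bucket subdomains need to resolve to the gateway and match rgw dns name; dnsmasq's address=/domain/ip form already covers the domain and all of its subdomains:)

# dnsmasq.conf on the client
address=/cephadmin.com/100.100.0.20

# .s3cfg on the client
host_base = cephadmin.com
host_bucket = %(bucket)s.cephadmin.com

# ceph.conf on the gateway
rgw dns name = cephadmin.com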

Thank you




On Mon, Dec 15, 2014 at 9:19 AM, Ruchika Kharwar 
wrote:
>
>
>
> On Mon, Dec 15, 2014 at 4:22 AM, Luis Periquito 
> wrote:
>>
>> Have you created the * DNS record?
>> [RK] I can ping to the IP addresses
>> bucket1. needs to resolve to that IP address (that's what
>> you're saying in the host_bucket directive).
>> [RK] I have rgw dns name in the /etc/ceph/ceph.conf file set to
>> 100.100.0.20.com
>>
>
> I set the host_bucket to %(buckets)s.100.100.0.20.com.
>
> I added these lines to the /etc/dnsmasq.conf
> address=/*.100.100.0.20.com/100.100.0.20
> address=/bucket1.100.100.0.20.com/100.100.0.20
> listen-address=127.0.0.1
>
> What should the DNS record look like and where do I add it ?
> Thank you
>
>
>
>
>>
>>
>
>
>> On Mon, Dec 15, 2014 at 5:52 AM, Ruchika Kharwar 
>> wrote:
>>
>>> Apologies for re-asking this question since I found several hits on this
>>> question but not very clear answers.
>>>
>>> I am in a situation where s3cmd ls seems to work
>>> but s3cmd mb s3://bucket1 does not
>>>
>>> 1. The rgw dns name  = servername in the apache rados.vhost.conf file.
>>> and on the client running the s3cmd
>>> the .s3cfg has
>>> host_base = 
>>> host_bucket = %(bucket)s.
>>>
>>> the rgs dns name and that in the apache2 rados.vhost.conf has rgw dns
>>> name and Servername set to 
>>>
>>> Please advise
>>>
>>> Thank you
>>>
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-15 Thread Georgios Dimitrakakis

Thx a lot Yehuda!

This one with tilde seems to be working!

Fingers crossed that it will continue in the future :-)


Warmest regards,


George


In any case, I pushed earlier today another fix to the same branch
that replaces the slash with a tilde. Let me know if that one works
for you.

Thanks,
Yehuda

On Fri, Dec 12, 2014 at 5:59 AM, Georgios Dimitrakakis
 wrote:

How silly of me!!!

I've just noticed that the file isn't writable by apache!


I 'll be back with the logs...


G.




I'd be more than happy to provide you with all the info, but for some
unknown reason my radosgw.log is empty.

This is the part that I have in ceph.conf

[client.radosgw.gateway]
host = xxx
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
rgw dns name = xxx.example.com
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
log file = /var/log/ceph/radosgw.log
debug ms = 1
debug rgw = 20



but no matter what I put in there the log is empty

$ pwd
/var/log/ceph
$ ls -l radosgw.log
-rw-r--r-- 1 root root 0 Nov 30 03:01 radosgw.log


I already started another thread titled "Empty Rados log"
here on the ceph-users list on December 4th but haven't heard from
anyone yet...

If I solve this I will be able to provide you with all the data.


Regards,


George


Ok, I've been digging a bit more. I don't have full radosgw logs for
the issue, so if you could provide them (debug rgw = 20), it might help.
However, as it is now, I think the issue is with the way the client
library is signing the requests. Instead of using the undecoded
uploadId, it uses the encoded version for the signature, which doesn't
sign correctly. The same would have happened if it had run
against amazon S3 (just tested it).
The two solutions that I see are to fix the client library, and/or to
modify the character to one that does not require escaping. Sadly the
dash character that you were using cannot be used safely in that
context. Maybe tilde ('~') could work.

Yehuda

On Fri, Dec 12, 2014 at 2:41 AM, Georgios Dimitrakakis
 wrote:


Dear Yehuda,

I have installed the patched version as you can see:

$ radosgw --version
ceph version 0.80.7-1-gbd43759
(bd43759f6e76fa827e2534fa4e61547779ee10a5)

$ ceph --version
ceph version 0.80.7-1-gbd43759
(bd43759f6e76fa827e2534fa4e61547779ee10a5)

$ sudo yum info ceph-radosgw
Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 1.gbd43759.el6
Size: 3.8 M
Repo: installed
From repo   : ceph-source
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS object store. It is
            : implemented as a FastCGI module using libfcgi, and can be used in
            : conjunction with any FastCGI capable web server.


Unfortunately the problem with the multipart upload with aws-sdk still
remains the same!


Here is a part of the apache log:


"PUT



/clients-space/test/iip7.dmg?partNumber=3&uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"

"PUT



/clients-space/test/iip7.dmg?partNumber=1&uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"

"PUT



/clients-space/test/iip7.dmg?partNumber=2&uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1" 403 78 "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"



Direct modification of the binary so that the "2%2F" is changed to "2-"
results in success, and here is the log:


"PUT



/clients-space/test/iip7.dmg?partNumber=1&uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"

"PUT



/clients-space/test/iip7.dmg?partNumber=2&uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"

"PUT



/clients-space/test/iip7.dmg?partNumber=4&uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1" 200 - "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"

"POST


/clients-space/test/iip7.dmg?uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1" 200 302 "-" "aws-sdk-nodejs/2.1.0 darwin/v0.10.33"




Can you think of something else??


Best regards,


George






OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development 
branch. I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
 wrote:



Hi again!

I have installed and enabled the development branch 
repositories as

described here:





http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the
following:

Installed Packages
Name: ceph-radosgw

[ceph-users] Dual RADOSGW Network

2014-12-15 Thread Georgios Dimitrakakis

Hi all!

I have a single CEPH node which has two network interfaces.

One is configured to be accessed directly by the internet (153.*) and 
the other one is configured on an internal LAN (192.*)


For the moment radosgw is listening on the external (internet) 
interface.


Can I configure radosgw to be accessed by both interfaces? What I would 
like to do is to save bandwidth and time for the machines on the 
internal network and use the internal net for all rados communications.



Any ideas?


Best regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Florent MONTHEL
Thanks all

I will probably have 2x10gb : 1x10gb for client and 1x10gb for cluster but I 
take in charge your recommendation Sebastien

The 200GB SSD will probably give me around 500MB/s sequential bandwidth, so 
with only 2 SSDs I can nearly saturate a 1x 10Gb network.

Hum I will take care of osd density

Sent from my iPhone

> On 15 déc. 2014, at 21:45, Sebastien Han  wrote:
> 
> Salut,
> 
> The general recommended ratio (for me at least) is 3 journals per SSD. Using 
> 200GB Intel DC S3700 is great.
> If you’re going with a low perf scenario I don’t think you should bother 
> buying SSD, just remove them from the picture and do 12 SATA 7.2K 4TB.
> 
> For medium and medium ++ perf using a ratio of 1:11 is way too high, the SSD will 
> definitely be the bottleneck here.
> Please also note that (bandwidth wise) with 22 drives you're already hitting 
> the theoretical limit of a 10Gbps network (~50MB/s * 22 ~= 1.1GB/s).
> You can theoretically up that value with LACP (depending on the 
> xmit_hash_policy you’re using of course).
> 
> Btw what’s the network? (since I’m only assuming here).
> 
> 
>> On 15 Dec 2014, at 20:44, Florent MONTHEL  wrote:
>> 
>> Hi,
>> 
>> I’m buying several servers to test CEPH and I would like to configure 
>> journal on SSD drives (maybe it’s not necessary for all use cases)
>> Could you help me to identify number of SSD I need (SSD are very expensive 
>> and GB price business case killer… ) ? I don’t want to experience SSD 
>> bottleneck (some abacus ?).
>> I think I will be with below CONF 2 & 3
>> 
>> 
>> CONF 1 DELL 730XC "Low Perf":
>> 10 SATA 7.2K 3.5  4TB + 2 SSD 2.5 » 200GB "intensive write"
>> 
>> CONF 2 DELL 730XC « Medium Perf" :
>> 22 SATA 7.2K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
>> 
>> CONF 3 DELL 730XC « Medium Perf ++" :
>> 22 SAS 10K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
>> 
>> Thanks
>> 
>> Florent Monthel
>> 
>> 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> Cheers.
> 
> Sébastien Han
> Cloud Architect
> 
> "Always give 100%. Unless you're giving blood."
> 
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien@enovance.com
> Address : 11 bis, rue Roquépine - 75008 Paris
> Web : www.enovance.com - Twitter : @enovance
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Confusion about journals and caches

2014-12-15 Thread Max Power
> J-P Methot  hat am 15. Dezember 2014 um 16:05
> geschrieben:
> I must admit, I have a bit of difficulty understanding your diagram.

I had the illusion that a cache tier also has a journal, but it does not. Sounds
less complex now.

But the XFS journals on the devices (as they are not 'inline') and the journals
of ceph's osds - aren't these two different things? So I could save the xfs
journal somewhere on the ssd (mkfs.xfs -l logdev=/dev/ssdX ...) and the journal
of ceph also (osd journal = /dev/ssdY). But does it make sense to do so? Or is
it better to leave the XFS journal inline?
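
They are indeed two different things. For illustration, a minimal sketch of what
separating both onto an SSD would look like (device names and sizes are placeholders,
reusing the ssdX/ssdY naming from above):

# XFS filesystem log on its own SSD partition - chosen at mkfs/mount time
mkfs.xfs -f -l logdev=/dev/ssdX1,size=128m /dev/sdb1
mount -o logdev=/dev/ssdX1,noatime /dev/sdb1 /var/lib/ceph/osd/ceph-0

# Ceph OSD journal on another SSD partition - chosen in ceph.conf before the OSD is created
[osd.0]
    osd journal = /dev/ssdY1

A common approach is to leave the XFS log inline and only move the Ceph journal,
since the OSD journal absorbs the synchronous writes anyway; an external XFS log is
usually a much smaller win.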

> For the drive journaling, it is common to put most OSD journal on one
> SSD for such a small setup. Just remember to partition your SSD so that
> there is one partition for each OSD. This is an error that caused me a
> lot of trouble.

Ah okay, this saves me some wasted time.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Sebastien Han
Salut,

The general recommended ratio (for me at least) is 3 journals per SSD. Using 
200GB Intel DC S3700 is great.
If you’re going with a low perf scenario I don’t think you should bother buying 
SSD, just remove them from the picture and do 12 SATA 7.2K 4TB.

For medium and medium ++ perf using a ratio of 1:11 is way too high, the SSD will 
definitely be the bottleneck here.
Please also note that (bandwidth wise) with 22 drives you're already hitting 
the theoretical limit of a 10Gbps network (~50MB/s * 22 ~= 1.1GB/s).
You can theoretically up that value with LACP (depending on the 
xmit_hash_policy you’re using of course).

Btw what’s the network? (since I’m only assuming here).


> On 15 Dec 2014, at 20:44, Florent MONTHEL  wrote:
> 
> Hi,
> 
> I’m buying several servers to test CEPH and I would like to configure journal 
> on SSD drives (maybe it’s not necessary for all use cases)
> Could you help me to identify number of SSD I need (SSD are very expensive 
> and GB price business case killer… ) ? I don’t want to experience SSD 
> bottleneck (some abacus ?).
> I think I will be with below CONF 2 & 3
> 
> 
> CONF 1 DELL 730XC "Low Perf":
> 10 SATA 7.2K 3.5  4TB + 2 SSD 2.5 » 200GB "intensive write"
> 
> CONF 2 DELL 730XC « Medium Perf" :
> 22 SATA 7.2K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
> 
> CONF 3 DELL 730XC « Medium Perf ++" :
> 22 SAS 10K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"
> 
> Thanks
> 
> Florent Monthel
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Cheers.

Sébastien Han
Cloud Architect

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72
Mail: sebastien@enovance.com
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Nick Fisk
Hi Florent,

 

Journals don’t need to be very big, 5-10GB per OSD would normally be ample. The 
key is that you get a SSD with high write endurance, this makes the Intel S3700 
drives perfect for journal use.

 

In terms of how many OSDs you can run per SSD, it depends purely on how important 
performance is to you and, to a certain extent, whether you need high sequential 
bandwidth or high IO throughput.

 

The S3700 SSD will probably give you around 500MB/s sequential bandwidth, which 
means to get best performance, you wouldn’t want to run more than 5-6 7.2K 
disks per SSD, as then the SSD would be the bottleneck. If however your 
workload is mainly random IO, the SSD’s will likely support more disks.

 

But keep in mind that if a SSD fails, then all the OSD’s that it was a journal 
for will also be lost.

 

If you use 2 SSDs with a 10-20GB partition in RAID1 for the OS, then you might 
find you can use the remaining (non-RAID) space for the OSD journals, split between 
the 22 disks. Of course keep in mind what I said about IO type and performance.
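
To make the partitioning part concrete, a rough sketch (device names and OSD count
are placeholders):

# one slice for the OS mirror, then one ~10GB journal slice per OSD
sgdisk -n 1:0:+15G -c 1:"os"       /dev/sdy
sgdisk -n 2:0:+10G -c 2:"journal0" /dev/sdy
sgdisk -n 3:0:+10G -c 3:"journal1" /dev/sdy
sgdisk -n 4:0:+10G -c 4:"journal2" /dev/sdy

# point each data disk at its own journal partition when creating the OSD
ceph-deploy osd prepare node1:/dev/sda:/dev/sdy2
ceph-deploy osd prepare node1:/dev/sdb:/dev/sdy3
ceph-deploy osd prepare node1:/dev/sdc:/dev/sdy4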

 

Hope that helps, please let me know if you have any other questions.

 

Nick

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Florent MONTHEL
Sent: 15 December 2014 19:45
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Number of SSD for OSD journal

 

Hi,

 

I’m buying several servers to test CEPH and I would like to configure journal 
on SSD drives (maybe it’s not necessary for all use cases)

Could you help me to identify number of SSD I need (SSD are very expensive and 
GB price business case killer… ) ? I don’t want to experience SSD bottleneck 
(some abacus ?).

I think I will be with below CONF 2 & 3

 

 

CONF 1 DELL 730XC "Low Perf":

10 SATA 7.2K 3.5  4TB + 2 SSD 2.5 » 200GB "intensive write"

 

CONF 2 DELL 730XC « Medium Perf" :

22 SATA 7.2K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"

 

CONF 3 DELL 730XC « Medium Perf ++" :

22 SAS 10K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"

 

Thanks

 

Florent Monthel

 

 

 

 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Number of SSD for OSD journal

2014-12-15 Thread Florent MONTHEL
Hi,

I’m buying several servers to test CEPH and I would like to configure journal 
on SSD drives (maybe it’s not necessary for all use cases)
Could you help me to identify number of SSD I need (SSD are very expensive and 
GB price business case killer… ) ? I don’t want to experience SSD bottleneck 
(some abacus ?).
I think I will be with below CONF 2 & 3


CONF 1 DELL 730XC "Low Perf":
10 SATA 7.2K 3.5  4TB + 2 SSD 2.5 » 200GB "intensive write"

CONF 2 DELL 730XC « Medium Perf" :
22 SATA 7.2K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"

CONF 3 DELL 730XC « Medium Perf ++" :
22 SAS 10K 2.5 1TB + 2 SSD 2.5 » 200GB "intensive write"

Thanks

Florent Monthel





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper procedure for osd/host removal

2014-12-15 Thread Dinu Vlad
Thanks - I was suspecting it. I was thinking of a course of action that would 
allow setting the weight of an entire host to zero in the crush map - thus 
forcing the migration of the data out of the OSDs of that host, followed by the 
crush and osd removal, one by one (hopefully this time without another backfill 
session).  

Problem is I don't have anywhere to test how that would work and/or what the 
side-effects (if any) would be.
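
For what it's worth, a rough sketch of that idea (osd ids and the host name are
placeholders) - the host's crush weight goes to zero up front, so the later
removals should not move data a second time, though that is exactly the part
that would need testing:

# drain all OSDs on the host first (this triggers the one big backfill)
for id in 10 11 12 13 14 15; do ceph osd crush reweight osd.$id 0; done

# once the cluster is HEALTH_OK again, remove them one by one
for id in 10 11 12 13 14 15; do
    ceph osd out $id
    # stop the ceph-osd daemon on the host here
    ceph osd crush remove osd.$id
    ceph auth del osd.$id
    ceph osd rm $id
done
ceph osd crush remove node-being-retired   # drop the now-empty host bucket (placeholder name)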


On 15 Dec 2014, at 21:07, Adeel Nazir  wrote:

> I'm going through something similar, and it seems like the double backfill 
> you're experiencing is about par for the course. According to the CERN 
> presentation (http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern 
> slide 19), doing a 'ceph osd crush rm osd ' should save the double 
> backfill, but I haven't experienced that in my 0.80.5 cluster. Even after I 
> do a crush rm osd, and finally remove it via ceph rm osd., it computes a 
> new map and does the backfill again. As far as I can tell, there's no way 
> around it without editing the map manually, making whatever changes you 
> require and then pushing the new map. I personally am not experienced enough 
> to feel comfortable making that kind of a change.
> 
> 
> Adeel
> 
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Dinu Vlad
>> Sent: Monday, December 15, 2014 11:35 AM
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] Proper procedure for osd/host removal
>> 
>> Hello,
>> 
>> I've been working to upgrade the hardware on a semi-production ceph
>> cluster, following the instructions for OSD removal from
>> http://ceph.com/docs/master/rados/operations/add-or-rm-
>> osds/#removing-osds-manual. Basically, I've added the new hosts to the
>> cluster and now I'm removing the old ones from it.
>> 
>> What I found curious is that after the sync triggered by the "ceph osd out
>> " finishes and I stop the osd process and remove it from the crush map,
>> another session of synchronization is triggered - sometimes this one takes
>> longer than the first. Also, removing an empty "host" bucket from the crush
>> map triggered another resynchronization.
>> 
>> I noticed that the overall weight of the host bucket does not change in the
>> crush map as a result of one OSD being "out", therefore what is happening is
>> kinda' normal behavior - however it remains time-consuming. Is there
>> something that can be done to avoid the double resync?
>> 
>> I'm running 0.72.2 on top of ubuntu 12.04 on the OSD hosts.
>> 
>> Thanks,
>> Dinu
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper procedure for osd/host removal

2014-12-15 Thread Adeel Nazir
I'm going through something similar, and it seems like the double backfill 
you're experiencing is about par for the course. According to the CERN 
presentation (http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern slide 
19), doing a 'ceph osd crush rm osd ' should save the double backfill, but 
I haven't experienced that in my 0.80.5 cluster. Even after I do a crush rm 
osd, and finally remove it via ceph rm osd., it computes a new map and does 
the backfill again. As far as I can tell, there's no way around it without 
editing the map manually, making whatever changes you require and then pushing 
the new map. I personally am not experienced enough to feel comfortable making 
that kind of a change.
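
For reference, the "editing the map manually" route mentioned above is normally done
with crushtool; a minimal sketch (file names are arbitrary):

ceph osd getcrushmap -o crushmap.bin        # fetch the current map
crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
# edit crushmap.txt: remove the osd and host entries in a single pass
crushtool -c crushmap.txt -o crushmap.new   # recompile
ceph osd setcrushmap -i crushmap.new        # push it back - this triggers one remap

Worth trying on a throwaway cluster first, since a bad map can move a lot of data in one go.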


Adeel

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Dinu Vlad
> Sent: Monday, December 15, 2014 11:35 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Proper procedure for osd/host removal
> 
> Hello,
> 
> I've been working to upgrade the hardware on a semi-production ceph
> cluster, following the instructions for OSD removal from
> http://ceph.com/docs/master/rados/operations/add-or-rm-
> osds/#removing-osds-manual. Basically, I've added the new hosts to the
> cluster and now I'm removing the old ones from it.
> 
> What I found curious is that after the sync triggered by the "ceph osd out
> " finishes and I stop the osd process and remove it from the crush map,
> another session of synchronization is triggered - sometimes this one takes
> longer than the first. Also, removing an empty "host" bucket from the crush
> map triggered another resynchronization.
> 
> I noticed that the overall weight of the host bucket does not change in the
> crush map as a result of one OSD being "out", therefore what is happening is
> kinda' normal behavior - however it remains time-consuming. Is there
> something that can be done to avoid the double resync?
> 
> I'm running 0.72.2 on top of ubuntu 12.04 on the OSD hosts.
> 
> Thanks,
> Dinu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Udo Lembke
Hi Benjamin,
On 15.12.2014 03:31, Benjamin wrote:
> Hey there,
>
> I've set up a small VirtualBox cluster of Ceph VMs. I have one
> "ceph-admin0" node, and three "ceph0,ceph1,ceph2" nodes for a total of 4.
>
> I've been following this
> guide: http://ceph.com/docs/master/start/quick-ceph-deploy/ to the letter.
>
> At the end of the guide, it calls for you to run "ceph health"... this
> is what happens when I do.
>
> "HEALTH_ERR 64 pgs stale; 64 pgs stuck stale; 2 full osd(s); 2/2 in
> osds are down"
hmm, why do you have only two OSDs with three nodes?

Can you post the output of following commands
ceph health detail
ceph osd tree
rados df
ceph osd pool get data size
ceph osd pool get rbd size
df -h # on all OSD-nodes

/etc/init.d/ceph start osd.0  # on node with osd.0
/etc/init.d/ceph start osd.1  # on node with osd.1


Udo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw-Agent

2014-12-15 Thread lakshmi k s
Thanks Yehuda. But the link seems to be pointing to Debian binaries. Can you 
please point me to source packages?
Regards,Lakshmi.

 

 On Monday, December 15, 2014 8:16 AM, Yehuda Sadeh  
wrote:
   

 There's the 'radosgw-agent' package for debian, e.g., here:
http://ceph.com/debian-giant/pool/main/r/radosgw-agent/radosgw-agent_1.2-1~bpo70+1_all.deb

On Mon, Dec 15, 2014 at 5:12 AM, lakshmi k s  wrote:
> Hello -
>
> Can anyone help me locate the Debian-type source packages for radosgw-agent?
>
> Thanks,
> Lakshmi.
>
>
> On Monday, December 8, 2014 6:10 AM, lakshmi k s  wrote:
>
>
> Hello Sage -
>
> Just wondering if you are the module owner for radosgw-agent? If so, can you
> please help me to locate the latest source bits for debian wheezy?
>
> Thanks,
> Lakshmi.
>
>
> On Wednesday, December 3, 2014 8:42 PM, lakshmi k s 
> wrote:
>
>
> Hello - Please help me here. Where can I locate the source package?
>
>
> On Tuesday, December 2, 2014 12:41 PM, lakshmi k s  wrote:
>
>
> Hello:
>
> I am trying to locate the source package used for Debian Wheezy for the
> radosgw-agent 1.2-1-bpo70+1 that is available from the ceph repository.
>
> Our company requires us to verify package builds from source and to check
> licenses from those same source packages. However I have not been able to
> locate the source package for the 1.2-1~bpo70+1 version that is available as
> a pre-built package for debian wheezy from the current ceph software
> repository.
>
> Can anyone tell me where the repo is that I can put into my sources.list so
> I can pull this down to do our required verification steps?
>
> Thank you.
> Lakshmi.
>
>
>
>
>
>
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Proper procedure for osd/host removal

2014-12-15 Thread Dinu Vlad
Hello,

I've been working to upgrade the hardware on a semi-production ceph cluster, 
following the instructions for OSD removal from 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual.
 Basically, I've added the new hosts to the cluster and now I'm removing the 
old ones from it. 

What I found curious is that after the sync triggered by the "ceph osd out 
" finishes and I stop the osd process and remove it from the crush map, 
another session of synchronization is triggered - sometimes this one takes 
longer than the first. Also, removing an empty "host" bucket from the crush map 
triggered another resynchronization.

I noticed that the overall weight of the host bucket does not change in the 
crush map as a result of one OSD being "out", therefore what is happening is 
kinda' normal behavior - however it remains time-consuming. Is there something 
that can be done to avoid the double resync?

I'm running 0.72.2 on top of ubuntu 12.04 on the OSD hosts. 

Thanks,
Dinu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO Hang on rbd

2014-12-15 Thread Ilya Dryomov
On Mon, Dec 15, 2014 at 7:05 PM, reistlin87 <79026480...@yandex.ru> wrote:
> No, in dmesg is nothing about hangs

Not necessarily about hangs.  "socket closed" messages?  Can you
pastebin the entire kernel log for me?

> Here is the versions of software:
> root@ceph-esx-conv03-001:~# uname -a
> Linux ceph-esx-conv03-001 3.17.0-ceph #1 SMP Sun Oct 5 19:47:51 UTC 2014 
> x86_64 x86_64 x86_64 GNU/Linux

Which kernel are you running on the client box?  3.17 or 3.18?
If 3.17, can you try 3.18?

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw-Agent

2014-12-15 Thread Yehuda Sadeh
There's the 'radosgw-agent' package for debian, e.g., here:
http://ceph.com/debian-giant/pool/main/r/radosgw-agent/radosgw-agent_1.2-1~bpo70+1_all.deb

On Mon, Dec 15, 2014 at 5:12 AM, lakshmi k s  wrote:
> Hello -
>
> Can anyone help me locate the Debian-type source packages for radosgw-agent?
>
> Thanks,
> Lakshmi.
>
>
> On Monday, December 8, 2014 6:10 AM, lakshmi k s  wrote:
>
>
> Hello Sage -
>
> Just wondering if you are the module owner for radosgw-agent? If so, can you
> please help me to locate the latest source bits for debian wheezy?
>
> Thanks,
> Lakshmi.
>
>
> On Wednesday, December 3, 2014 8:42 PM, lakshmi k s 
> wrote:
>
>
> Hello - Please help me here. Where can I locate the source package?
>
>
> On Tuesday, December 2, 2014 12:41 PM, lakshmi k s  wrote:
>
>
> Hello:
>
> I am trying to locate the source package used for Debian Wheezy for the
> radosgw-agent 1.2-1-bpo70+1 that is available from the ceph repository.
>
> Our company requires us to verify package builds from source and to check
> licenses from those same source packages. However I have not been able to
> locate the source package for the 1.2-1~bpo70+1 version that is available as
> a pre-built package for debian wheezy from the current ceph software
> repository.
>
> Can anyone tell me where the repo is that I can put into my sources.list so
> I can pull this down to do our required verification steps?
>
> Thank you.
> Lakshmi.
>
>
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO Hang on rbd

2014-12-15 Thread reistlin87
No, there is nothing in dmesg about hangs.
Here are the versions of the software:
root@ceph-esx-conv03-001:~# uname -a
Linux ceph-esx-conv03-001 3.17.0-ceph #1 SMP Sun Oct 5 19:47:51 UTC 2014 x86_64 
x86_64 x86_64 GNU/Linux
root@ceph-esx-conv03-001:~# ceph --version
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)


15.12.2014, 16:20, "Ilya Dryomov" :
> On Thu, Dec 11, 2014 at 7:57 PM, reistlin87 <79026480...@yandex.ru> wrote:
>>  Hi all!
>>
>>  We have an annoying problem - when we launch intensive reading with rbd, 
>> the client, to which mounted image, hangs in this state:
>>
>>  Device: rrqm/s   wrqm/s r/s w/s    rMB/s    wMB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>>  sda   0.00 0.00    0.00    1.20 0.00 0.00 8.00  
>>    0.00    0.00    0.00    0.00   0.00   0.00
>>  dm-0  0.00 0.00    0.00    1.20 0.00 0.00 8.00  
>>    0.00    0.00    0.00    0.00   0.00   0.00
>>  dm-1  0.00 0.00    0.00    0.00 0.00 0.00 0.00  
>>    0.00    0.00    0.00    0.00   0.00   0.00
>>  rbd0  0.00 0.00    0.00    0.00 0.00 0.00 0.00  
>>   32.00    0.00    0.00    0.00   0.00 100.00
>>
>>  Only  reboot helps. The logs are clean.
>>
>>  The fastest way to get hang it is run fio read with block size 512K, 4K  
>> usually works fine. But client may hang without fio - only because of heavy 
>> load.
>>
>>  We used different versions of the linux kernel and ceph - now on OSD and 
>> MONS we use ceph 0.87-1 and linux kernel 3.18. On the clients we have tried 
>> the latest versions from here http://gitbuilder.ceph.com/. , for example 
>> Ceph  0.87-68. Through libvirt everything works fine - we also  use  KVM  
>> and stgt (but stgs is slow)
>
> Is there anything in dmesg around the time it hangs?
>
> If possible, don't change anything about your config - number of osds,
> number of pgs, pools, etc so you can reproduce with logging enabled.
>
> Thanks,
>
> Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO Hang on rbd

2014-12-15 Thread reistlin87
We tried the default configuration without additional parameters, but it still hangs.
How can we see an OSD queue?
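
On the queue question - the usual way to look at OSD internals is the admin socket;
a couple of illustrative commands (the osd id and socket path below are just the defaults):

# dump all internal counters, including filestore/journal queue sizes
ceph daemon osd.5 perf dump

# same thing via the socket path directly
ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok perf dump

# check the filestore settings currently in effect
ceph daemon osd.5 config show | grep filestore_queue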

15.12.2014, 16:11, "Tomasz Kuzemko" :
> Try lowering "filestore max sync interval" and "filestore min sync
> interval". It looks like during the hanged period data is flushed from
> some overly big buffer.
>
> If this does not help you can monitor perf stats on OSDs to see if some
> queue is unusually large.
>
> --
> Tomasz Kuzemko
> tomasz.kuze...@ovh.net
>
> On Thu, Dec 11, 2014 at 07:57:48PM +0300, reistlin87 wrote:
>>  Hi all!
>>
>>  We have an annoying problem - when we launch intensive reading with rbd, 
>> the client, to which mounted image, hangs in this state:
>>
>>  Device: rrqm/s   wrqm/s r/s w/s    rMB/s    wMB/s avgrq-sz 
>> avgqu-sz   await r_await w_await  svctm  %util
>>  sda   0.00 0.00    0.00    1.20 0.00 0.00 8.00  
>>    0.00    0.00    0.00    0.00   0.00   0.00
>>  dm-0  0.00 0.00    0.00    1.20 0.00 0.00 8.00  
>>    0.00    0.00    0.00    0.00   0.00   0.00
>>  dm-1  0.00 0.00    0.00    0.00 0.00 0.00 0.00  
>>    0.00    0.00    0.00    0.00   0.00   0.00
>>  rbd0  0.00 0.00    0.00    0.00 0.00 0.00 0.00  
>>   32.00    0.00    0.00    0.00   0.00 100.00
>>
>>  Only  reboot helps. The logs are clean.
>>
>>  The fastest way to get hang it is run fio read with block size 512K, 4K  
>> usually works fine. But client may hang without fio - only because of heavy 
>> load.
>>
>>  We used different versions of the linux kernel and ceph - now on OSD and 
>> MONS we use ceph 0.87-1 and linux kernel 3.18. On the clients we have tried 
>> the latest versions from here http://gitbuilder.ceph.com/. , for example 
>> Ceph  0.87-68. Through libvirt everything works fine - we also  use  KVM  
>> and stgt (but stgs is slow)
>>
>>  Here is my config:
>>  [global]
>>  fsid = 566d9cab-793e-47e0-a0cd-e5da09f8037a
>>  mon_initial_members = 
>> srt-mon-001-02,amz-mon-001-000601,db24-mon-001-000105
>>  mon_host = 10.201.20.31,10.203.20.56,10.202.20.58
>>  auth_cluster_required = cephx
>>  auth_service_required = cephx
>>  auth_client_required = cephx
>>  filestore_xattr_use_omap = true
>>  public network = 10.201.20.0/22
>>  cluster network = 10.212.36.0/22
>>  osd crush update on start = false
>>  [mon]
>>  debug mon = 0
>>  debug paxos = 0/0
>>  debug auth = 0
>>
>>  [mon.srt-mon-001-02]
>>  host = srt-mon-001-02
>>  mon addr = 10.201.20.31:6789
>>  [mon.db24-mon-001-000105]
>>  host = db24-mon-001-000105
>>  mon addr = 10.202.20.58:6789
>>  [mon.amz-mon-001-000601]
>>  host = amz-mon-001-000601
>>  mon addr = 10.203.20.56:6789
>>  [osd]
>>  osd crush update on start = false
>>  osd mount options xfs = "rw,noatime,inode64,allocsize=4M"
>>  osd mkfs type = xfs
>>  osd mkfs options xfs = "-f -i size=2048"
>>  osd op threads = 20
>>  osd disk threads =8
>>  journal block align = true
>>  journal dio = true
>>  journal aio = true
>>  osd recovery max active = 1
>>  filestore max sync interval = 100
>>  filestore min sync interval = 10
>>  filestore queue max ops = 2000
>>  filestore queue max bytes = 536870912
>>  filestore queue committing max ops = 2000
>>  filestore queue committing max bytes = 536870912
>>  osd max backfills = 1
>>  osd client op priority = 63
>>  [osd.5]
>>  host = srt-osd-001-050204
>>  [osd.6]
>>  host = srt-osd-001-050204
>>  [osd.7]
>>  host = srt-osd-001-050204
>>  [osd.8]
>>  host = srt-osd-001-050204
>>  [osd.109]
>>  
>>  ___
>>  ceph-users mailing list
>>  ceph-users@lists.ceph.com
>>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] giant initial install on RHEL 6.6 fails due to mon fauilure

2014-12-15 Thread Lukac, Erik
Hi Guys,

I am trying to install giant with puppet-cephdeploy but it fails at 
"ceph-deploy gatherkeys NODEs". There are no keys generated.
This is my output of cephdeploy:
  [ceph_deploy.mon][INFO  ] Running gatherkeys...
  [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-203-1-public for 
/etc/ceph/ceph.client.admin.keyring
  [ceph-203-1-public][DEBUG ] connection detected need for sudo
  [ceph-203-1-public][DEBUG ] connected to host: ceph-203-1-public
  [ceph-203-1-public][DEBUG ] detect platform information from remote host
  [ceph-203-1-public][DEBUG ] detect machine type
  [ceph-203-1-public][DEBUG ] fetch remote file
  [ceph_deploy.gatherkeys][WARNING] Unable to find 
/etc/ceph/ceph.client.admin.keyring on ceph-203-1-public
  [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-203-2-public for 
/etc/ceph/ceph.client.admin.keyring
  [ceph-203-2-public][DEBUG ] connection detected need for sudo
  [ceph-203-2-public][DEBUG ] connected to host: ceph-203-2-public
  [ceph-203-2-public][DEBUG ] detect platform information from remote host
  [ceph-203-2-public][DEBUG ] detect machine type
  [ceph-203-2-public][DEBUG ] fetch remote file
  [ceph_deploy.gatherkeys][WARNING] Unable to find 
/etc/ceph/ceph.client.admin.keyring on ceph-203-2-public
  [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-203-7-public for 
/etc/ceph/ceph.client.admin.keyring
  [ceph-203-7-public][DEBUG ] connection detected need for sudo
  [ceph-203-7-public][DEBUG ] connected to host: ceph-203-7-public
  [ceph-203-7-public][DEBUG ] detect platform information from remote host
  [ceph-203-7-public][DEBUG ] detect machine type
  [ceph-203-7-public][DEBUG ] fetch remote file
  [ceph_deploy.gatherkeys][WARNING] Unable to find 
/etc/ceph/ceph.client.admin.keyring on ceph-203-7-public
  [ceph_deploy][ERROR ] KeyNotFoundError: Could not find keyring file: 
/etc/ceph/ceph.client.admin.keyring on host ceph-203-1-public, 
/etc/ceph/ceph.client.admin.keyring on host ceph-203-2-public, 
/etc/ceph/ceph.client.admin.keyring on host ceph-203-7-public

I also have the problem that every time I run ceph-deploy it does not terminate 
properly after having done everything successfully.
Only setting "env CEPH_DEPLOY_TEST=YES" makes ceph-deploy terminate without an 
error message on stdout.

abrtd first shows me this:
  Dec 15 15:28:22 ceph-203-1-public abrt: detected unhandled Python exception 
in '/usr/bin/ceph'
  Dec 15 15:28:22 ceph-203-1-public abrtd: New client connected
  Dec 15 15:28:22 ceph-203-1-public abrt-server[21512]: Saved Python crash dump 
of pid 21505 to /var/spool/abrt/pyhook-2014-12-15-15:28:22-21505
  Dec 15 15:28:22 ceph-203-1-public abrtd: Directory 
'pyhook-2014-12-15-15:28:22-21505' creation detected
  Dec 15 15:28:22 ceph-203-1-public abrtd: Package 'ceph-common' isn't signed 
with proper key
  Dec 15 15:28:22 ceph-203-1-public abrtd: 'post-create' on 
'/var/spool/abrt/pyhook-2014-12-15-15:28:22-21505' exited with 1
  Dec 15 15:28:22 ceph-203-1-public abrtd: Deleting problem directory 
'/var/spool/abrt/pyhook-2014-12-15-15:28:22-21505'

after a bit of wizardry, abrtd (backtrace) shows me this:
__init__.py:353:__init__:OSError: /usr/lib64/librados.so.2: undefined symbol: 
lttng_probe_register

Traceback (most recent call last):
  File "/usr/bin/ceph", line 862, in 
sys.exit(main())
  File "/usr/bin/ceph", line 622, in main
conffile=conffile)
  File "/usr/lib/python2.6/site-packages/rados.py", line 207, in __init__
self.librados = CDLL(librados_path)
  File "/usr/lib64/python2.6/ctypes/__init__.py", line 353, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /usr/lib64/librados.so.2: undefined symbol: lttng_probe_register

Local variables in innermost frame:
_FuncPtr: 
handle: None
name: 'librados.so.2'
use_last_error: False


Is rhel6.6 supported at all AND tested?
I feel like I am the first one trying to install ceph on rhel6.6.

I hope there will be a fix very soon.

Erik


--
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: i...@br.de; Website: http://www.BR.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy: missing tests

2014-12-15 Thread Lukac, Erik
Hi Guys,

in 
https://github.com/ceph/ceph-deploy/blob/master/ceph_deploy/tests/unit/hosts/test_centos.py
RHEL6.6, which was released on Oct 14th this year, is missing:
 params= {
'test_repository_url_part': [
dict(distro="CentOS Linux", release='4.3', codename="Foo", output='el6'),
dict(distro="CentOS Linux", release='6.5', codename="Final", output='el6'),
dict(distro="CentOS Linux", release='7.0', codename="Core", output='el7'),
dict(distro="CentOS Linux", release='7.0.1406', codename="Core", output='el7'),
dict(distro="CentOS Linux", release='10.4.000', codename="Core", output='el10'),
dict(distro="RedHat", release='4.3', codename="Foo", output='el6'),
dict(distro="Red Hat Enterprise Linux Server", release='5.8', 
codename="Tikanga", output="el6"),
dict(distro="Red Hat Enterprise Linux Server", release='6.5', 
codename="Santiago", output='rhel6'),
dict(distro="Red Hat Enterprise Linux Server", release='6.6', 
codename="Santiago", output='rhel6'),
dict(distro="RedHat", release='7.0.1406', codename="Core", output='rhel7'),
dict(distro="RedHat", release='10.999.12', codename="Core", output='rhel10'),
],
'test_rpm_dist': [
dict(distro="CentOS Linux", release='4.3', codename="Foo", output='el6'),
dict(distro="CentOS Linux", release='6.5', codename="Final", output='el6'),
dict(distro="CentOS Linux", release='7.0', codename="Core", output='el7'),
dict(distro="CentOS Linux", release='7.0.1406', codename="Core", output='el7'),
dict(distro="CentOS Linux", release='10.10.9191', codename="Core", 
output='el10'),
dict(distro="RedHat", release='4.3', codename="Foo", output='el6'),
dict(distro="Red Hat Enterprise Linux Server", release='5.8', 
codename="Tikanga", output="el6"),
dict(distro="Red Hat Enterprise Linux Server", release='6.5', 
codename="Santiago", output='el6'),
dict(distro="Red Hat Enterprise Linux Server", release='6.6', 
codename="Santiago", output='el6'),
dict(distro="RedHat", release='7.0', codename="Core", output='el7'),
dict(distro="RedHat", release='7.0.1406', codename="Core", output='el7'),
dict(distro="RedHat", release='10.9.8765', codename="Core", output='el10'),
]

Seems like it was forgotten.

Erik
--
Bayerischer Rundfunk; Rundfunkplatz 1; 80335 München
Telefon: +49 89 590001; E-Mail: i...@br.de; Website: http://www.BR.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Confusion about journals and caches

2014-12-15 Thread Max Power
At the moment I am a bit confused about how to configure my journals and where.
I will start my first Ceph-experience with a small home cluster made of two
nodes. Both nodes will get around three to five harddisks and one ssd each. The
harddisks are XFS formated and each one represents an OSD. The ceph pool is
erasure coded(!) and therefore I need a replicated pool on top of it as a cache
tier in order to use block devices. See this diagram for explanation (ASCII):

 I  Cache Tier: Journal
 II Cache Tier: Filestore
 +++
IIIOSD1 OSD2  .. OSDn
IV Journal  Journal  Journal
 V FilestoreFilestoreFilestore
  
Now I have several journals and settings, and I don't know how to configure
them or where. For each line of the diagram above - where would you place it? On a
harddisk or on the ssd? Does it make sense to store _all_ journals on one SSD?
Does it even make sense to have journals for the OSDs at all as there already is
a journal for the cache tier?

The confusion gets even worse when it comes to harddisk hardware caches (hdparm).
Turn these off? The documentation sounds like this cache is automatically turned off
by newer kernels.
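
For reference, querying and toggling a drive's volatile write cache is a one-liner
per disk (the device name is a placeholder):

hdparm -W  /dev/sdX    # show whether the write cache is currently enabled
hdparm -W0 /dev/sdX    # turn it off
hdparm -W1 /dev/sdX    # turn it back on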

Greetings from germany!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Block device and Trim/Discard

2014-12-15 Thread Max Power
> Ilya Dryomov  hat am 12. Dezember 2014 um 18:00
> geschrieben:
> Just a note, discard support went into 3.18, which was released a few
> days ago.

I recently compiled 3.18 on Debian 7 and what do I have to say... It works
perfectly well. The used memory goes up and down again. So I think this will be
my choice. Thank you!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO Hang on rbd

2014-12-15 Thread Ilya Dryomov
On Mon, Dec 15, 2014 at 4:11 PM, Tomasz Kuzemko  wrote:
> Try lowering "filestore max sync interval" and "filestore min sync
> interval". It looks like during the hanged period data is flushed from
> some overly big buffer.
>
> If this does not help you can monitor perf stats on OSDs to see if some
> queue is unusually large.

This must be a kernel client issue.  OP, please don't change any
settings, I need it to reproduce to gather more info.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO Hang on rbd

2014-12-15 Thread Ilya Dryomov
On Thu, Dec 11, 2014 at 7:57 PM, reistlin87 <79026480...@yandex.ru> wrote:
> Hi all!
>
> We have an annoying problem - when we launch intensive reading with rbd, the 
> client, to which mounted image, hangs in this state:
>
> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.000.001.20 0.00 0.00 8.00
>  0.000.000.000.00   0.00   0.00
> dm-0  0.00 0.000.001.20 0.00 0.00 8.00
>  0.000.000.000.00   0.00   0.00
> dm-1  0.00 0.000.000.00 0.00 0.00 0.00
>  0.000.000.000.00   0.00   0.00
> rbd0  0.00 0.000.000.00 0.00 0.00 0.00
> 32.000.000.000.00   0.00 100.00
>
> Only  reboot helps. The logs are clean.
>
> The fastest way to get hang it is run fio read with block size 512K, 4K  
> usually works fine. But client may hang without fio - only because of heavy 
> load.
>
> We used different versions of the linux kernel and ceph - now on OSD and MONS 
> we use ceph 0.87-1 and linux kernel 3.18. On the clients we have tried the 
> latest versions from here http://gitbuilder.ceph.com/. , for example Ceph  
> 0.87-68. Through libvirt everything works fine - we also  use  KVM  and stgt 
> (but stgs is slow)

Is there anything in dmesg around the time it hangs?

If possible, don't change anything about your config - number of osds,
number of pgs, pools, etc so you can reproduce with logging enabled.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw-Agent

2014-12-15 Thread lakshmi k s
Hello -
Can anyone help me locate the Debian-type source packages for radosgw-agent?
Thanks,Lakshmi. 

 On Monday, December 8, 2014 6:10 AM, lakshmi k s  wrote:
   

 Hello Sage - 
Just wondering if you are the module owner for radosgw-agent? If so, can you 
please help me to locate the latest source bits for debian wheezy?
Thanks,Lakshmi. 

 On Wednesday, December 3, 2014 8:42 PM, lakshmi k s  
wrote:
   

 Hello - Please help me here. Where can I locate the source package?

 On Tuesday, December 2, 2014 12:41 PM, lakshmi k s  
wrote:
   

 Hello:
I am trying to locate the source package used for Debian Wheezy for the 
radosgw-agent 1.2-1-bpo70+1 that is available from the ceph repository.
Our company requires us to verify package builds from source and to check 
licenses from those same source packages. However I have not been able to locate 
the source package for the 1.2-1~bpo70+1 version that is available as a 
pre-built package for debian wheezy from the current ceph software repository.
Can anyone tell me where the repo is that I can put into my sources.list so I 
can pull this down to do our required verification steps?
Thank you.
Lakshmi.






   ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] IO Hang on rbd

2014-12-15 Thread Tomasz Kuzemko
Try lowering "filestore max sync interval" and "filestore min sync
interval". It looks like during the hanged period data is flushed from
some overly big buffer.

If this does not help you can monitor perf stats on OSDs to see if some
queue is unusually large.

-- 
Tomasz Kuzemko
tomasz.kuze...@ovh.net

On Thu, Dec 11, 2014 at 07:57:48PM +0300, reistlin87 wrote:
> Hi all!
> 
> We have an annoying problem - when we launch intensive reading with rbd, the 
> client, to which mounted image, hangs in this state:
> 
> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sda   0.00 0.000.001.20 0.00 0.00 8.00
>  0.000.000.000.00   0.00   0.00
> dm-0  0.00 0.000.001.20 0.00 0.00 8.00
>  0.000.000.000.00   0.00   0.00
> dm-1  0.00 0.000.000.00 0.00 0.00 0.00
>  0.000.000.000.00   0.00   0.00
> rbd0  0.00 0.000.000.00 0.00 0.00 0.00
> 32.000.000.000.00   0.00 100.00
> 
> Only  reboot helps. The logs are clean.
> 
> The fastest way to get hang it is run fio read with block size 512K, 4K  
> usually works fine. But client may hang without fio - only because of heavy 
> load.
> 
> We used different versions of the linux kernel and ceph - now on OSD and MONS 
> we use ceph 0.87-1 and linux kernel 3.18. On the clients we have tried the 
> latest versions from here http://gitbuilder.ceph.com/. , for example Ceph  
> 0.87-68. Through libvirt everything works fine - we also  use  KVM  and stgt 
> (but stgs is slow)
> 
> Here is my config:
> [global]
> fsid = 566d9cab-793e-47e0-a0cd-e5da09f8037a
> mon_initial_members = 
> srt-mon-001-02,amz-mon-001-000601,db24-mon-001-000105
> mon_host = 10.201.20.31,10.203.20.56,10.202.20.58
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> public network = 10.201.20.0/22
> cluster network = 10.212.36.0/22
> osd crush update on start = false
> [mon]
> debug mon = 0
> debug paxos = 0/0
> debug auth = 0
> 
> [mon.srt-mon-001-02]
> host = srt-mon-001-02
> mon addr = 10.201.20.31:6789
> [mon.db24-mon-001-000105]
> host = db24-mon-001-000105
> mon addr = 10.202.20.58:6789
> [mon.amz-mon-001-000601]
> host = amz-mon-001-000601
> mon addr = 10.203.20.56:6789
> [osd]
> osd crush update on start = false
> osd mount options xfs = "rw,noatime,inode64,allocsize=4M"
> osd mkfs type = xfs
> osd mkfs options xfs = "-f -i size=2048"
> osd op threads = 20
> osd disk threads =8
> journal block align = true
> journal dio = true
> journal aio = true
> osd recovery max active = 1
> filestore max sync interval = 100
> filestore min sync interval = 10
> filestore queue max ops = 2000
> filestore queue max bytes = 536870912
> filestore queue committing max ops = 2000
> filestore queue committing max bytes = 536870912
> osd max backfills = 1
> osd client op priority = 63
> [osd.5]
> host = srt-osd-001-050204
> [osd.6]
> host = srt-osd-001-050204
> [osd.7]
> host = srt-osd-001-050204
> [osd.8]
> host = srt-osd-001-050204
> [osd.109]
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


signature.asc
Description: Digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] system metrics monitoring

2014-12-15 Thread Denish Patel
Or Nagios 

Thanks,
Denish Patel


> On Dec 12, 2014, at 5:38 AM, Thomas Foster  wrote:
> 
> You can also try Sensu..
> 
>> On Dec 12, 2014 1:05 AM, "pragya jain"  wrote:
>> hello sir!
>> 
>> According to TomiTakussaari/riak_zabbix
>> Currently supported Zabbix keys:
>> riak.ring_num_partitions
>> riak.memory_total
>> riak.memory_processes_used
>> riak.pbc_active
>> riak.pbc_connects
>> riak.node_gets
>> riak.node_puts
>> riak.node_get_fsm_time_median
>> riak.node_put_fsm_time_median
>> All these metrics are monitored by collectd, OpenTSDB and Ganglia also.
>> I need some monitoring tool that monitor metrics,like,
>> Available Disk Space
>> IOWait
>> Read Operations
>> Write Operations
>> Network Throughput
>> Load Average
>> Does Zabbix provide monitoring of these metrics?
>> 
>> Thanks 
>> Regards
>> Pragya jain
>> 
>> 
>> On Friday, 12 December 2014 11:05 AM, Irek Fasikhov  
>> wrote:
>> 
>> 
>> Hi.
>> 
>> We use Zabbix.
>> 
>> 2014-12-12 8:33 GMT+03:00 pragya jain :
>> hello sir!
>> 
>> I need some open source monitoring tool for examining these metrics.
>> 
>> Please suggest some open source monitoring software.
>> 
>> Thanks 
>> Regards 
>> Pragya Jain
>> 
>> 
>> On Thursday, 11 December 2014 9:16 PM, Denish Patel  
>> wrote:
>> 
>> 
>> Try http://www.circonus.com
>> 
>> On Thu, Dec 11, 2014 at 1:22 AM, pragya jain  wrote:
>> please somebody reply my query.
>> 
>> Regards
>> Pragya Jain
>> 
>> 
>> On Tuesday, 9 December 2014 11:53 AM, pragya jain  
>> wrote:
>> 
>> 
>> hello all!
>> 
>> As mentioned at statistics and monitoring page of Riak 
>> Systems Metrics To Graph
>> Metric
>> Available Disk Space
>> IOWait
>> Read Operations
>> Write Operations
>> Network Throughput
>> Load Average
>> Can somebody suggest me some monitoring tools that monitor these metrics?
>> 
>> Regards 
>> Pragya Jain
>> 
>> 
>> 
>> ___
>> riak-users mailing list
>> riak-us...@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 
>> 
>> 
>> -- 
>> Denish Patel,
>> OmniTI Computer Consulting Inc.
>> Database Architect,
>> http://omniti.com/does/data-management
>> http://www.pateldenish.com
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> 
>> 
>> 
>> -- 
>> С уважением, Фасихов Ирек Нургаязович
>> Моб.: +79229045757
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds cluster is degraded

2014-12-15 Thread 王丰田
 

Hi,

 

Now I am using cephfs with mds. I mounted cephfs through ceph-fuse. It
worked well until yesterday when I added some new osds and hosts to the
cluster. After that I can’t use cephfs any more.

 

 

It shows this when I check it with “ceph -s”:

 

cluster e7545c1d-f452-4893-8ba2-29038fc8a767

 health HEALTH_WARN 1 pgs down; 2 pgs incomplete; 2 pgs stuck inactive;
2 pgs stuck unclean; 15 requests are blocked > 32 sec; mds cluster is
degraded; clock skew detected on mon.c, mon.d, mon.e

 monmap e1: 5 mons at
{a=30.10.0.6:6789/0,b=30.10.0.7:6789/0,c=30.10.0.8:6789/0,d=30.10.0.9:6789/0
,e=30.10.0.10:6789/0}, election epoch 294, quorum 0,1,2,3,4 a,b,c,d,e

mdsmap e178: 1/1/1 up {0=a=up:rejoin}

 osdmap e10551: 34 osds: 34 up, 34 in

  pgmap v1748469: 17216 pgs, 7 pools, 340 GB data, 104 kobjects

997 GB used, 99774 GB / 100772 GB avail

   1 down+incomplete

   17214 active+clean

   1 incomplete

 

And “ceph health detail” shows:

mds cluster is degraded

mds.a at 30.10.0.6:6807/29136 rank 0 is rejoining

 

Can you help me fix this problem, or do you have any idea how to get the data stored in
the cephfs back?
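
A few commands that are commonly used to narrow this down (the pg id below is a
placeholder to be taken from the "ceph health detail" output):

ceph health detail            # lists the down/incomplete pg ids and the blocked requests
ceph pg dump_stuck inactive   # shows the stuck pgs together with their acting OSDs
ceph pg 1.f3 query            # placeholder pg id - look for "down_osds_we_would_probe"
ceph osd tree                 # confirm the newly added hosts/osds are up and weighted as intended

The mds most likely stays in rejoin because it cannot read objects that live in the
down/incomplete pgs; once those go active+clean it should finish on its own.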

 

Regards,

Fengtiang, Wang

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] IO Hang on rbd

2014-12-15 Thread reistlin87
Hi all!

We have an annoying problem - when we launch intensive reading with rbd, the 
client, to which mounted image, hangs in this state:

Device:  rrqm/s  wrqm/s    r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00   0.00   1.20    0.00    0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0       0.00    0.00   0.00   1.20    0.00    0.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
rbd0       0.00    0.00   0.00   0.00    0.00    0.00     0.00    32.00    0.00    0.00    0.00   0.00 100.00

Only  reboot helps. The logs are clean.

The fastest way to get hang it is run fio read with block size 512K, 4K  
usually works fine. But client may hang without fio - only because of heavy 
load.

We used different versions of the linux kernel and ceph - now on OSD and MONS 
we use ceph 0.87-1 and linux kernel 3.18. On the clients we have tried the 
latest versions from here http://gitbuilder.ceph.com/. , for example Ceph  
0.87-68. Through libvirt everything works fine - we also  use  KVM  and stgt 
(but stgs is slow)

Here is my config:
[global]
fsid = 566d9cab-793e-47e0-a0cd-e5da09f8037a
mon_initial_members = 
srt-mon-001-02,amz-mon-001-000601,db24-mon-001-000105
mon_host = 10.201.20.31,10.203.20.56,10.202.20.58
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public network = 10.201.20.0/22
cluster network = 10.212.36.0/22
osd crush update on start = false
[mon]
debug mon = 0
debug paxos = 0/0
debug auth = 0

[mon.srt-mon-001-02]
host = srt-mon-001-02
mon addr = 10.201.20.31:6789
[mon.db24-mon-001-000105]
host = db24-mon-001-000105
mon addr = 10.202.20.58:6789
[mon.amz-mon-001-000601]
host = amz-mon-001-000601
mon addr = 10.203.20.56:6789
[osd]
osd crush update on start = false
osd mount options xfs = "rw,noatime,inode64,allocsize=4M"
osd mkfs type = xfs
osd mkfs options xfs = "-f -i size=2048"
osd op threads = 20
osd disk threads =8
journal block align = true
journal dio = true
journal aio = true
osd recovery max active = 1
filestore max sync interval = 100
filestore min sync interval = 10
filestore queue max ops = 2000
filestore queue max bytes = 536870912
filestore queue committing max ops = 2000
filestore queue committing max bytes = 536870912
osd max backfills = 1
osd client op priority = 63
[osd.5]
host = srt-osd-001-050204
[osd.6]
host = srt-osd-001-050204
[osd.7]
host = srt-osd-001-050204
[osd.8]
host = srt-osd-001-050204
[osd.109]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multiple issues :( Ubuntu 14.04, latest Ceph

2014-12-15 Thread Benjamin
Hey there,

I've set up a small VirtualBox cluster of Ceph VMs. I have one
"ceph-admin0" node, and three "ceph0,ceph1,ceph2" nodes for a total of 4.

I've been following this guide:
http://ceph.com/docs/master/start/quick-ceph-deploy/ to the letter.

At the end of the guide, it calls for you to run "ceph health"... this is
what happens when I do.

"HEALTH_ERR 64 pgs stale; 64 pgs stuck stale; 2 full osd(s); 2/2 in osds
are down"

Additionally I would like to build and run Calamari to have an overview of
the cluster once it's up and running. I followed all the directions here:
http://calamari.readthedocs.org/en/latest/development/building_packages.html

but the calamari-client package refuses to properly build under
trusty-package for some reason. This is the output at the end of salt-call:

Summary

Succeeded: 3 (changed=4)
Failed:3

Here is the full (verbose!) output: http://pastebin.com/WJwCxxxK

The machines each have Ubuntu 14.04 64-bit, with 1GB of RAM and 8GB of
disk. They have between 10% and 30% disk utilization, but what they all have in
common is that they *have free disk space*, meaning I have no idea what the
heck is causing Ceph to complain.

Help? :(

~ Benjamin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] my cluster has only rbd pool

2014-12-15 Thread wang lin
Hi Mark
 Thank you!
 I created a data pool as you said, and it works.
 By the way, only adding a metadata server by command "ceph-deploy mds 
create node1" still doesn't create the metadata or data pool, right?

Thanks
Lin, Wang

> Date: Sun, 14 Dec 2014 18:52:20 +1300
> From: mark.kirkw...@catalyst.net.nz
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] my cluster has only rbd pool
> 
> On 14/12/14 17:25, wang lin wrote:
> > Hi All
> >I set up my first ceph cluster according to instructions in
> > http://ceph.com/docs/master/start/quick-ceph-deploy/#storing-retrieving-object-data
> >
> >but I got this error "error opening pool data: (2) No such file
> > or directory" when using command "rados put hello_obj hello --pool=data".
> >I typed the command "ceph osd lspools", the result only show "0
> > rbd,", no other pools.
> >Did I missing anything?
> >Could anyone give me some advise?
> 
> In Giant, only the 'rbd' pool is created by default. If you add an mds 
> (i.e activate cephfs) then you will get 'data' and 'metadata' pools too.
> 
> If you are following the docs (which probably need to be changed because 
> of this...) you can just create a 'data' pool yourself:
> 
> $ ceph osd pool create data 64 64
> 
> Cheers
> 
> Mark
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Running ceph in Deis/Docker

2014-12-15 Thread Jimmy Chu

Hi,

I am running a 3-node Deis cluster with ceph as the underlying FS, so it is 
ceph running inside Docker containers on three separate servers. I rebooted 
all three nodes (almost at once). After rebooting, the ceph monitors refuse 
to connect to each other.


Symptoms are:
- no quorum formed,
- ceph admin socket file does not exist
- only the following in ceph log:

Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700  
0 -- :/121 >> 10.132.183.191:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 
pgs=0 cs=0 l=1

c=0x7f5ce4029930).fault
Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700  
0 -- :/121 >> 10.132.183.192:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 
pgs=0 cs=0 l=1

c=0x7f5ce4029930).fault
Dec 14 16:38:50 deis-1 sh[933]: 2014-12-14 08:38:50.267398 7f5cec71f700  
0 -- :/121 >> 10.132.183.190:6789/0 pipe(0x7f5cd40030e0 sd=4 :0 s=1 
pgs=0 cs=0 l=1

c=0x7f5cd4003370).fault
...keep repeating...

This is *my /etc/ceph/ceph.conf file*:
[global]
fsid = cc368515-9dc6-48e2-9526-58ac4cbb3ec9
mon initial members = deis-3
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min_size = 1
osd pool default pg_num = 128
osd pool default pgp_num = 128
osd recovery delay start = 15
log file = /dev/stdout

[mon.deis-3]
host = deis-3
mon addr = 10.132.183.190:6789

[mon.deis-1]
host = deis-1
mon addr = 10.132.183.191:6789

[mon.deis-2]
host = deis-2
mon addr = 10.132.183.192:6789

[client.radosgw.gateway]
host = deis-store-gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /dev/stdout

*IP table of the docker host:**
*core@deis-3 ~ $ sudo iptables --list
Chain INPUT (policy DROP)
target prot opt source destination
Firewall-INPUT  all  --  anywhere anywhere

Chain FORWARD (policy DROP)
target prot opt source destination
ACCEPT tcp  --  anywhere 172.17.0.2   tcp dpt:http
ACCEPT tcp  --  anywhere 172.17.0.2   tcp dpt:https
ACCEPT tcp  --  anywhere 172.17.0.2   tcp dpt:
ACCEPT all  --  anywhere anywhere ctstate 
RELATED,ESTABLISHED

ACCEPT all  --  anywhere anywhere
ACCEPT all  --  anywhere anywhere
Firewall-INPUT  all  --  anywhere anywhere

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain Firewall-INPUT (2 references)
target prot opt source destination
ACCEPT all  --  anywhere anywhere
ACCEPT icmp --  anywhere anywhere icmp echo-reply
ACCEPT icmp --  anywhere anywhere icmp 
destination-unreachable

ACCEPT icmp --  anywhere anywhere icmp time-exceeded
ACCEPT icmp --  anywhere anywhere icmp echo-request
ACCEPT all  --  anywhere anywhere ctstate 
RELATED,ESTABLISHED

ACCEPT all  --  10.132.183.190 anywhere
ACCEPT all  --  10.132.183.192 anywhere
ACCEPT all  --  10.132.183.191 anywhere
ACCEPT all  --  anywhere anywhere
ACCEPT tcp  --  anywhere anywhere ctstate NEW multiport 
dports ssh,,http,https

LOGall  --  anywhere anywhere LOG level warning
REJECT all  --  anywhere anywhere reject-with 
icmp-host-prohibited



All private IPs are pingable within the ceph monitor container. What 
could I do next to troubleshoot this issue?
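
A few generic checks that are worth running inside each monitor container (the ids
and paths below are the ceph defaults and may differ in the Deis images):

# is ceph-mon running, and did it create its admin socket?
ls /var/run/ceph/                     # expect something like ceph-mon.deis-1.asok
ceph daemon mon.deis-1 mon_status     # quorum state as this monitor sees it

# does the monmap on disk still list all three peers with the right addresses?
ceph-mon -i deis-1 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap

One thing worth ruling out is the containers coming back with addresses that no
longer match the monmap - the mons would then keep dialling the old :6789
endpoints, which is what the repeating fault lines look like.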


Thanks a lot!

- Jimmy Chu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow RBD performance bs=4k

2014-12-15 Thread Mark Kirkwood

On 15/12/14 17:44, ceph@panther-it.nl wrote:

I have the following setup:
Node1 = 8 x SSD
Node2 = 6 x SATA
Node3 = 6 x SATA


Having 1 node different from the rest is not going to help...you will 
probably get better results if you sprinkle the SSD through all 3 nodes 
and use SATA for osd data and the SSD for osd journal.



Client1
All Cisco UCS running RHEL6.5 + kernel 3.18.0 + ceph 0.88.

A "dd bs=4k oflag=direct" test directly on a OSD disk shows me:
Node1 = 60MB/s
Node2 = 30MB/s
Node2 = 30MB/s



Hmmm - your SSDs are slow for direct writes (15K IOPS if my maths is 
right - what make and model are they)? For that matter your SATA disks seem 
pretty slow too (what make and model are they)?


And as Christian has mentioned, ceph small block size IO performance has 
been discussed at length previously, so it is worth searching the 
archives to understand the state of things and see that there has been 
*some* progress with improving this issue.


Cheers

Mark


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hi.. s3cmd unable to create buckets

2014-12-15 Thread Luis Periquito
Have you created the * DNS record?

bucket1. needs to resolve to that IP address (that's what
you're saying in the host_bucket directive).

On Mon, Dec 15, 2014 at 5:52 AM, Ruchika Kharwar 
wrote:
>
> Apologies for re-asking this question since I found several hits on this
> question but not very clear answers.
>
> I am in a situation where s3cmd ls seems to work
> but s3cmd mb s3://bucket1 does not
>
> 1. The rgw dns name  = servername in the apache rados.vhost.conf file. and
> on the client running the s3cmd
> the .s3cfg has
> host_base = 
> host_bucket = %(bucket)s.
>
> the rgw dns name and that in the apache2 rados.vhost.conf has rgw dns name
> and Servername set to 
>
> Please advise
>
> Thank you
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to start radosgw

2014-12-15 Thread Vivek Varghese Cherian
Hi,



>> Do I need to overwrite the existing .db files and .txt file in
>> /var/lib/nssdb on the radosgw host  with the ones copied from
>> /var/ceph/nss on the Juno node ?
>>
>>
> Yeah - worth a try (we want to rule out any certificate mis-match errors).
>
> Cheers
>
> Mark
>
>

I have manually copied the keys from the directory /var/ceph/nss on the
Juno node to /var/ceph/nss on my radosgw node, and I have also made the
following changes to my ceph.conf:

#rgw keystone url = 10.x.x.175:35357
rgw keystone url = 10.x.x.175:5000
rgw keystone admin token = password123
rgw keystone accepted roles = Member, admin
rgw keystone token cache size = 1
rgw keystone revocation interval = 15 * 60
rgw s3 auth use keystone = true
#nss db path = /var/lib/nssdb
nss db path = /var/ceph/nss

I have restarted the radosgw and it works.

ceph@ppm-c240-ceph3:~$ ps aux | grep rados
root 19833  0.2  0.0 10324668 33288 ?  Ssl  Dec12   7:30
/usr/bin/radosgw -n client.radosgw.gateway
ceph 28101  0.0  0.0  10464   916 pts/0S+   02:25   0:00 grep
--color=auto rados
ceph@ppm-c240-ceph3:~$
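
A quick end-to-end check that keystone-authenticated access works (tenant,
user and password below are placeholders for real keystone credentials)
could be something like:

swift -V 2.0 -A http://10.x.x.175:5000/v2.0 -U tenant:user -K password stat
swift -V 2.0 -A http://10.x.x.175:5000/v2.0 -U tenant:user -K password post test-container
swift -V 2.0 -A http://10.x.x.175:5000/v2.0 -U tenant:user -K password list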


Imho, the document ( http://ceph.com/docs/master/radosgw/keystone/ ) should
explicitly state that the /var/ceph/nss directory should be created on the
radosgw node and not on the openstack node.
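
For reference, the conversion steps from that page - run on the radosgw
node, with the keystone certificates copied over first (file names here are
just examples) - look roughly like:

mkdir -p /var/ceph/nss
openssl x509 -in ca.pem -pubkey | \
    certutil -d /var/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
openssl x509 -in signing_cert.pem -pubkey | \
    certutil -A -d /var/ceph/nss -n signing_cert -t "P,P,P"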

I had a discussion with Loïc Dachary on irc, and on his request, I have
filed a bug against the documentation.

The ticket url is http://tracker.ceph.com/issues/10305


Btw, thanks Mark for the pointers.


Regards,
---
Vivek Varghese Cherian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hi.. s3cmd unable to create buckets

2014-12-15 Thread Ruchika Kharwar
Apologies for re-asking this question since I found several hits on this
question but not very clear answers.

I am in a situation where s3cmd ls seems to work
but s3cmd mb s3://bucket1 does not

1. The rgw dns name = ServerName in the apache rados.vhost.conf file, and
on the client running s3cmd the .s3cfg has
host_base = 
host_bucket = %(bucket)s.

The rgw dns name in ceph.conf and the ServerName in the apache2
rados.vhost.conf are both set to 

Please advise

Thank you
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] my cluster has only rbd pool

2014-12-15 Thread Craig Lewis
If you're running Ceph 0.88 or newer, only the rbd pool is created by
default now.  Greg Farnum mentioned that the docs are out of date there.
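
So for the quick-start example to work, you need to create the pool
yourself first, something like (128 PGs is just an example, and "hello" is
a local file):

ceph osd pool create data 128
rados put hello_obj hello --pool=data
rados ls --pool=data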

On Sat, Dec 13, 2014 at 8:25 PM, wang lin  wrote:
>
> Hi All
>   I set up my first ceph cluster according to instructions in
> 
> http://ceph.com/docs/master/start/quick-ceph-deploy/#storing-retrieving-object-data
>   but I got this error "error opening pool data: (2) No such file or
> directory" when using command "rados put hello_obj hello --pool=data".
>   I typed the command "ceph osd lspools", the result only show "0
> rbd,", no other pools.
>   Did I missing anything?
>   Could anyone give me some advise?
>
> Thanks
> Lin, Wangf
>
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stripping data

2014-12-15 Thread John Spray
Yes, setfattr is the preferred way.  The docs are here:
http://ceph.com/docs/master/cephfs/file-layouts/
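
For example (a minimal sketch - the attribute names are from that page,
paths and values here are just illustrative):

# default layout for new files created under a directory
setfattr -n ceph.dir.layout.stripe_count -v 4 /mnt/cephfs/mydir
setfattr -n ceph.dir.layout.stripe_unit -v 1048576 /mnt/cephfs/mydir

# layout of a new, still-empty file
touch /mnt/cephfs/mydir/newfile
setfattr -n ceph.file.layout.stripe_count -v 8 /mnt/cephfs/mydir/newfile

# inspect
getfattr -n ceph.dir.layout /mnt/cephfs/mydir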

Cheers,
John

On Mon, Dec 15, 2014 at 8:12 AM, Ilya Dryomov  wrote:
> On Sun, Dec 14, 2014 at 10:38 AM, Kevin Shiah  wrote:
>> Hello All,
>>
>> Does anyone know how to configure data stripping when using ceph as file
>> system? My understanding is that configuring stripping with rbd is only for
>> block device.
>
> You should be able to set layout.* xattrs on directories and empty
> files (directory layout just sets the default layout for the newly
> created files within it).  There are also a couple of ioctls which do
> essentially the same thing but I think their use is discouraged.
> John will correct me if I'm wrong.
>
> Thanks,
>
> Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] unable to repair PG

2014-12-15 Thread Luis Periquito
Just to update this issue.

I stopped OSD.6, removed the PG from disk, and restarted it. Ceph rebuilt
the object and it went to HEALTH_OK.
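
(For the record, the rough sequence - assuming osd.6, pg 9.180 and the
default on-disk paths, on an upstart system; adjust the start/stop commands
as needed - was something like:

stop ceph-osd id=6
mv /var/lib/ceph/osd/ceph-6/current/9.180_head /root/9.180_head.bak
start ceph-osd id=6
# wait for backfill to finish, then re-check
ceph pg deep-scrub 9.180
)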

During the weekend the disk for OSD.6 started giving smart errors and will
be replaced.

Thanks for your help Greg. I've opened a bug report in the tracker.

On Fri, Dec 12, 2014 at 9:53 PM, Gregory Farnum  wrote:
>
> [Re-adding the list]
>
> Yeah, so "shard 6" means that it's osd.6 which has the bad data.
> Apparently pg repair doesn't recover from this class of failures; if
> you could file a bug that would be appreciated.
> But anyway, if you delete the object in question from OSD 6 and run a
> repair on the pg again it should recover just fine.
> -Greg
>
> On Fri, Dec 12, 2014 at 1:45 PM, Luis Periquito 
> wrote:
> > Running firefly 0.80.7 with a replicated pools, with 4 copies.
> >
> > On 12 Dec 2014 19:20, "Gregory Farnum"  wrote:
> >>
> >> What version of Ceph are you running? Is this a replicated or
> >> erasure-coded pool?
> >>
> >> On Fri, Dec 12, 2014 at 1:11 AM, Luis Periquito 
> >> wrote:
> >> > Hi Greg,
> >> >
> >> > thanks for your help. It's always highly appreciated. :)
> >> >
> >> > On Thu, Dec 11, 2014 at 6:41 PM, Gregory Farnum 
> >> > wrote:
> >> >>
> >> >> On Thu, Dec 11, 2014 at 2:57 AM, Luis Periquito  >
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I've stopped OSD.16, removed the PG from the local filesystem and
> >> >> > started
> >> >> > the OSD again. After ceph rebuilt the PG in the removed OSD I ran a
> >> >> > deep-scrub and the PG is still inconsistent.
> >> >>
> >> >> What led you to remove it from osd 16? Is that the one hosting the
> log
> >> >> you snipped from? Is osd 16 the one hosting shard 6 of that PG, or
> was
> >> >> it the primary?
> >> >
> >> > OSD 16 is both the primary for this PG and the one that has the
> snipped
> >> > log.
> >> > The other 3 OSDs has any mention of this PG in their logs. Just some
> >> > messages about slow requests and the backfill when I removed the
> object.
> >> > Actually it came from OSD.6 - currently we don't have OSD.3.
> >> >
> >> > this is the output of the pg dump for this PG
> >> > 9.180256140002330648234830013001
> >> > active+clean+inconsistent2014-12-10 17:29:01.937929
> 40242'1108124
> >> > 40242:23305321[16,10,27,6]16[16,10,27,6]16
> 40242'1071363
> >> > 2014-12-10 17:29:01.93788140242'10713632014-12-10
> >> > 17:29:01.937881
> >> >
> >> >>
> >> >> Anyway, the message means that shard 6 (which I think is the seventh
> >> >> OSD in the list) of PG 9.180 is missing a bunch of xattrs on object
> >> >> 370cbf80/29145.4_xxx/head//9. I'm actually a little surprised it
> >> >> didn't crash if it's missing the "_" attr
> >> >> -Greg
> >> >
> >> >
> >> > Any idea on how to fix it?
> >> >
> >> >>
> >> >>
> >> >> >
> >> >> > I'm running out of ideas on trying to solve this. Does this mean
> that
> >> >> > all
> >> >> > copies of the object should also be inconsistent? Should I just try
> >> >> > to
> >> >> > figure which object/bucket this belongs to and delete it/copy it
> >> >> > again
> >> >> > to
> >> >> > the ceph cluster?
> >> >> >
> >> >> > Also, do you know what the error message means? is it just some
> sort
> >> >> > of
> >> >> > metadata for this object that isn't correct, not the object itself?
> >> >> >
> >> >> > On Wed, Dec 10, 2014 at 11:11 AM, Luis Periquito
> >> >> > 
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> In the last few days this PG (pool is .rgw.buckets) has been in
> >> >> >> error
> >> >> >> after running the scrub process.
> >> >> >>
> >> >> >> After getting the error, and trying to see what may be the issue
> >> >> >> (and
> >> >> >> finding none), I've just issued a ceph repair followed by a ceph
> >> >> >> deep-scrub.
> >> >> >> However it doesn't seem to have fixed the issue and it still
> >> >> >> remains.
> >> >> >>
> >> >> >> The relevant log from the OSD is as follows.
> >> >> >>
> >> >> >> 2014-12-10 09:38:09.348110 7f8f618be700  0 log [ERR] : 9.180
> >> >> >> deep-scrub
> >> >> >> 0
> >> >> >> missing, 1 inconsistent objects
> >> >> >> 2014-12-10 09:38:09.348116 7f8f618be700  0 log [ERR] : 9.180
> >> >> >> deep-scrub
> >> >> >> 1
> >> >> >> errors
> >> >> >> 2014-12-10 10:13:15.922065 7f8f618be700  0 log [INF] : 9.180
> repair
> >> >> >> ok,
> >> >> >> 0
> >> >> >> fixed
> >> >> >> 2014-12-10 10:55:27.556358 7f8f618be700  0 log [ERR] : 9.180 shard
> >> >> >> 6:
> >> >> >> soid
> >> >> >> 370cbf80/29145.4_xxx/head//9 missing attr _, missing attr
> >> >> >> _user.rgw.acl,
> >> >> >> missing attr _user.rgw.content_type, missing attr _user.rgw.etag,
> >> >> >> missing
> >> >> >> attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing
> attr
> >> >> >> _user.rgw.x-amz-meta-md5sum, missing attr
> _user.rgw.x-amz-meta-stat,
> >> >> >> missing
> >> >> >> attr snapset
> >> >> >> 2014-12-10 10:56:50.597952 7f8f618be700  0 log [ERR] : 9.180
> >> >> >> deep-scrub
> >> >> >> 0
> >> >> 

Re: [ceph-users] Slow RBD performance bs=4k

2014-12-15 Thread Christian Balzer

Hello,

There have been many, many threads about this. 
Google is your friend, so is keeping an eye on threads in this ML.

On Mon, 15 Dec 2014 05:44:24 +0100 ceph@panther-it.nl wrote:

> I have the following setup:
> Node1 = 8 x SSD
> Node2 = 6 x SATA
> Node3 = 6 x SATA
> Client1
> All Cisco UCS running RHEL6.5 + kernel 3.18.0 + ceph 0.88.
> 
> A "dd bs=4k oflag=direct" test directly on a OSD disk shows me:
> Node1 = 60MB/s
> Node2 = 30MB/s
> Node2 = 30MB/s
> 
> I've created 2 pools, each size=1, pg_num=1024.
> I've created a rbd image, formatted it ext4 (bs=4k), but also xfs.

You're not telling us how you mounted it, but since you're not mentioning
VMs anywhere let's assume kernel-space RBD.

> A "dd bs=4k oflag=direct"  test on that image shows me   5 MB/s.
Looking at the CPU utilization (and other things) of your storage nodes
during that test with atop or similar should be educational.

Or maybe not, as you're missing one major item (aside from the less than
stellar kernel space performance); see below.

> A "dd bs=4M oflag=direct"  test on that image shows me 150 MB/s.
This is the same block size as "rados bench" but...

> A "dd bs=32M oflag=direct" test on that image shows me 260 MB/s.
> A "rados bench write"  test on that pool  shows me 560 MB/s.
> 
> What am i doing wrong?

The "rados bench" has a default size of 4MB (which is optimal for the
default ceph settings) _AND_ 16 threads. 
Ceph excels at parallel tasks, single threads will suck in comparison (as
they tend to hit the same target OSDs for the time it takes to write 4MB).

> Why is a 4kb block size write so slow?
> 
See above. 
And once you use a larger number of threads with 4KB blocks, your CPUs will
melt. 
Try "rados -p poolname bench 30 write -t 64 -b 4096" for some fireworks.

Regards,

Christian
> Thanks for any help...
> 
> 
> Samuel Terburg
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD and HA KVM anybody?

2014-12-15 Thread Josef Johansson
Hi Christian,

We’re using Proxmox, which has support for HA; it is handled per-VM.
We’re doing it manually right now though, because we like it :). 

When I looked at it I couldn’t see a way of restricting HA to just a set of 
hosts (i.e. excluding the storage nodes), but that’s probably easy to solve.

Cheers,
Josef

> On 15 Dec 2014, at 04:10, Christian Balzer  wrote:
> 
> 
> Hello,
> 
> What are people here using to provide HA KVMs (and with that I mean
> automatic, fast VM failover in case of host node failure) in with RBD
> images?
> 
> Openstack and ganeti have decent Ceph/RBD support, but no HA (plans
> aplenty though).
> 
> I have plenty of experience with Pacemaker (DRBD backed) but there is only
> an unofficial RBD resource agent for it, which also only supports kernel
> based RBD. 
> And while Pacemaker works great, it scales like leaden porcupines, things
> degrade rapidly after 20 or so instances.
> 
> So what are other people here using to keep their KVM based VMs up and
> running all the time?
> 
> Regards,
> 
> Christian
> -- 
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com Global OnLine Japan/Fusion Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to start radosgw

2014-12-15 Thread Mark Kirkwood

On 15/12/14 20:54, Vivek Varghese Cherian wrote:

Hi,



Do I need to overwrite the existing .db files and .txt file in
/var/lib/nssdb on the radosgw host  with the ones copied from
/var/ceph/nss on the Juno node ?


Yeah - worth a try (we want to rule out any certificate mis-match
errors).

Cheers

Mark



I have manually copied the keys from the directory /var/ceph/nss on the
Juno node to /var/ceph/nss on my radosgw node, and I have also made the
following changes to my ceph.conf:

#rgw keystone url = 10.x.x.175:35357
rgw keystone url = 10.x.x.175:5000
rgw keystone admin token = password123
rgw keystone accepted roles = Member, admin
rgw keystone token cache size = 1
rgw keystone revocation interval = 15 * 60
rgw s3 auth use keystone = true
#nss db path = /var/lib/nssdb
nss db path = /var/ceph/nss

I have restarted the radosgw and it works.

ceph@ppm-c240-ceph3:~$ ps aux | grep rados
root 19833  0.2  0.0 10324668 33288 ?  Ssl  Dec12   7:30
/usr/bin/radosgw -n client.radosgw.gateway
ceph 28101  0.0  0.0  10464   916 pts/0S+   02:25   0:00 grep
--color=auto rados
ceph@ppm-c240-ceph3:~$


Imho, the document ( http://ceph.com/docs/master/radosgw/keystone/ )
should explicitly state that the /var/ceph/nss directory should be
created on the radosgw node and not on the openstack node.

I had a discussion with Loïc Dachary on irc, and on his request, I have
filed a bug against the documentation.

The ticket url is http://tracker.ceph.com/issues/10305


Btw, thanks Mark for the pointers.



Excellent - glad it is working now. Yeah, the docs could certainly be 
clearer. Also the error message from radosgw when the certs are 
wrong/missing could be better too!


Regards

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stripping data

2014-12-15 Thread Ilya Dryomov
On Sun, Dec 14, 2014 at 10:38 AM, Kevin Shiah  wrote:
> Hello All,
>
> Does anyone know how to configure data stripping when using ceph as file
> system? My understanding is that configuring stripping with rbd is only for
> block device.

You should be able to set layout.* xattrs on directories and empty
files (directory layout just sets the default layout for the newly
created files within it).  There are also a couple of ioctls which do
essentially the same thing but I think their use is discouraged.
John will correct me if I'm wrong.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com