Re: [ceph-users] Write freeze when writing to rbd image and rebooting one of the nodes

2015-05-14 Thread Vasiliy Angapov
Thanks, Robert, for sharing so much experience! I feel like I don't deserve
it :)

I have another, but very similar, situation which I don't understand.
Last time I tried to hard-kill the OSD daemons.
This time I added a new node with 2 OSDs to my cluster and also monitored the
IO. I wrote a script which adds a node with OSDs fully automatically. It
seems that when I start the script, IO is blocked until the cluster shows
HEALTH_OK, which takes quite an amount of time. After the Ceph status is OK,
copying resumes.

What should I tune this time to avoid the long IO interruption?

Thanks in advance again :)


Re: [ceph-users] Write freeze when writing to rbd image and rebooting one of the nodes

2015-05-14 Thread Robert LeBlanc

Can you provide the output of the CRUSH map and a copy of the script
that you are using to add the OSDs? Can you also provide the pool size
and pool min_size?
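
For example (pool name is a placeholder):

ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size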



Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Thu, May 14, 2015 at 6:33 AM, Vasiliy Angapov anga...@gmail.com wrote:

 Thanks, Robert, for sharing so much experience! I feel like I don't
 deserve it :)

 I have another, but very similar, situation which I don't understand.
 Last time I tried to hard-kill the OSD daemons.
 This time I added a new node with 2 OSDs to my cluster and also monitored the
 IO. I wrote a script which adds a node with OSDs fully automatically. It
 seems that when I start the script, IO is blocked until the cluster shows
 HEALTH_OK, which takes quite an amount of time. After the Ceph status is OK,
 copying resumes.

 What should I tune this time to avoid the long IO interruption?

 Thanks in advance again :)



Re: [ceph-users] Cisco UCS Blades as MONs? Pros cons ...?

2015-05-14 Thread Jake Young
I have 42 OSDs on 6 servers. I'm planning to double that this quarter by
adding 6 more servers to get to 84 OSDs.

I have 3 monitor VMs. Two of them are running on two different blades in
the same chassis, but their networking is on different fabrics. The third
one is on a blade in a different chassis.

My monitor VM CPU, memory and disk IO load is very small, as in nearly
idle. The VM images are on local 10k disks on the blade. They share the
disks with a few other low-IO VMs.

I've read that the monitors can get busy and need a lot of IO, to the point
where it justifies using SSDs. I imagine those must be very large clusters
with at least hundreds of OSDs.

Jake

On Wednesday, May 13, 2015, Götz Reinicke - IT Koordinator 
goetz.reini...@filmakademie.de wrote:

 Hi Jake,

 we have the fabric interconnects.

 MONs as VMs? What setup do you have? And what cluster size?

 Regards . Götz


 Am 13.05.15 um 15:20 schrieb Jake Young:
  I run my mons as VMs inside of UCS blade compute nodes.
 
  Do you use the fabric interconnects or the standalone blade chassis?
 
  Jake
 
  On Wednesday, May 13, 2015, Götz Reinicke - IT Koordinator
  goetz.reini...@filmakademie.de wrote:
 
  Hi Christian,
 
  currently we do get good discounts as an University and the bundles
 were
  worth it.
 
  The chassis do have multiple PSUs and n 10Gb Ports (40Gb is
 possible).
  The switch connection is redundant.
 
  Currently we think of 10 SATA OSD nodes + x SSD cache pool nodes and 5
  MONs, for a start.
 
  The main focus with the blades would be space saving in the rack. Till
  now I don't have any price, but that would count too in our decision :)
 
  Thanks and regards . Götz
 
 ...


 --
 Götz Reinicke
 IT-Koordinator

 Tel. +49 7141 969 82 420
 E-Mail goetz.reini...@filmakademie.de

 Filmakademie Baden-Württemberg GmbH
 Akademiehof 10
 71638 Ludwigsburg
 www.filmakademie.de

 Eintragung Amtsgericht Stuttgart HRB 205016

 Vorsitzender des Aufsichtsrats: Jürgen Walter MdL
 Staatssekretär im Ministerium für Wissenschaft,
 Forschung und Kunst Baden-Württemberg

 Geschäftsführer: Prof. Thomas Schadt




Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-14 Thread John Spray


On 14/05/2015 18:15, Francois Lafont wrote:

Hi,

I had a problem with a cephfs freeze on a client. It was impossible to
re-enable the mountpoint. A simple ls /mnt command blocked
completely (and of course it was impossible to umount/remount etc.), and
I had to reboot the host. But even a normal reboot didn't work; the
host didn't shut down. I had to do a hard reboot of the host. In brief,
it was like a big NFS freeze. ;)


Greg's response is pretty comprehensive, but for completeness I'll add
that the specific case of shutdown blocking is tracked at
http://tracker.ceph.com/issues/9477


Cheers,
John


[ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-14 Thread Francois Lafont
Hi,

I had a problem with a cephfs freeze on a client. It was impossible to
re-enable the mountpoint. A simple ls /mnt command blocked
completely (and of course it was impossible to umount/remount etc.), and
I had to reboot the host. But even a normal reboot didn't work; the
host didn't shut down. I had to do a hard reboot of the host. In brief,
it was like a big NFS freeze. ;)

In the logs, there was nothing relevant on the client side and just this
line on the cluster side:

~# cat /var/log/ceph/ceph-mds.1.log
[...]
2015-05-14 17:07:17.259866 7f3b5cffc700  0 log_channel(cluster) log [INF] : 
closing stale session client.1342358 192.168.21.207:0/519924348 after 301.329013
[...]

And indeed, the freeze was probably triggered by a brief network
interruption.

Here is my configuration:
- OS: Ubuntu 14.04 in the client and in the cluster nodes.
- Kernel: 3.16.0-36-generic in the client and in the cluster nodes.
  (apt-get install linux-image-generic-lts-utopic).
- Ceph version: Hammer in the client and in cluster nodes (0.94.1-1trusty).

In the client, I use the cephfs kernel module (not ceph-fuse). Here
is the fstab line in the client node:

10.0.2.150,10.0.2.151,10.0.2.152:/ /mnt ceph 
noatime,noacl,name=cephfs,secretfile=/etc/ceph/secret,_netdev 0 0

My only configuration concerning mds in ceph.conf is just:

  mds cache size = 100

That's all.

Here are my questions:

1. Is this kind of freeze normal? Can I avoid these freezes with a
more recent version of the kernel in the client?

2. Can I avoid these freezes with ceph-fuse instead of the kernel
cephfs module? But in this case, the cephfs performance will be
worse. Am I wrong?

3. Is there a parameter in ceph.conf to tell mds to be more patient
before closing the stale session of a client?

I'm in a testing period and a hard reboot of my cephfs clients would
be quite annoying for me. Thanks in advance for your help.

-- 
François Lafont


Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-14 Thread Gregory Farnum
On Thu, May 14, 2015 at 10:15 AM, Francois Lafont flafdiv...@free.fr wrote:
 Hi,

 I had a problem with a cephfs freeze on a client. It was impossible to
 re-enable the mountpoint. A simple ls /mnt command blocked
 completely (and of course it was impossible to umount/remount etc.), and
 I had to reboot the host. But even a normal reboot didn't work; the
 host didn't shut down. I had to do a hard reboot of the host. In brief,
 it was like a big NFS freeze. ;)

 In the logs, there was nothing relevant on the client side and just this
 line on the cluster side:

 ~# cat /var/log/ceph/ceph-mds.1.log
 [...]
 2015-05-14 17:07:17.259866 7f3b5cffc700  0 log_channel(cluster) log [INF] 
 : closing stale session client.1342358 192.168.21.207:0/519924348 after 
 301.329013
 [...]

 And indeed, the freeze was probably triggered by a brief network
 interruption.

 Here is my configuration:
 - OS: Ubuntu 14.04 in the client and in the cluster nodes.
 - Kernel: 3.16.0-36-generic in the client and in the cluster nodes.
   (apt-get install linux-image-generic-lts-utopic).
 - Ceph version: Hammer in the client and in cluster nodes (0.94.1-1trusty).

 In the client, I use the cephfs kernel module (not ceph-fuse). Here
 is the fstab line in the client node:

 10.0.2.150,10.0.2.151,10.0.2.152:/ /mnt ceph 
 noatime,noacl,name=cephfs,secretfile=/etc/ceph/secret,_netdev 0 0

 My only configuration concerning mds in ceph.conf is just:

   mds cache size = 100

 That's all.

 Here are my questions:

 1. Is this kind of freeze normal? Can I avoid these freezes with a
 more recent version of the kernel in the client?

Yes, it's normal. Although you should have been able to do a lazy
and/or force umount. :)
You can't avoid the freeze with a newer client. :(
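
For example (standard Linux umount options):

umount -f /mnt   # force the unmount
umount -l /mnt   # lazy detach; references are cleaned up later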

If you notice the problem quickly enough, you should be able to
reconnect everything by rebooting the MDS — although if the MDS hasn't
failed the client then things shouldn't be blocking, so actually that
probably won't help you.


 2. Can I avoid these freezes with ceph-fuse instead of the kernel
 cephfs module? But in this case, the cephfs performance will be
 worse. Am I wrong?

No, ceph-fuse will suffer the same blockage, although obviously in
userspace it's a bit easier to clean up. Depending on your workload it
will range from slightly faster to a lot slower. Though you'll also get
updates faster/more easily. ;)
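
For reference, a ceph-fuse mount roughly equivalent to the fstab line
above would be (client name and monitor addresses taken from the
original post):

ceph-fuse --id cephfs -m 10.0.2.150,10.0.2.151,10.0.2.152 /mnt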

 3. Is there a parameter in ceph.conf to tell mds to be more patient
 before closing the stale session of a client?

Yes. You'll need to increase the mds session timeout value on the
MDS; it currently defaults to 60 seconds. You can increase that to
whatever value you like. The tradeoff here is that if you have a
client die, anything it had capabilities on (for read/write access)
will be unavailable to anybody who's doing something that might
conflict with those capabilities.
If you've got a new enough MDS (Hammer, probably, but you can check)
then you can use the admin socket to boot specific sessions, so it may
suit you to set very large timeouts and manually zap any client which
actually goes away badly (rather than getting disconnected by the
network).
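
A sketch of both knobs (option and command names as of roughly Hammer;
verify against your release):

# ceph.conf on the MDS host:
[mds]
    mds session timeout = 300

# inspect client sessions via the admin socket:
ceph daemon mds.<name> session ls
# newer releases also offer an eviction command for a specific session,
# e.g. 'session evict <session_id>'; check your version's admin socket help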


 I'm in a testing period and a hard reboot of my cephfs clients would
 be quite annoying for me. Thanks in advance for your help.

Yeah. Unfortunately there's a basic tradeoff in strictly-consistent
(aka POSIX) network filesystems here: if the network goes away, you
can't be consistent any more because the disconnected client can make
conflicting changes. And you can't tell exactly when the network
disappeared.

So while we hope to make this less painful in the future, the network
dying that badly is a failure case that you need to be aware of,
meaning that the client might have conflicting information. If it
*does* have conflicting info, the best we can do about it is be
polite, return a bunch of error codes, and unmount gracefully. We'll
get there eventually but it's a lot of work.
-Greg


Re: [ceph-users] export-diff exported only 4kb instead of 200-600gb

2015-05-14 Thread Jason Dillaman
Interesting. The 'rbd diff' operation uses the same librbd API method as 'rbd
export-diff' to calculate all the updated image extents, so it's very strange
that one works and the other doesn't, given that you have a validly formatted
export. I tried to recreate your issue on Giant and was unable to. I would
normally ask for a log dump with 'debug rbd = 20', but given the size of your
image, that log would be astronomically large.
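
For reference, debug logging can also be enabled for a single command
instead of via ceph.conf, along these lines (flag handling may vary by
release; the image name is a placeholder):

rbd --debug-rbd 20 --log-file /tmp/rbd-diff.log diff --pool RBD-01 <image>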

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message - 
From: Ultral ultral...@gmail.com 
To: Jason Dillaman dilla...@redhat.com 
Cc: ceph-users ceph-us...@ceph.com 
Sent: Tuesday, May 12, 2015 12:15:27 PM 
Subject: Re: [ceph-users] export-diff exported only 4kb instead of 200-600gb 

If you run 'rbd info --pool RBD-01 CEPH_006__01__NA__0003__ESX__ALL_EXT',
what is the output? 
size 2048 GB in 524288 objects 
order 22 (4096 kB objects) 
block_name_prefix: rb.0.19b1.238e1f29 
format: 1 
 Does 'rbd diff' work against the image (i.e. more than a few kilobytes of
 deltas)?
it looks fine 
time rbd diff --cluster cluster1 --pool NETAP-RBD-01 
CEPH_006__01__NA__0003__ESX__ALL_EXT |wc -c 
14593264 

real 22m35.316s 
user 2m39.537s 
sys 1m24.177s 


 Also, would it be possible for you to create a new, test image in the same 
 pool, snapshot it, use 'rbd bench-write' to generate some data, and then 
 verify if export-diff is properly working against the new image? 
I will try. I can only create a 1-100 GB image in this pool.
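
A sketch of that test, with placeholder names and format 1 to match the
original image:

rbd create --pool RBD-01 --image-format 1 --size 10240 test-diff
rbd snap create RBD-01/test-diff@base
rbd bench-write RBD-01/test-diff --io-size 4096 --io-total 1073741824
rbd export-diff RBD-01/test-diff --from-snap base - | wc -c

If export-diff is healthy, the byte count should be on the order of the
gigabyte written by bench-write, not a few kilobytes.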

2015-05-12 19:30 GMT+05:00 Jason Dillaman  dilla...@redhat.com  : 


Very strange. I'll see if I can reproduce on a Giant release. If you run 'rbd
info --pool RBD-01 CEPH_006__01__NA__0003__ESX__ALL_EXT', what is the 
output? I want to use the same settings as your image. 

Does 'rbd diff' work against the image (i.e. more than a few kilobytes of
deltas)? Also, would it be possible for you to create a new, test image in the
same pool, snapshot it, use 'rbd bench-write' to generate some data, and then 
verify if export-diff is properly working against the new image? 

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message - 
From: Ultral  ultral...@gmail.com  
To: Jason Dillaman  dilla...@redhat.com  
Cc: ceph-users  ceph-us...@ceph.com  
Sent: Sunday, May 10, 2015 5:40:00 AM 
Subject: Re: [ceph-users] export-diff exported only 4kb instead of 200-600gb 

Hello Jason, 


 but to me it sounds like you are saying that there are no/minimal deltas 
 between snapshots move2db24-20150428 and 2015-05-05 (both from the 
 export-diff and from your clone). 
Yep, that's correct. The difference between snapshots move2db24-20150428 and
2015-05-05 is too small: 4 KB instead of 200-800 GB.


 Are you certain that you made 700-800GBs of changes between the two snapshots 
 and no trim operations released your changes back? 
The VM is located on the image; it is an intranet for 1000 people. It has
web+mysql+sphinx+backups.

The vast majority of the changed data is backups (2-day rotation) inside the
VM on the image. That makes about 200 GB of data each day.
We also store user uploads (0.3-3 GB per day)
and databases (about 30 GB).

So I suppose the changes should be more than 4 KB.

 If you diff from move2db24-20150428 to HEAD, do you see all your changes? 

rbd export-diff --cluster cluster1 --pool RBD-01 
CEPH_006__01__NA__0003__ESX__ALL_EXT --from-snap move2db24-20150428 -|wc -c 
6786 
Exporting image: 100% complete...done. 

It is too small. I've added some video files to the VM; however, it shows only
6 KB.

2015-05-08 18:43 GMT+05:00 Jason Dillaman  dilla...@redhat.com  : 


There is probably something that I am not understanding, but to me it sounds 
like you are saying that there are no/minimal deltas between snapshots 
move2db24-20150428 and 2015-05-05 (both from the export-diff and from your 
clone). Are you certain that you made 700-800GBs of changes between the two 
snapshots and no trim operations released your changes back? If you diff from 
move2db24-20150428 to HEAD, do you see all your changes? 

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message - 
From: Ultral  ultral...@gmail.com  
To: ceph-users  ceph-us...@ceph.com  
Sent: Thursday, May 7, 2015 11:45:46 AM 
Subject: [ceph-users] export-diff exported only 4kb instead of 200-600gb 

Hi all, 


Something strange occurred.
I have Ceph version 0.87 and a 2048 GB format 1 image. I decided to make
incremental backups between clusters.

I've made the initial copy:

time bbcp -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io rbd export-diff 
--cluster cluster1 --pool RBD-01 --image 
CEPH_006__01__NA__0003__ESX__ALL_EXT --snap move2db24-20150428 - 1.1.1.1 
:rbd import-diff - --cluster cluster2 --pool TST-INT-SD-RBD-1DC --image temp 
and decided to move an incremental (it should be about 200-600 GB of changes):

time bbcp -c -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io rbd --cluster 

Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-05-14 Thread Lee Revell
On Thu, May 14, 2015 at 2:47 PM, John Spray john.sp...@redhat.com wrote:

 Greg's response is pretty comprehensive, but for completeness I'll add
 that the specific case of shutdown blocking is tracked at
 http://tracker.ceph.com/issues/9477


I've seen the same thing before with /dev/rbd mounts when the network
temporarily goes away: the client had to be rebooted. Is this likely to be the
same underlying issue?

Lee


Re: [ceph-users] rados cppool

2015-05-14 Thread Daniel Schneller

On 2015-05-14 21:04:06 +0000, Daniel Schneller said:


On 2015-04-23 19:39:33 +0000, Sage Weil said:


On Thu, 23 Apr 2015, Pavel V. Kaygorodov wrote:

Hi!

I have copied two of my pools recently, because the old ones had too many PGs.
Both of them contain RBD images, with 1 GB and ~30 GB of data.
Both pools were copied without errors; the RBD images are mountable and
seem to be fine.
Ceph version is 0.94.1


You will likely have problems if you try to delete snapshots that existed
on the images (snaps are not copied/preserved by cppool).

sage


Could you be more specific about what these problems would look like? Are
you referring to RBD pools in particular, or is this a general issue
with snapshots? Is there anything that could be done to prevent these issues?

Background of the question is that we take daily snapshots of some
pools to allow reverting data when users make mistakes (via RGW). So it
would be difficult to get rid of all snapshots first.

Thanks
Daniel


Never mind, found more information on this on the list a few posts later.




[ceph-users] ceph -w output

2015-05-14 Thread Daniel Schneller

Hi!

I am trying to understand the values in ceph -w, especially those
regarding throughput(?) at the end:


2015-05-15 00:54:33.333500 mon.0 [INF] pgmap v26048646: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
6023 kB/s rd, 549 kB/s wr, 7564 op/s
2015-05-15 00:54:34.339739 mon.0 [INF] pgmap v26048647: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
1853 kB/s rd, 1014 kB/s wr, 2015 op/s
2015-05-15 00:54:35.353621 mon.0 [INF] pgmap v26048648: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
2101 kB/s rd, 1680 kB/s wr, 1950 op/s
2015-05-15 00:54:36.375887 mon.0 [INF] pgmap v26048649: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
1641 kB/s rd, 1266 kB/s wr, 1710 op/s
2015-05-15 00:54:37.399647 mon.0 [INF] pgmap v26048650: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
4735 kB/s rd, 777 kB/s wr, 7088 op/s
2015-05-15 00:54:38.453922 mon.0 [INF] pgmap v26048651: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
5176 kB/s rd, 942 kB/s wr, 7779 op/s
2015-05-15 00:54:39.462838 mon.0 [INF] pgmap v26048652: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
3407 kB/s rd, 768 kB/s wr, 2131 op/s
2015-05-15 00:54:40.488387 mon.0 [INF] pgmap v26048653: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
3343 kB/s rd, 518 kB/s wr, 1881 op/s
2015-05-15 00:54:41.512540 mon.0 [INF] pgmap v26048654: 17344 pgs: 
17344 active+clean; 6296 GB data, 19597 GB used, 155 TB / 174 TB avail; 
1221 kB/s rd, 2385 kB/s wr, 1686 op/s


Am I right to assume the values for kB/s rd and kB/s wr mean that 
the indicated amount of data has been read/written by clients since the 
last line, total over all OSDs?


As for the op/s I am a little more uncertain. What kind of operations 
does this count?

Assuming it is also reads and writes aggregated, what counts as an operation?
For example, when I request data via the Rados Gateway, do I see one 
op here for the request from RGW's perspective, or do I see multiple, 
depending on how many low level objects a big RGW upload was striped 
to?
What about non-rgw objects that get striped? Are reads/writes on those 
counted as one or one per stripe?
Is there anything else counted in this besides reads/writes to the
object data? What about key/value level accesses?


Is it possible for someone to come up with a theoretical estimate of the
maximum value achievable with a given set of hardware?

This is a cluster of 4 nodes with 48 OSDs, 4TB each, all spinners.
Are these values good, bad, critical?

Can I somehow deduce - even if it is just a rather rough estimate - how 
loaded my cluster is? I am not talking about precision monitoring, 
but some kind of traffic light system (e.g. up to X% of the theoretical 
max is fine, up to Y% show a very busy cluster and anything above Y% 
means we might be up for trouble)?


Any pointers to documentation or other material would be appreciated if
this was discussed in some detail before. The only thing I found was a
post on this list from 2013 which did not say more than that ops are reads,
writes, anything, without going into detail about the anything.


Thanks a lot!

Daniel




Re: [ceph-users] rados cppool

2015-05-14 Thread Daniel Schneller

On 2015-04-23 19:39:33 +0000, Sage Weil said:


On Thu, 23 Apr 2015, Pavel V. Kaygorodov wrote:

Hi!

I have copied two of my pools recently, because the old ones had too many PGs.
Both of them contain RBD images, with 1 GB and ~30 GB of data.
Both pools were copied without errors; the RBD images are mountable and
seem to be fine.

Ceph version is 0.94.1


You will likely have problems if you try to delete snapshots that existed
on the images (snaps are not copied/preserved by cppool).

sage


Could you be more specific about what these problems would look like? Are
you referring to RBD pools in particular, or is this a general issue
with snapshots? Is there anything that could be done to prevent these issues?


Background of the question is that we take daily snapshots of some 
pools to allow reverting data when users make mistakes (via RGW). So it 
would be difficult to get rid of all snapshots first.
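
For what it's worth, existing pool-level snapshots can be listed before
attempting a cppool (pool name is a placeholder):

rados -p <pool> lssnap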


Thanks
Daniel




Re: [ceph-users] Firefly to Hammer

2015-05-14 Thread Daniel Schneller
You should be able to do just that. We recently upgraded from Firefly
to Hammer like that. Follow the order described on the website:
monitors, OSDs, MDSs.


Note that the Debian packages do not restart running daemons, but
they _do_ start daemons that are not running. So if, say, you shut down
OSDs before your upgrade, they would be started as part of the
upgrade.
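
A minimal per-node sketch (upstart job names as on Ubuntu 14.04; adjust
to your init system, and work through monitor nodes first, then OSD
nodes, then MDS nodes):

apt-get update && apt-get install -y ceph ceph-common
restart ceph-mon-all      # on monitor nodes
restart ceph-osd-all      # on OSD nodes
restart ceph-mds-all      # on MDS nodes
ceph tell osd.* version   # verify daemons picked up the new version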


Daniel




[ceph-users] ceph-deploy osd activate ERROR

2015-05-14 Thread 张忠波
Hi,
I encountered some other problems when I installed Ceph.
#1. When I ran the command 'ceph-deploy new ceph-0', I got the ceph.conf
file. However, there is no information in it about 'osd pool default
size' or 'public network'.
[root@ceph-2 my-cluster]# more ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.72.33
mon_initial_members = ceph-0
fsid = 74d682b5-2bf2-464c-8462-740f96bcc525
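
Both settings can simply be added to the [global] section by hand, e.g.
(values here are only examples for the network shown above):

osd pool default size = 2
public network = 192.168.72.0/24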

#2. I ignored problem #1 and continued to set up the Ceph Storage
Cluster, but encountered an error when running the command 'ceph-deploy osd
activate ceph-2:/mnt/sda'.
I did this following the manual:
http://ceph.com/docs/master/start/quick-ceph-deploy/
Error message:
[root@ceph-0 my-cluster]#ceph-deploy osd prepare ceph-2:/mnt/sda
[ceph_deploy.conf][DEBUG ] found configuration file at:
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.23): /usr/bin/ceph-deploy osd
prepare ceph-2:/mnt/sda
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-2:/mnt/sda:
[ceph-2][DEBUG ] connected to host: ceph-2
[ceph-2][DEBUG ] detect platform information from remote host
[ceph-2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph-2
[ceph-2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-2][INFO  ] Running command: udevadm trigger --subsystem-match=block
--action=add
[ceph_deploy.osd][DEBUG ] Preparing host ceph-2 disk /mnt/sda journal None
activate False
[ceph-2][INFO  ] Running command: ceph-disk -v prepare --fs-type xfs
--cluster ceph -- /mnt/sda
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd
--cluster=ceph --show-config-value=fsid
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd
--cluster=ceph --show-config-value=osd_journal_size
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf
--cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[ceph-2][WARNIN] DEBUG:ceph-disk:Preparing osd data dir /mnt/sda
[ceph-2][INFO  ] checking OSD status...
[ceph-2][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph-2 is now ready for osd use.
Error in sys.exitfunc:
[root@ceph-0 my-cluster]# ceph-deploy osd activate  ceph-2:/mnt/sda
[ceph_deploy.conf][DEBUG ] found configuration file at:
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.23): /usr/bin/ceph-deploy osd
activate ceph-2:/mnt/sda
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph-2:/mnt/sda:
[ceph-2][DEBUG ] connected to host: ceph-2
[ceph-2][DEBUG ] detect platform information from remote host
[ceph-2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] activating host ceph-2 disk /mnt/sda
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[ceph-2][INFO  ] Running command: ceph-disk -v activate --mark-init
sysvinit --mount /mnt/sda
[ceph-2][WARNIN] DEBUG:ceph-disk:Cluster uuid is
af23707d-325f-4846-bba9-b88ec953be80
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd
--cluster=ceph --show-config-value=fsid
[ceph-2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[ceph-2][WARNIN] DEBUG:ceph-disk:OSD uuid is
ca9f6649-b4b8-46ce-a860-1d81eed4fd5e
[ceph-2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster
ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/
ceph.keyring osd create --concise
ca9f6649-b4b8-46ce-a860-1d81eed4fd5e
[ceph-2][WARNIN] 2015-05-14 17:37:10.988914 7f373bd34700  0 librados:
client.bootstrap-osd authentication error (1) Operation not permitted
[ceph-2][WARNIN] Error connecting to cluster: PermissionError
[ceph-2][WARNIN] ceph-disk: Error: ceph osd create failed: Command
'/usr/bin/ceph' returned non-zero exit status 1:
[ceph-2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v
activate --mark-init sysvinit --mount /mnt/sda

Error in sys.exitfunc:

I look forward to hearing from you soon.

Re: [ceph-users] How to debug a ceph read performance problem?

2015-05-14 Thread changqian zuo
Hi,

1. The network problem has been partly resolved; we removed the bonding on
the Juno node (Ceph client side), and now IO comes back:

[root@controller fio-rbd]# rados bench -p test 30 seq
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
 0   0 0 0 0 0 - 0
 1  16   176   160 639.7   640  0.186673 0.0933836
 2  16   339   323   645.795   652  0.079945 0.0965533
 3  16   509   493   657.153   680   0.06882 0.0957288
 4  16   672   656   655.837   652  0.068071 0.0963944
 5  16   828   812    649.45   624  0.061999 0.0975488
 6  16   989   973   648.513   644  0.110632 0.0979637
 7  16  1139  1123   641.565   600  0.078144 0.0983299
 8  16  1295  1279   639.349   624  0.243684 0.0991592
 9  16  1453  1437   638.522   632   0.08775 0.0993148
10  16  1580  1564   625.461   508  0.061375  0.101921

The bond is built from interfaces em1 and em2; the problematic
interface is em2. Traffic from some storage nodes to em2 is quite good,
but from some it is not. I still don't know the exact issue at the moment,
but it is surely a network problem.

2. About the monitors:
- The monitors have not been restarted for at least half a year.
- ceph tell mon.bj-ceph14 compact just hangs until I hit Ctrl+C; the same
goes for the other monitor nodes.
- /var/lib/ceph/mon shares the Linux system disk (RAID1 of two HDDs)

I will go through Google and the mailing list later.

3. About memory: yes, I got that wrong. I will spend some time on atop
:-)

4. A single CPU with 4 cores, without hyperthreading.

So, the CPU needs to be upgraded, the OSD count per Ceph node should be
reduced (sparing some CPU power for more SSDs), and more SSD journal disks
should be added. Also, I am planning to upgrade the OSD data disks from 1 TB
to 4 TB. I will look through the mails about the ratio between OSD data disks
and journal disks, and the space and performance requirements for journal
SSDs.


2015-05-14 10:39 GMT+08:00 Christian Balzer ch...@gol.com:


 Hello,

 On Thu, 14 May 2015 09:36:14 +0800 changqian zuo wrote:

  1. No packet drop found in system log.
 
 Is that storage node with the bad network fixed?

  2. ceph health detail shows:
 
  # ceph health detail
  HEALTH_WARN
  mon.bj-ceph10 addr 10.10.11.23:6789/0 has 43% avail disk space -- store
  is getting too big! 77364 MB >= 40960 MB
  mon.bj-ceph12 addr 10.10.11.25:6789/0 has 43% avail disk space -- store
  is getting too big! 77071 MB >= 40960 MB
  mon.bj-ceph13 addr 10.10.11.26:6789/0 has 42% avail disk space -- store
  is getting too big! 78403 MB >= 40960 MB
  mon.bj-ceph14 addr 10.10.11.27:6789/0 has 43% avail disk space -- store
  is getting too big! 78006 MB >= 40960 MB
 
  I am checking out what does this mean exactly.
 
 You will find a lot of answers looking for compact mon storage, including
 a very recent thread here.
 In short, I suppose those monitors have not been re-started for a long
 time, right?
 Also, you have a pretty big cluster, so this isn't all that surprising.

 I'd suggest doing a ceph tell mon.bj-ceph14 compact and, if that
 works out well, repeating with the others.
 Are your MONs using SSDs, for /var/lib/ceph in particular?
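
 For reference, besides the one-off tell command there is also a config
 option that compacts the store at daemon start (restart one monitor at
 a time):

 [mon]
     mon compact on start = true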

  3. by run out, I mean:
 
  # free -m
               total     used     free   shared  buffers   cached
  Mem:         64376    63766      609        0      123    47974
  -/+ buffers/cache:    15669    48707
  Swap:        22831     2319    20512
 
 That doesn't look too bad, only ~16GB used by processes, the rest is cache
 and friends. However, during recovery this usage will get higher, and Ceph
 also benefits from large pagecaches for hot object reads.
 So doubling the memory might be a good long term goal.

  top shows memory is mainly used by the ceph-osd processes.
 You might want to spend some time learning atop, as it will show you
 what is going on in every part of your system (a huge terminal window helps).

 
  4. Cluster configuration, for a single Ceph node
 
  CPU: Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz
 Single CPU???
 That's optimistically 8GHz of CPU, when the recommendation for purely HDD
 based OSDs is 1GHz per OSD. Since you're using SSD journals, you will want
 double that to not be CPU limited for many use cases.
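
 Rough arithmetic: 4 cores x 1.8GHz = 7.2GHz on the node, while 24
 HDD-based OSDs x 1GHz = 24GHz recommended, and roughly twice that once
 SSD journals raise the per-OSD demand.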

  memory: 64GB
  Data disk: 1TB HDD * 24 (do not know vendor now)
  Journal disk: 800GB SSD * 2 (do not know vendor now)
 
  We have run 24 OSDs on one node! I think this is why memory is in
  short supply (also the CPU may not cope with high-load recovery or
  rebalance, and 2 SSDs for 24 OSD journals are just not enough) and slow
  OSD writes get logged; if reduced to 16 or 12, it would be much better.
 
 You're likely to be CPU bound a lot of times during normal operations
 (provided everything else in your cluster is working correctly).
 Depending on the type of SSD they should be fast enough to 

Re: [ceph-users] Find out the location of OSD Journal

2015-05-14 Thread Josef Johansson
I tend to use something along these lines:

for osd in $(grep osd /etc/mtab | cut -d ' ' -f 2); do echo $(echo $osd | cut -d '-' -f 2): $(readlink -f $(readlink $osd/journal)); done | sort -k 2

Cheers,
Josef
 
 On 08 May 2015, at 02:47, Robert LeBlanc rob...@leblancnet.us wrote:
 
 You may also be able to use `ceph-disk list`.
 
 On Thu, May 7, 2015 at 3:56 AM, Francois Lafont flafdiv...@free.fr wrote:
 Hi,
 
 Patrik Plank wrote:
 
  I can't remember on which drive I installed which OSD journal :-||
  Is there any command to show this?
 
 It's probably not the answer you hoped for, but why not use a simple:
 
 ls -l /var/lib/ceph/osd/ceph-$id/journal
 
 ?
 
 --
 François Lafont
 
