Re: [ceph-users] Dataflow/path Client --- OSD
On 05/07/2015 10:28 AM, Götz Reinicke - IT Koordinator wrote: Hi, still designing and deciding, we asked ourselves: how does the data travel from and to an OSD? E.g. I have my file server with an RBD mounted and a client workstation writes/reads to/from a share on that RBD. Does the data go directly to an OSD (node), or does it e.g. travel through the monitors as well? No, the monitors are never in the I/O path. Clients talk to the OSDs directly. 10Gb would be sufficient, I think, for all the nodes. Bandwidth is usually not the problem, latency is. The point is: if we connect our file servers and OSD nodes with 40Gb, does the monitor need 40Gb too? Or would 10Gb be enough? Oversizing is ok :) ... Thanks and regards, Götz -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on
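Below is a minimal sketch with the python-rados bindings that illustrates the point above: the monitor is only contacted to authenticate and fetch the cluster maps, while the actual object I/O goes straight from the client to the OSDs. The pool name and config path are assumptions for the example.

```python
import rados

# Connecting talks to a monitor for authentication and the current maps only.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# All reads/writes below go directly to the OSDs that CRUSH selects;
# the monitors are not in this data path.
ioctx = cluster.open_ioctx('rbd')                # assumed pool name
ioctx.write_full('demo-object', b'hello ceph')   # client -> primary OSD
print(ioctx.read('demo-object'))                 # client -> OSD, mons not involved
ioctx.close()
cluster.shutdown()
```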
Re: [ceph-users] Find out the location of OSD Journal
Hi, inside your mounted OSD directory there is a symlink - journal - pointing to the file or disk/partition used for it. Cheers, Martin On Thu, May 7, 2015 at 11:06 AM, Patrik Plank pat...@plank.me wrote: Hi, I can't remember on which drive I installed which OSD journal :-|| Is there any command to show this? thanks regards
[ceph-users] wrong diff-export format description
Hi all, the description at http://ceph.com/docs/master/dev/rbd-diff/ looks slightly wrong: - u8: ‘s’ - u64: (ending) image size I suppose that instead of u64 it should say something like le64, shouldn't it? From this description it is not clear which byte order I should use..
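If the fields really are le64, the difference matters as soon as you parse the stream yourself. A small sketch (not from the referenced doc, just an illustration) of decoding the size field of an ‘s’ record with Python's struct module:

```python
import struct

# Build an example 's' record: a one-byte tag followed by an 8-byte image size.
record = b's' + struct.pack('<Q', 10 * 1024 ** 3)   # pretend the image is 10 GiB

tag = record[0:1]
(size_le,) = struct.unpack('<Q', record[1:9])   # little-endian (le64) - correct
(size_be,) = struct.unpack('>Q', record[1:9])   # big-endian - gives a wrong value
print(tag, size_le, size_be)
```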
[ceph-users] Networking question
Hi, I have a theoretical question about networking in Ceph. If I have two networks (public and cluster network) and one link in the public network is broken (the cluster network is fine), what will I see in my cluster? How does Ceph work in this situation? And how does Ceph work if a link to the cluster network is broken and the public network is fine? Will the Ceph node still be available? --- rgawron
Re: [ceph-users] Btrfs defragmentation
On 05/06/15 19:51, Lionel Bouton wrote: During normal operation Btrfs OSD volumes continue to behave in the same way XFS ones do on the same system (sometimes faster/sometimes slower). What is really slow, though, is the OSD process startup. I've yet to make serious tests (unmounting the filesystems to clear caches), but I've already seen 3 minutes of delay reading the pgs. Example: 2015-05-05 16:01:24.854504 7f57c518b780 0 osd.17 22428 load_pgs 2015-05-05 16:01:24.936111 7f57ae7fc700 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint: ioctl SNAP_DESTROY got (2) No such file or directory 2015-05-05 16:01:24.936137 7f57ae7fc700 -1 filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap 'snap_1671188' got (2) No such file or directory 2015-05-05 16:01:24.991629 7f57ae7fc700 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-17) destroy_checkpoint: ioctl SNAP_DESTROY got (2) No such file or directory 2015-05-05 16:01:24.991654 7f57ae7fc700 -1 filestore(/var/lib/ceph/osd/ceph-17) unable to destroy snap 'snap_1671189' got (2) No such file or directory 2015-05-05 16:04:25.413110 7f57c518b780 0 osd.17 22428 load_pgs opened 160 pgs The filesystem might not have reached its balance between fragmentation and defragmentation rate at this time (so this may change), but it mirrors our initial experience with Btrfs, where this was the first symptom of bad performance. We've seen progress on this front. Unfortunately for us we had 2 power outages and they seem to have damaged the disk controller of the system we are testing Btrfs on: we just had a system crash. On the positive side this gives us an update on the OSD boot time. With a freshly booted system without anything in cache: - the first Btrfs OSD we installed loaded the pgs in ~1mn30s, which is half of the previous time, - the second Btrfs OSD, where defragmentation was disabled for some time and which was considered more fragmented by our tool, took nearly 10 minutes to load its pgs (and even spent 1mn before starting to load them), - the third Btrfs OSD, which was always defragmented, took 4mn30s to load its pgs (it was considered more fragmented than the first and less than the second). My current assumption is that the defragmentation process we use can't handle large spikes of writes (at least when originally populating the OSD with data through backfills) but can then repair the damage they cause to performance at least partially (it's still slower to boot than the 3 XFS OSDs on the same system, where loading pgs took 6-9 seconds). In the current setup the defragmentation is very slow to process because I set it up to generate very little load on the filesystems it processes: there may be room to improve. Best regards, Lionel
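The poster's defragmentation tool itself is not shown; the following is only a hypothetical sketch of a low-load pass of that kind, walking an OSD data directory and defragmenting one file at a time with a pause in between so the extra I/O stays small. The path and pause length are assumptions.

```python
import os
import subprocess
import time

OSD_DIR = '/var/lib/ceph/osd/ceph-17/current'   # assumed btrfs-backed OSD path
PAUSE = 2.0                                     # seconds to sleep between files

for root, dirs, files in os.walk(OSD_DIR):
    for name in files:
        path = os.path.join(root, name)
        # Defragment a single file; failures are tolerable because files can
        # be renamed or removed by the OSD while the walk is running.
        subprocess.call(['btrfs', 'filesystem', 'defragment', path])
        time.sleep(PAUSE)
```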
[ceph-users] Dataflow/path Client --- OSD
Hi, still designing and deciding, we asked ourselves: how does the data travel from and to an OSD? E.g. I have my file server with an RBD mounted and a client workstation writes/reads to/from a share on that RBD. Does the data go directly to an OSD (node), or does it e.g. travel through the monitors as well? The point is: if we connect our file servers and OSD nodes with 40Gb, does the monitor need 40Gb too? Or would 10Gb be enough? Oversizing is ok :) ... Thanks and regards, Götz -- Götz Reinicke IT-Koordinator Tel. +49 7141 969 82 420 E-Mail goetz.reini...@filmakademie.de Filmakademie Baden-Württemberg GmbH Akademiehof 10 71638 Ludwigsburg www.filmakademie.de Eintragung Amtsgericht Stuttgart HRB 205016 Vorsitzender des Aufsichtsrats: Jürgen Walter MdL Staatssekretär im Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg Geschäftsführer: Prof. Thomas Schadt
Re: [ceph-users] Rados Gateway and keystone
On 07/05/15 20:21, ghislain.cheval...@orange.com wrote: Hi all, after adding the nss and the keystone admin url parameters in ceph.conf and creating the OpenSSL certificates, all is working well. If I had followed the doc and proceeded by copy/paste, I wouldn't have encountered any problems. As all was working well without this set of parameters when using the Swift API and keystone, it would be helpful if the page http://ceph.com/docs/master/radosgw/keystone/ were more precise about this implementation. Best regards -Original message- From: CHEVALIER Ghislain IMT/OLPS Sent: Monday, April 13, 2015 16:17 To: ceph-users Subject: RE: [ceph-users] Rados Gateway and keystone Hi all, coming back to that issue. I successfully used keystone users for the rados gateway and the Swift API, but I still don't understand how it can work with the S3 API, i.e. with S3 users (AccessKey/SecretKey). I found a swift3 initiative, but I think it only works in a pure OpenStack Swift environment by setting up a specific plug-in: https://github.com/stackforge/swift3 A rgw can be, at the same time, under keystone control and standard radosgw-admin if - for Swift, you use the right authentication service (keystone or internal) - for S3, you use the internal authentication service. So, my questions are still valid. How can a rgw work for S3 users if they are stored in keystone? Which are the access key and secret key? What is the purpose of the rgw s3 auth use keystone parameter? The difference is that (in particular with the v2 protocol) swift clients talk to keystone to a) authenticate and b) find the swift storage endpoint (even if it is actually pointing to rgw). In contrast, s3 clients talk directly to the rgw, and *it* will talk to keystone to check the client's s3 credentials for them. That's why rgw needs to have rgw s3 auth use keystone and similar parameters. Cheers Mark
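To make the S3 path concrete, here is a minimal boto (v2) sketch of what such a client does: it sends its access/secret key straight to the RGW endpoint, and with rgw s3 auth use keystone enabled the gateway then validates those credentials against Keystone on the client's behalf. The host name and keys are placeholders.

```python
import boto
import boto.s3.connection

# The S3 client only ever talks to the RGW endpoint, never to Keystone.
conn = boto.connect_s3(
    aws_access_key_id='EC2-STYLE-ACCESS-KEY',        # placeholder credential
    aws_secret_access_key='EC2-STYLE-SECRET-KEY',    # placeholder credential
    host='rgw.example.com',                          # placeholder RGW endpoint
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

for bucket in conn.get_all_buckets():
    print(bucket.name)
```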
Re: [ceph-users] Find out the location of OSD Journal
Hi, Patrik Plank wrote: I can't remember on which drive I installed which OSD journal :-|| Is there any command to show this? It's probably not the answer you were hoping for, but why not use a simple: ls -l /var/lib/ceph/osd/ceph-$id/journal ? -- François Lafont
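A small sketch along the same lines, listing the resolved journal target for every locally mounted OSD at once (paths assume the default /var/lib/ceph layout):

```python
import glob
import os

for link in sorted(glob.glob('/var/lib/ceph/osd/ceph-*/journal')):
    osd = os.path.basename(os.path.dirname(link))
    target = os.path.realpath(link)   # a file inside the OSD dir or a block device
    print('{0}: journal -> {1}'.format(osd, target))
```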
[ceph-users] Find out the location of OSD Journal
Hi, I can't remember on which drive I installed which OSD journal :-|| Is there any command to show this? thanks regards
[ceph-users] After calamari installation osd start failed
Hi, after i have installed calamari, ceph shows me following error when i change/reinstall add a osd.0. Traceback (most recent call last): File /usr/bin/calamari-crush-location, line 86, in module sys.exit(main()) File /usr/bin/calamari-crush-location, line 83, in main print get_osd_location(args.id) File /usr/bin/calamari-crush-location, line 47, in get_osd_location last_location = get_last_crush_location(osd_id) File /usr/bin/calamari-crush-location, line 27, in get_last_crush_location proc = Popen(c, stdout=PIPE, stderr=PIPE) File /usr/lib/python2.7/subprocess.py, line 679, in __init__ errread, errwrite) File /usr/lib/python2.7/subprocess.py, line 1259, in _execute_child raise child_exception OSError: [Errno 2] No such file or directory Invalid command: saw 0 of args(string(goodchars [A-Za-z0-9-_.=])) [string(goodchars [A-Za-z0-9-_.=])...], expected at least 1 osd crush create-or-move osdname (id|osd.id) float[0.0-] args [args...] : create entry or move existing entry for name weight at/to location args Error EINVAL: invalid command failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 0.46 ' [global] osd_crush_location_hook = /usr/bin/calamari-crush-location fsid = 78227661-3a1b-4e56-addc-c2a272933ac2 mon_initial_members = ceph01 mon_host = 10.0.0.20,10.0.0.21,10.0.0.22 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true filestore_op_threads = 32 public_network = 10.0.0.0/24 cluster_network = 10.0.1.0/24 osd_pool_default_size = 3 osd_pool_default_min_size = 1 osd_pool_default_pg_num = 4096 osd_pool_default_pgp_num = 4096 osd_max_write_size = 200 osd_map_cache_size = 1024 osd_map_cache_bl_size = 128 osd_recovery_op_priority = 1 osd_max_recovery_max_active = 1 osd_recovery_max_backfills = 1 osd_op_threads = 32 osd_disk_threads = 8 After i have recreate osd.0 3 0.27 osd.3 up 1 6 0.55 osd.6 up 1 9 0.55 osd.9 up 1 12 0.27 osd.12 up 1 15 0.27 osd.15 up 1 18 0.27 osd.18 up 1 21 0.06999 osd.21 up 1 24 0.27 osd.24 up 1 27 0.27 osd.27 up 1 -3 3.18 host ceph02 4 0.55 osd.4 up 1 7 0.55 osd.7 up 1 10 0.55 osd.10 up 1 13 0.27 osd.13 up 1 1 0.11 osd.1 up 1 16 0.27 osd.16 up 1 19 0.27 osd.19 up 1 22 0.06999 osd.22 up 1 25 0.27 osd.25 up 1 28 0.27 osd.28 up 1 -4 2.76 host ceph03 2 0.11 osd.2 up 1 5 0.55 osd.5 up 1 8 0.55 osd.8 up 1 11 0.13 osd.11 up 1 14 0.27 osd.14 up 1 17 0.27 osd.17 up 1 20 0.27 osd.20 up 1 23 0.06999 osd.23 up 1 26 0.27 osd.26 up 1 29 0.27 osd.29 up 1 0 0 osd.0 down 0 Does anybody have an idea how can i solve this?? thanks cheers ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph_argparse packaging error in Hammer/debian?
Hi, https://github.com/ceph/ceph/pull/4517 is the fix for http://tracker.ceph.com/issues/11388 Cheers On 07/05/2015 20:28, Andy Allan wrote: Hi all, I've found what I think is a packaging error in Hammer. I've tried registering for the tracker.ceph.com site but my confirmation email has got lost somewhere! /usr/bin/ceph is installed by the ceph-common package. ``` dpkg -S /usr/bin/ceph ceph-common: /usr/bin/ceph ``` It relies on ceph_argparse, but that isn't packaged in ceph-common, it's packaged in ceph. But the dependency is that ceph relies on ceph-common, not the other way around. ``` dpkg -S /usr/lib/python2.7/dist-packages/ceph_argparse.py ceph: /usr/lib/python2.7/dist-packages/ceph_argparse.py ``` Moreover, there's a commit that says move argparse to ceph-common but it's actually moved it to the `ceph.install` file, not `ceph-common.install` https://github.com/ceph/ceph/commit/2a23eac54957e596d99985bb9e187a668251a9ec So I think this is a packaging error, unless I'm misunderstanding something! Thanks, Andy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] rbd unmap command hangs when there is no network connection with mons and osds
Hi, when issuing the rbd unmap command while there is no network connection to the mons and osds, the command hangs. Isn't there an option to force the unmap even in this situation? Att. Vandeir.
Re: [ceph-users] rbd unmap command hangs when there is no network connection with mons and osds
On Thu, May 7, 2015 at 10:20 PM, Vandeir Eduardo vandeir.edua...@gmail.com wrote: Hi, when issuing the rbd unmap command while there is no network connection to the mons and osds, the command hangs. Isn't there an option to force the unmap even in this situation? No, but you can Ctrl-C the unmap command and that should do it. In dmesg you'll see something like rbd: unable to tear down watch request, and you may have to wait for the cluster to time out the watch. Thanks, Ilya
Re: [ceph-users] ceph_argparse packaging error in Hammer/debian?
Hi Loic, Sorry for the noise! I'd looked when I first ran into it and didn't find any reports or PRs, I should have checked again today. Thanks, Andy On 7 May 2015 at 19:41, Loic Dachary l...@dachary.org wrote: Hi, https://github.com/ceph/ceph/pull/4517 is the fix for http://tracker.ceph.com/issues/11388 Cheers On 07/05/2015 20:28, Andy Allan wrote: Hi all, I've found what I think is a packaging error in Hammer. I've tried registering for the tracker.ceph.com site but my confirmation email has got lost somewhere! /usr/bin/ceph is installed by the ceph-common package. ``` dpkg -S /usr/bin/ceph ceph-common: /usr/bin/ceph ``` It relies on ceph_argparse, but that isn't packaged in ceph-common, it's packaged in ceph. But the dependency is that ceph relies on ceph-common, not the other way around. ``` dpkg -S /usr/lib/python2.7/dist-packages/ceph_argparse.py ceph: /usr/lib/python2.7/dist-packages/ceph_argparse.py ``` Moreover, there's a commit that says move argparse to ceph-common but it's actually moved it to the `ceph.install` file, not `ceph-common.install` https://github.com/ceph/ceph/commit/2a23eac54957e596d99985bb9e187a668251a9ec So I think this is a packaging error, unless I'm misunderstanding something! Thanks, Andy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph_argparse packaging error in Hammer/debian?
Hi all, I've found what I think is a packaging error in Hammer. I've tried registering for the tracker.ceph.com site but my confirmation email has got lost somewhere! /usr/bin/ceph is installed by the ceph-common package. ``` dpkg -S /usr/bin/ceph ceph-common: /usr/bin/ceph ``` It relies on ceph_argparse, but that isn't packaged in ceph-common, it's packaged in ceph. But the dependency is that ceph relies on ceph-common, not the other way around. ``` dpkg -S /usr/lib/python2.7/dist-packages/ceph_argparse.py ceph: /usr/lib/python2.7/dist-packages/ceph_argparse.py ``` Moreover, there's a commit that says move argparse to ceph-common but it's actually moved it to the `ceph.install` file, not `ceph-common.install` https://github.com/ceph/ceph/commit/2a23eac54957e596d99985bb9e187a668251a9ec So I think this is a packaging error, unless I'm misunderstanding something! Thanks, Andy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] unable to start monitor
Srikanth, try if this helps: sudo initctl list|grep ceph (this should display all ceph daemons) sudo start ceph-mon-all (to start all ceph monitors) Thanks -Krishna On May 7, 2015, at 1:35 PM, Srikanth Madugundi srikanth.madugu...@gmail.com wrote: Hi, I am setting up a local instance of a ceph cluster with the latest source from GitHub. The build succeeded and installation was successful, but I could not start the monitor. The ceph start command returns immediately and does not output anything. $ sudo /etc/init.d/ceph start mon.monitor1 $ $ ls -l /var/lib/ceph/mon/ceph-monitor1/ total 8 -rw-r--r-- 1 root root 0 May 7 20:27 done -rw-r--r-- 1 root root 77 May 7 19:12 keyring drwxr-xr-x 2 root root 4096 May 7 19:12 store.db -rw-r--r-- 1 root root 0 May 7 20:26 sysvinit -rw-r--r-- 1 root root 0 May 7 20:09 upstart The log file does not seem to have any details either: $ cat /var/log/ceph/ceph-mon.monitor1.log 2015-05-07 19:12:13.356389 7f3f06bdb880 -1 did not load config file, using default settings. $ cat /etc/ceph/ceph.conf [global] mon host = 15.43.33.21 fsid = 92f859df-8b27-466a-8d44-01af2b7ea7e6 mon initial members = monitor1 # Enable authentication auth cluster required = cephx auth service required = cephx auth client required = cephx # POOL / PG / CRUSH osd pool default size = 3 # Write an object 3 times osd pool default min size = 1 # Allow writing one copy in a degraded state # Ensure you have a realistic number of placement groups. We recommend # approximately 200 per OSD. E.g., total number of OSDs multiplied by 200 # divided by the number of replicas (i.e., osd pool default size). # !! BE CAREFULL !! # You properly should never rely on the default numbers when creating pool! osd pool default pg num = 32 osd pool default pgp num = 32 #log file = /home/y/logs/ceph/$cluster-$type.$id.log # Logging debug paxos = 0 debug throttle = 0 keyring = /etc/ceph/ceph.client.admin.keyring #run dir = /home/y/var/run/ceph [mon] debug mon = 10 debug ms = 1 # We found that when the disk usage reach to 94%, the disk could not be written # any file (no free space), so that we lower the full ratio and we should start # data migration before it becomes full mon osd full ratio = 0.9 #mon data = /home/y/var/lib/ceph/mon/$cluster-$id mon osd down out interval = 172800 # 2 * 24 * 60 * 60 seconds # Ceph monitors need to be told how many reporters must to be seen from different # OSDs before it can be marked offline, this should be greater than the number of # OSDs per OSD host mon osd min down reporters = 12 #keyring = /home/y/conf/ceph/ceph.mon.keyring
Re: [ceph-users] How to backup hundreds or thousands of TB
On Thu, May 7, 2015 at 5:20 AM, Wido den Hollander w...@42on.com wrote: Aren't snapshots something that should protect you against removal? IF snapshots work properly in CephFS you could create a snapshot every hour. Unless the file is created and removed between snapshots, then the Recycle Bin feature would have it and the snapshot wouldn't. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RGW - Can't download complete object
- Original Message - From: Sean seapasu...@uchicago.edu To: ceph-users@lists.ceph.com Sent: Thursday, May 7, 2015 3:35:14 PM Subject: [ceph-users] RGW - Can't download complete object I have another thread goign on about truncation of objects and I believe this is a separate but equally bad issue in civetweb/radosgw. My cluster is completely healthy I have one (possibly more) objects stored in ceph rados gateway that will return a different size every time I Try to download it:: http://pastebin.com/hK1iqXZH --- ceph -s http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object The two interesting things that I see here is: - the multipart upload size for each part is on the big side (is it 1GB for each part?) - it seems that there are a lot of parts that suffered from retries, could be a source for the 512k issue http://pastebin.com/5TnvgMrX --- python download code The weird part is every time I download the file it is of a different size. I am grabbing the individual objects of the 14g file and will update this email once I have them all statted out. Currently I am getting, on average, 1.5G to 2Gb files when the total object should be 14G in size. lacadmin@kh10-9:~$ python corruptpull.py the download failed. The filesize = 2125988202. The actual size is 14577056082. Attempts = 1 the download failed. The filesize = 2071462250. The actual size is 14577056082. Attempts = 2 the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 3 the download failed. The filesize = 1643643242. The actual size is 14577056082. Attempts = 4 the download failed. The filesize = 1597505898. The actual size is 14577056082. Attempts = 5 the download failed. The filesize = 2075656554. The actual size is 14577056082. Attempts = 6 the download failed. The filesize = 650117482. The actual size is 14577056082. Attempts = 7 the download failed. The filesize = 1987576170. The actual size is 14577056082. Attempts = 8 the download failed. The filesize = 2109210986. The actual size is 14577056082. Attempts = 9 the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 10 the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 11 the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 12 the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 13 the download failed. The filesize = 1467482474. The actual size is 14577056082. Attempts = 14 the download failed. The filesize = 2046296426. The actual size is 14577056082. Attempts = 15 the download failed. The filesize = 2021130602. The actual size is 14577056082. Attempts = 16 the download failed. The filesize = 177366. The actual size is 14577056082. Attempts = 17 the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 18 the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 19 the download failed. The filesize = 1983381866. The actual size is 14577056082. Attempts = 20 the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 21 Notice it is always different. Once the rados -p .rgw.buckets ls | grep finishes I will return the listing of objects as well but this is quite odd and I think this is a separate issue. Has anyone seen this before? Why wouldn't radosgw return an error and why am I getting different file sizes? 
Usually that means that there was some error in the middle of the download, maybe a client-to-radosgw communication issue. What does the radosgw show when this happens? I would post the log from radosgw, but I don't see any err|wrn|fatal mentions in the log, and the client completes without issue every time.
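The corruptpull.py script is only linked above, not included; the following is merely a sketch of that kind of size check with boto (v2): download the object, compare the on-disk size to the size the gateway reports, and retry on mismatch. The endpoint, credentials and object names are placeholders.

```python
import os
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS', aws_secret_access_key='SECRET',   # placeholders
    host='rgw.example.com', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
key = conn.get_bucket('mybucket').get_key('big-object')
expected = key.size   # size according to the gateway's object metadata

for attempt in range(1, 22):
    key.get_contents_to_filename('/tmp/big-object')
    got = os.path.getsize('/tmp/big-object')
    if got == expected:
        print('download ok after %d attempt(s)' % attempt)
        break
    print('the download failed. The filesize = %d. The actual size is %d. '
          'Attempts = %d' % (got, expected, attempt))
```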
Re: [ceph-users] Find out the location of OSD Journal
You may also be able to use `ceph-disk list`. On Thu, May 7, 2015 at 3:56 AM, Francois Lafont flafdiv...@free.fr wrote: Hi, Patrik Plank wrote: i cant remember on which drive I install which OSD journal :-|| Is there any command to show this? It's probably not the answer you hope, but why don't use a simple: ls -l /var/lib/ceph/osd/ceph-$id/journal ? -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph_argparse packaging error in Hammer/debian?
On 05/07/2015 12:53 PM, Andy Allan wrote: Hi Loic, Sorry for the noise! I'd looked when I first ran into it and didn't find any reports or PRs, I should have checked again today. Thanks, Andy That's totally fine. If you want, you can review that PR and give a thumbs up or down comment there :) More eyes on the Debian-related changes are always a good thing. - Ken ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rados Gateway and keystone
Hi all, after adding the nss and the keystone admin url parameters in ceph.conf and creating the OpenSSL certificates, all is working well. If I had followed the doc and proceeded by copy/paste, I wouldn't have encountered any problems. As all was working well without this set of parameters when using the Swift API and keystone, it would be helpful if the page http://ceph.com/docs/master/radosgw/keystone/ were more precise about this implementation. Best regards -Original message- From: CHEVALIER Ghislain IMT/OLPS Sent: Monday, April 13, 2015 16:17 To: ceph-users Subject: RE: [ceph-users] Rados Gateway and keystone Hi all, coming back to that issue. I successfully used keystone users for the rados gateway and the Swift API, but I still don't understand how it can work with the S3 API, i.e. with S3 users (AccessKey/SecretKey). I found a swift3 initiative, but I think it only works in a pure OpenStack Swift environment by setting up a specific plug-in: https://github.com/stackforge/swift3 A rgw can be, at the same time, under keystone control and standard radosgw-admin if - for Swift, you use the right authentication service (keystone or internal) - for S3, you use the internal authentication service. So, my questions are still valid. How can a rgw work for S3 users if they are stored in keystone? Which are the access key and secret key? What is the purpose of the rgw s3 auth use keystone parameter? Best regards -- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of ghislain.cheval...@orange.com Sent: Monday, March 23, 2015 14:03 To: ceph-users Subject: [ceph-users] Rados Gateway and keystone Hi all, I just want to be sure about the keystone configuration for the Rados Gateway. I read the documentation http://ceph.com/docs/master/radosgw/keystone/ and http://ceph.com/docs/master/radosgw/config-ref/?highlight=keystone but I didn't catch whether, after having configured the rados gateway (ceph.conf) to use keystone, it becomes mandatory to create all the users in it. In other words, can a rgw be, at the same time, under keystone control and standard radosgw-admin? How does it work for S3 users? What is the purpose of the rgw s3 auth use keystone parameter? Best regards - - - - - - - - - - - - - - - - - Ghislain Chevalier +33299124432 +33788624370 ghislain.cheval...@orange.com
Re: [ceph-users] How to backup hundreds or thousands of TB
On 06/05/2015 16:58, Scottix wrote: As a point to * someone accidentally removed a thing, and now they need a thing back I thought MooseFS has an interesting feature that I thought would be good for CephFS and maybe others. Basically a timed Trashbin Deleted files are retained for a configurable period of time (a file system level trash bin) It's an idea to cover this use case. Until recently we had a bug where deleted files weren't purged until the next MDS restart, so maybe we should just back out the fix for that :-D Seriously though, I didn't know about that MooseFS feature, it's interesting that they decided to implement that. It would be fairly straightforward to do that in CephFS (we already put deleted files into a 'stray' directory before purging them asynchronously), but I think there might be some debate about whether it's really the role of the underlying filesystem to do that kind of thing. John ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Networking question
Hi, If I have two networks (public and cluster network) and one link in the public network is broken (the cluster network is fine), what will I see in my cluster? See http://ceph.com/docs/master/rados/configuration/network-config-ref/ - only OSD-to-OSD traffic uses the private (cluster) network. So if the public network does not work (on an OSD), the mons will not see that OSD, and it will be marked out. Or how does Ceph work if a link to the cluster network is broken and the public network is fine? I'm not sure here; the OSD will be unable to replicate, so maybe it takes itself out? - Original message - From: MEGATEL / Rafał Gawron rafal.gaw...@megatel.com.pl To: ceph-users ceph-users@lists.ceph.com Sent: Thursday, May 7, 2015 10:11:20 Subject: [ceph-users] Networking question Hi, I have a theoretical question about networking in Ceph. If I have two networks (public and cluster network) and one link in the public network is broken (the cluster network is fine), what will I see in my cluster? How does Ceph work in this situation? And how does Ceph work if a link to the cluster network is broken and the public network is fine? Will the Ceph node still be available? --- rgawron
Re: [ceph-users] Btrfs defragmentation
Hi, On 05/07/2015 12:04 PM, Lionel Bouton wrote: On 05/06/15 19:51, Lionel Bouton wrote: *snipsnap* We've seen progress on this front. Unfortunately for us we had 2 power outages and they seem to have damaged the disk controller of the system we are testing Btrfs on: we just had a system crash. On the positive side this gives us an update on the OSD boot time. With a freshly booted system without anything in cache: - the first Btrfs OSD we installed loaded the pgs in ~1mn30s, which is half of the previous time, - the second Btrfs OSD, where defragmentation was disabled for some time and which was considered more fragmented by our tool, took nearly 10 minutes to load its pgs (and even spent 1mn before starting to load them), - the third Btrfs OSD, which was always defragmented, took 4mn30s to load its pgs (it was considered more fragmented than the first and less than the second). My current assumption is that the defragmentation process we use can't handle large spikes of writes (at least when originally populating the OSD with data through backfills) but can then repair the damage they cause to performance at least partially (it's still slower to boot than the 3 XFS OSDs on the same system, where loading pgs took 6-9 seconds). In the current setup the defragmentation is very slow to process because I set it up to generate very little load on the filesystems it processes: there may be room to improve. Part of the OSD boot-up process is also the handling of existing snapshots and journal replay. I've also had several btrfs-based OSDs that took up to 20-30 minutes to start, especially after a crash. During journal replay the OSD daemon creates a number of new snapshots for its operations (newly created snap_XYZ directories that vanish after a short time). This snapshotting probably also adds overhead to the OSD startup time. I have disabled snapshots in my setup now, since the stock Ubuntu trusty kernel had some stability problems with btrfs. I also had to establish cron jobs for rebalancing the btrfs partitions. It compacts the extents and may reduce the total amount of space taken. Unfortunately this procedure is not a default in most distributions (it definitely should be!). The problems associated with unbalanced extents should have been solved in kernel 3.18, but I haven't had the time to check it yet. As a side note: I had several OSDs with dangling snapshots (more than the two usually handled by the OSD). They are probably due to crashed OSD daemons. You have to remove them manually, otherwise they start to consume disk space. Best regards, Burkhard
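The poster's actual cron job is not shown; this is only a hypothetical sketch of such a periodic rebalance pass, meant to be run from cron, which compacts data block groups that are less than half full on each btrfs-backed OSD mount. The mount list and the usage filter value are assumptions.

```python
import subprocess

BTRFS_MOUNTS = ['/var/lib/ceph/osd/ceph-17']   # adjust to the btrfs OSD mounts in use

for mnt in BTRFS_MOUNTS:
    # -dusage=50: only rewrite data block groups that are at most 50% used,
    # which keeps the balance pass reasonably cheap compared to a full balance.
    subprocess.call(['btrfs', 'balance', 'start', '-dusage=50', mnt])
```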
Re: [ceph-users] Networking question
This page explains what happens quite well: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#flapping-osds We recommend using both a public (front-end) network and a cluster (back-end) network so that you can better meet the capacity requirements of object replication. Another advantage is that you can run a cluster network such that it isn’t connected to the internet, thereby preventing some denial of service attacks. When OSDs peer and check heartbeats, they use the cluster (back-end) network when it’s available. See Monitor/OSD Interaction for details. However, if the cluster (back-end) network fails or develops significant latency while the public (front-end) network operates optimally, OSDs currently do not handle this situation well. What happens is that OSDs mark each other down on the monitor, while marking themselves up. We call this scenario ‘flapping`. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alexandre DERUMIER Sent: 07 May 2015 12:43 To: MEGATEL / Rafał Gawron Cc: ceph-users Subject: Re: [ceph-users] Networking question Hi, If I have two networks (public and cluster network) and one link in public network is broken ( cluster network is fine) what I will see in my cluster ? See http://ceph.com/docs/master/rados/configuration/network-config-ref/ only osd between them use private network. so if public network not work (on osd), mon will not see osd, so osd will be out. Or how works ceph if link to cluster network was broken and public network is fine ? I'm not sure here, osd will be enable to replicate, so maybe it going out itself ? - Mail original - De: MEGATEL / Rafał Gawron rafal.gaw...@megatel.com.pl À: ceph-users ceph-users@lists.ceph.com Envoyé: Jeudi 7 Mai 2015 10:11:20 Objet: [ceph-users] Networking question Hi I have theoretical question about network in ceph. If I have two networks (public and cluster network) and one link in public network is broken ( cluster network is fine) what I will see in my cluster ? How work ceph in this situation ? Or how works ceph if link to cluster network was broken and public network is fine ? CEPH node will available ? --- rgawron ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com Please visit our new website at www.pml.ac.uk and follow us on Twitter @PlymouthMarine Winner of the Environment Conservation category, the Charity Awards 2014. Plymouth Marine Laboratory (PML) is a company limited by guarantee registered in England Wales, company number 4178503. Registered Charity No. 1091222. Registered Office: Prospect Place, The Hoe, Plymouth PL1 3DH, UK. This message is private and confidential. If you have received this message in error, please notify the sender and remove it from your system. You are reminded that e-mail communications are not secure and may contain viruses; PML accepts no liability for any loss or damage which may be caused by viruses. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSD in ceph.conf
I have not used ceph-deploy, but it should use ceph-disk for the OSD preparation. Ceph-disk creates GPT partitions with specific partition UUIDS for data and journals. When udev or init starts the OSD, or mounts it to a temp location reads the whoami file and the journal, then remounts it in the correct location. There is no need for fstab entries or the like. This allows you to easily move OSD disks between servers (if you take the journals with it). It's magic! But I think I just gave away the secret. Robert LeBlanc Sent from a mobile device please excuse any typos. On May 7, 2015 5:16 AM, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Indeed it is not necessary to have any OSD entries in the Ceph.conf file but what happens in the event of a disk failure resulting in changing the mount device? For what I can see is that OSDs are mounted from entries in /etc/mtab (I am on CentOS 6.6) like this: /dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0 /dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0 /dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0 /dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0 /dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0 /dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0 /dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0 /dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0 /dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0 /dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0 So in the event of a disk failure (e.g. disk SDH fails) then in the order the next one will take its place meaning that SDI will be seen as SDH upon next reboot thus it will be mounted as CEPH-6 instead of CEPH-7 and so on...resulting in a problematic configuration (I guess that lots of data will be start moving around, PGs will be misplaced etc.) Correct me if I am wrong but the proper way to mount them would be by using the UUID of the partition. Is it OK if I change the entries in /etc/mtab using the UUID=xx instead of /dev/sdX1?? Does CEPH try to mount them using a different config file and perhaps exports the entries at boot in /etc/mtab (in the latter case no modification in /etc/mtab will be taken into account)?? I have deployed the Ceph cluster using only the ceph-deploy command. Is there a parameter that I 've missed that must be used during deployment in order to specify the mount points using the UUIDs instead of the device names? Regards, George On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote: We dont have OSD entries in our Ceph config. They are not needed if you dont have specific configs for different OSDs. Robert LeBlanc Sent from a mobile device please excuse any typos. On May 6, 2015 7:18 PM, Florent MONTHEL wrote: Hi teqm, Is it necessary to indicate in ceph.conf all OSD that we have in the cluster ? we have today reboot a cluster (5 nodes RHEL 6.5) and some OSD seem to have change ID so crush map not mapped with the reality Thanks FLORENT MONTHEL ___ ceph-users mailing list ceph-users@lists.ceph.com [1] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2] Links: -- [1] mailto:ceph-users@lists.ceph.com [2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3] mailto:florent.mont...@flox-arts.net ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
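A small sketch illustrating the mechanism described above: each prepared OSD data partition carries a whoami file, so the OSD id does not depend on the /dev/sdX name the disk happens to get after a reboot. The script just prints the id and backing device for every mounted OSD directory under the default layout.

```python
import glob
import os

for osd_dir in sorted(glob.glob('/var/lib/ceph/osd/ceph-*')):
    whoami_path = os.path.join(osd_dir, 'whoami')
    if not os.path.exists(whoami_path):
        continue
    with open(whoami_path) as f:
        osd_id = f.read().strip()
    # Look up the device currently mounted at this path.
    device = None
    with open('/proc/mounts') as mounts:
        for line in mounts:
            dev, mountpoint = line.split()[0:2]
            if mountpoint == osd_dir:
                device = dev
    print('%s -> osd.%s (device %s)' % (osd_dir, osd_id, device))
```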
Re: [ceph-users] OSD in ceph.conf
Indeed it is not necessary to have any OSD entries in the Ceph.conf file but what happens in the event of a disk failure resulting in changing the mount device? For what I can see is that OSDs are mounted from entries in /etc/mtab (I am on CentOS 6.6) like this: /dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0 /dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0 /dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0 /dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0 /dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0 /dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0 /dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0 /dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0 /dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0 /dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0 So in the event of a disk failure (e.g. disk SDH fails) then in the order the next one will take its place meaning that SDI will be seen as SDH upon next reboot thus it will be mounted as CEPH-6 instead of CEPH-7 and so on...resulting in a problematic configuration (I guess that lots of data will be start moving around, PGs will be misplaced etc.) Correct me if I am wrong but the proper way to mount them would be by using the UUID of the partition. Is it OK if I change the entries in /etc/mtab using the UUID=xx instead of /dev/sdX1?? Does CEPH try to mount them using a different config file and perhaps exports the entries at boot in /etc/mtab (in the latter case no modification in /etc/mtab will be taken into account)?? I have deployed the Ceph cluster using only the ceph-deploy command. Is there a parameter that I 've missed that must be used during deployment in order to specify the mount points using the UUIDs instead of the device names? Regards, George On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote: We dont have OSD entries in our Ceph config. They are not needed if you dont have specific configs for different OSDs. Robert LeBlanc Sent from a mobile device please excuse any typos. On May 6, 2015 7:18 PM, Florent MONTHEL wrote: Hi teqm, Is it necessary to indicate in ceph.conf all OSD that we have in the cluster ? we have today reboot a cluster (5 nodes RHEL 6.5) and some OSD seem to have change ID so crush map not mapped with the reality Thanks FLORENT MONTHEL ___ ceph-users mailing list ceph-users@lists.ceph.com [1] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2] Links: -- [1] mailto:ceph-users@lists.ceph.com [2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3] mailto:florent.mont...@flox-arts.net ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS unexplained writes
Sam? This looks to be the HashIndex::SUBDIR_ATTR, but I don't know exactly what it's for nor why it would be getting constantly created and removed on a pure read workload... On Thu, May 7, 2015 at 2:55 PM, Erik Logtenberg e...@logtenberg.eu wrote: It does sound contradictory: why would read operations in cephfs result in writes to disk? But they do. I upgraded to Hammer last week and I am still seeing this. The setup is as follows: EC-pool on hdd's for data replicated pool on ssd's for data-cache replicated pool on ssd's for meta-data Now whenever I start doing heavy reads on cephfs, I see intense bursts of write operations on the hdd's. The reads I'm doing are things like reading a large file (streaming a video), or running a big rsync job with --dry-run (so it just checks meta-data). No clue why that would have any effect on the hdd's, but it does. Now, to further figure out what's going on, I tried using lsof, atop, iotop, but those tools don't provide the necessary information. In lsof I just see a whole bunch of files opened at any time, but it doesn't change much during these tests. In atop and iotop I can clearly see that the hdd's are doing a lot of writes when I'm reading in cephfs, but those tools can't tell me what those writes are. So I tried strace, which can trace file operations and attach to running processes. # strace -f -e trace=file -p 5076 This gave me an idea of what was going on. 5076 is the process id of the osd for one of the hdd's. I saw mostly stat's and open's, but those are all reads, not writes. Of course btrfs can cause writes when doing reads (atime), but I have the osd mounted with noatime. The only write operations that I saw a lot of are these: [pid 5350] getxattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3, user.cephos.phash.contents, \1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0, 1024) = 17 [pid 5350] setxattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3, user.cephos.phash.contents, \1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0, 17, 0) = 0 [pid 5350] removexattr(/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3, user.cephos.phash.contents@1) = -1 ENODATA (No data available) So it appears that the osd's aren't writing actual data to disk, but metadata in the form of xattr's. Can anyone explain what this setting and removing of xattr's could be for? Kind regards, Erik. On 03/16/2015 10:44 PM, Gregory Farnum wrote: The information you're giving sounds a little contradictory, but my guess is that you're seeing the impacts of object promotion and flushing. You can sample the operations the OSDs are doing at any given time by running ops_in_progress (or similar, I forget exact phrasing) command on the OSD admin socket. I'm not sure if rados df is going to report cache movement activity or not. That though would mostly be written to the SSDs, not the hard drives — although the hard drives could still get metadata updates written when objects are flushed. What data exactly are you seeing that's leading you to believe writes are happening against these drives? What is the exact CephFS and cache pool configuration? -Greg On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote: Hi, I forgot to mention: while I am seeing these writes in iotop and /proc/diskstats for the hdd's, I am -not- seeing any writes in rados df for the pool residing on these disks. There is only one pool active on the hdd's and according to rados df it is getting zero writes when I'm just reading big files from cephfs. 
So apparently the osd's are doing some non-trivial amount of writing on their own behalf. What could it be? Thanks, Erik. On 03/16/2015 10:26 PM, Erik Logtenberg wrote: Hi, I am getting relatively bad performance from cephfs. I use a replicated cache pool on ssd in front of an erasure coded pool on rotating media. When reading big files (streaming video), I see a lot of disk i/o, especially writes. I have no clue what could cause these writes. The writes are going to the hdd's and they stop when I stop reading. I mounted everything with noatime and nodiratime so it shouldn't be that. On a related note, the Cephfs metadata is stored on ssd too, so metadata-related changes shouldn't hit the hdd's anyway I think. Any thoughts? How can I get more information about what ceph is doing? Using iotop I only see that the osd processes are busy but it doesn't give many hints as to what they are doing. Thanks, Erik. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com
[ceph-users] osd does not start when object store is set to newstore
Hi, I built and installed the ceph source from the (wip-newstore) branch and could not start an osd with newstore as the osd objectstore. $ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f 2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store I have the following settings in ceph.conf: [global] osd objectstore = newstore newstore backend = rocksdb enable experimental unrecoverable data corrupting features = newstore The logs do not show much detail. $ tail -f /var/log/ceph/ceph-osd.0.log 2015-05-08 00:01:54.331136 7fb00e07c880 0 ceph version (), process ceph-osd, pid 23514 2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store Am I missing something? Regards Srikanth
Re: [ceph-users] [cephfs][ceph-fuse] cache size or memory leak?
I tried echo 3 /proc/sys/vm/drop_caches and dentry_pinned_count dropped. Thanks for your help. On Thu, Apr 30, 2015 at 11:34 PM Yan, Zheng uker...@gmail.com wrote: On Thu, Apr 30, 2015 at 4:37 PM, Dexter Xiong dxtxi...@gmail.com wrote: Hi, I got these message when I remount: 2015-04-30 15:47:58.199837 7f9ad30a27c0 -1 asok(0x3c83480) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists fuse: bad mount point `ceph-fuse': No such file or directory ceph-fuse[2576]: fuse failed to initialize 2015-04-30 15:47:58.199980 7f9ad30a27c0 -1 init, newargv = 0x3ca9b00 newargc=14 2015-04-30 15:47:58.200020 7f9ad30a27c0 -1 fuse_parse_cmdline failed. ceph-fuse[2574]: mount failed: (22) Invalid argument. It seems that FUSE doesn't support remount? This link is google result. please try echo 3 /proc/sys/vm/drop_caches. check if the pinned dentries count drops after executing the command. Regards Yan, Zheng I am using ceph-dokan too. And I got the similar memory problem. I don't know if it is the same problem. I switched to use kernel module and Samba to replace previous solution temporarily. I'm trying to read and track the ceph ceph-dokan source code to find more useful information. I don't know if my previous email arrived the list(Maybe the attachment is too large). Here is its content: I wrote a test case with Python: ''' import os for i in range(200): dir_name = '/srv/ceph_fs/test/d%s'%i os.mkdir(dir_name) for j in range(3): with open('%s/%s'%(dir_name, j), 'w') as f: f.write('0') ''' The output of status command after test on a fresh mount: { metadata: { ceph_sha1: e4bfad3a3c51054df7e537a724c8d0bf9be972ff, ceph_version: ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), entity_id: admin, hostname: local-share-server, mount_point: \/srv\/ceph_fs }, dentry_count: 204, dentry_pinned_count: 201, inode_count: 802, mds_epoch: 25, osd_epoch: 177, osd_epoch_barrier: 176 } It seems that all pinned dentrys are directories from dump cache command output. Attachment is a package of debug log and dump cache content. On Thu, Apr 30, 2015 at 2:55 PM Yan, Zheng uker...@gmail.com wrote: On Wed, Apr 29, 2015 at 4:33 PM, Dexter Xiong dxtxi...@gmail.com wrote: The output of status command of fuse daemon: dentry_count: 128966, dentry_pinned_count: 128965, inode_count: 409696, I saw the pinned dentry is nearly the same as dentry. So I enabled debug log(debug client = 20/20) and read Client.cc source code in general. I found that an entry will not be trimed if it is pinned. But how can I unpin dentrys? Maybe these dentries are pinned by fuse kernel module (ceph-fuse does not try trimming kernel cache when its cache size client_cache_size). Could you please run mount -o remount mount point, then run the status command again. check if number of pinned dentries drops. Regards Yan, Zheng On Wed, Apr 29, 2015 at 12:19 PM Dexter Xiong dxtxi...@gmail.com wrote: I tried set client cache size = 100, but it doesn't solve the problem. I tested ceph-fuse with kernel version 3.13.0-24 3.13.0-49 and 3.16.0-34. On Tue, Apr 28, 2015 at 7:39 PM John Spray john.sp...@redhat.com wrote: On 28/04/2015 06:55, Dexter Xiong wrote: Hi, I've deployed a small hammer cluster 0.94.1. And I mount it via ceph-fuse on Ubuntu 14.04. After several hours I found that the ceph-fuse process crashed. The end is the crash log from /var/log/ceph/ceph-client.admin.log. 
The memory cost of ceph-fuse process was huge(more than 4GB) when it crashed. Then I did some test and found these actions will increase memory cost of ceph-fuse rapidly and the memory cost never seem to decrease: * rsync command to sync small files(rsync -a /mnt/some_small /srv/ceph) * chown command/ chmod command(chmod 775 /srv/ceph -R) But chown/chmod command on accessed files will not increase the memory cost. It seems that ceph-fuse caches the file nodes but never releases them. I don't know if there is an option to control the cache size. I set mds cache size = 2147483647 option to improve the performance of mds, and I tried to set mds cache size = 1000 at client side but it doesn't effect the result. The setting for client-side cache limit is client cache size, default is 16384 What kernel version are you using on the client? There have been some issues with cache trimming vs. fuse in recent kernels, but we thought we had workarounds in place...
Re: [ceph-users] Kicking 'Remapped' PGs
This is pretty weird to me. Normally those PGs should be reported as active, or stale, or something else in addition to remapped. Sam suggests that they're probably stuck activating for some reason (which is a state in new enough code, but not all versions), but I can't tell or imagine why from these settings. You might have hit a bug I'm not familiar with that will be jostled by just restarting the OSDs in question... :/ -Greg On Tue, May 5, 2015 at 7:46 AM, Paul Evans p...@daystrom.com wrote: Gregory Farnum g...@gregs42.com wrote: Oh. That's strange; they are all mapped to two OSDs but are placed on two different ones. I'm...not sure why that would happen. Are these PGs active? What's the full output of ceph -s? Those 4 PG’s went inactive at some point, and we had the luxury of time to understand how we arrived at this state before we truly have to fix it (but that time is soon). So...We kicked a couple of OSD’s out yesterday to let the cluster re-shuffle things (osd.19 and osd.34…both of which were non-primary copies of the ‘acting’ PG map) and now the cluster status is even more interesting, IMHO: ceph@nc48-n1:/ceph-deploy/nautilus$ ceph -s cluster 68bc69c1-1382-4c30-9bf8-480e32cc5b92 health HEALTH_WARN 2 pgs stuck inactive; 2 pgs stuck unclean; nodeep-scrub flag(s) set; crush map has legacy tunables monmap e1: 3 mons at {nc48-n1=10.253.50.211:6789/0,nc48-n2=10.253.50.212:6789/0,nc48-n3=10.253.50.213:6789/0}, election epoch 564, quorum 0,1,2 nc48-n1,nc48-n2,nc48-n3 osdmap e80862: 94 osds: 94 up, 92 in flags nodeep-scrub pgmap v1954234: 6144 pgs, 2 pools, 35251 GB data, 4419 kobjects 91727 GB used, 245 TB / 334 TB avail 6140 active+clean 2 remapped 2 active+clean+scrubbing ceph@nc48-n1:/ceph-deploy/nautilus$ ceph pg dump_stuck ok pg_statobjectsmipdegrunfbyteslogdisklogstate state_stampvreportedupup_primaryactingacting_primary last_scrubscrub_stamplast_deep_scrubdeep_scrub_stamp 11.e2f280000233984418130013001remapped 2015-04-23 13:18:59.29958968310'5108280862:121916[77,4]77 [77,34]7768310'510822015-04-23 11:40:11.5654870'0 2014-10-20 13:41:46.122624 11.323282000235718664730013001remapped 2015-04-23 13:18:58.97039670105'4896180862:126346[0,37]0 [0,19]070105'489612015-04-23 11:47:02.9801458145'44375 2015-03-30 16:09:36.975875 -- Paul ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] osd does not start when object store is set to newstore
I think you need to add the following: enable experimental unrecoverable data corrupting features = newstore rocksdb
Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Srikanth Madugundi Sent: Thursday, May 07, 2015 10:56 PM To: ceph-us...@ceph.com Subject: [ceph-users] osd does not start when object store is set to newstore
Hi, I built and installed ceph from source (the wip-newstore branch) and could not start the osd with newstore as the osd objectstore.
$ sudo /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph -f
2015-05-08 05:49:16.130073 7f286be01880 -1 unable to create object store
I have the following settings in ceph.conf:
[global]
osd objectstore = newstore
newstore backend = rocksdb
enable experimental unrecoverable data corrupting features = newstore
The logs do not show much detail.
$ tail -f /var/log/ceph/ceph-osd.0.log
2015-05-08 00:01:54.331136 7fb00e07c880 0 ceph version (), process ceph-osd, pid 23514
2015-05-08 00:01:54.331202 7fb00e07c880 -1 unable to create object store
Am I missing something?
Regards
Srikanth
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
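In other words, the experimental-features line has to list rocksdb as well as newstore. A minimal ceph.conf sketch based on the settings quoted above:
    [global]
    osd objectstore = newstore
    newstore backend = rocksdb
    enable experimental unrecoverable data corrupting features = newstore rocksdb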
Re: [ceph-users] About Ceph Cache Tier parameters
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of GODIN Vincent (SILCA) Sent: 07 May 2015 11:13 To: ceph-users@lists.ceph.com Subject: [ceph-users] About Ceph Cache Tier parameters
Hi, among the Cache Tier parameters there is nothing to tell the cache to flush dirty objects to cold storage when the cache is under-utilized (as long as you are under the cache_target_dirty_ratio, it looks like dirty objects can be kept in the cache for years).
Yes, this is correct. I have played around with a cron job to flush the dirty blocks when I know the cluster will be idle; this improves write performance for the next bunch of bursty writes. I think the idea behind the current cache design is geared more towards something like running VMs, where typically the same hot blocks will be written to over and over again. My workload involves a significant number of blocks which are written once and then never again, so flushing the cache before each job run seems to improve performance.
That is to say that the flush operations will always start during writes, once we have reached the cache_target_dirty_ratio value: this will slow down the current write IO. Are any future changes planned to improve this behavior?
Not that I'm currently aware of, but I did post here a couple of weeks ago suggesting that having high and low watermarks for cache flushing might improve performance. At the low watermark the cache would be flushed with a low/idle priority (much like the scrub options), and at the high watermark the current flushing behaviour would start. I didn't get any response, so I think this idea may have hit a bit of a dead end. I did start looking through the Ceph source to see if it was something I could try doing myself, but I haven't found enough time to get my head round it.
Thanks for your response Vince
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
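For what it's worth, the cron-driven flush mentioned above can be done with the rados cache-tier commands. A minimal sketch, assuming a hypothetical cache pool named cache-pool and an idle window at 03:00:
    # /etc/cron.d/ceph-cache-flush (sketch): flush dirty objects to the backing pool and evict clean ones while the cluster is idle
    0 3 * * * root rados -p cache-pool cache-try-flush-evict-all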
Re: [ceph-users] Change pool id
Hi, Just wanted to mention this again, in case it went unnoticed. The problem is that I need to get the same ID for a pool as it had before, or a way to tell ceph where to find the original images for the VMs. I have them available. T
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tuomas Juntunen Sent: 5 May 2015 16:24 To: ceph-users@lists.ceph.com Subject: [ceph-users] Change pool id
Hi, Previously I had to delete one pool because of a mishap on my part. Now I need to create the pool again and give it the same id. How would one do that? I assume my root problem is that, since I had to delete the images pool, the base images the VMs use are missing. I have the images available in the images pool. Would changing the id of the pool fix this (images is now id 18, but it should be id 4)? Below is the result of 'rbd ls vms -l', which shows that the file is obviously missing.
2015-05-05 16:19:12.634163 7fe7a9b22840 -1 librbd: error looking up name for pool id 4: (2) No such file or directory
2015-05-05 16:19:12.634194 7fe7a9b22840 -1 librbd: error opening parent snapshot: (2) No such file or directory
Br, T
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] unable to start monitor
Hi, I am setting up a local instance of a ceph cluster with the latest source from GitHub. The build succeeded and the installation was successful, but I could not start the monitor. The ceph start command returns immediately and does not output anything.
$ sudo /etc/init.d/ceph start mon.monitor1
$
$ ls -l /var/lib/ceph/mon/ceph-monitor1/
total 8
-rw-r--r-- 1 root root    0 May 7 20:27 done
-rw-r--r-- 1 root root   77 May 7 19:12 keyring
drwxr-xr-x 2 root root 4096 May 7 19:12 store.db
-rw-r--r-- 1 root root    0 May 7 20:26 sysvinit
-rw-r--r-- 1 root root    0 May 7 20:09 upstart
The log file does not seem to have any details either:
$ cat /var/log/ceph/ceph-mon.monitor1.log
2015-05-07 19:12:13.356389 7f3f06bdb880 -1 did not load config file, using default settings.
$ cat /etc/ceph/ceph.conf
[global]
mon host = 15.43.33.21
fsid = 92f859df-8b27-466a-8d44-01af2b7ea7e6
mon initial members = monitor1
# Enable authentication
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
# POOL / PG / CRUSH
osd pool default size = 3 # Write an object 3 times
osd pool default min size = 1 # Allow writing one copy in a degraded state
# Ensure you have a realistic number of placement groups. We recommend
# approximately 200 per OSD. E.g., total number of OSDs multiplied by 200
# divided by the number of replicas (i.e., osd pool default size).
# !! BE CAREFUL !!
# You probably should never rely on the default numbers when creating pools!
osd pool default pg num = 32
osd pool default pgp num = 32
#log file = /home/y/logs/ceph/$cluster-$type.$id.log
# Logging
debug paxos = 0
debug throttle = 0
keyring = /etc/ceph/ceph.client.admin.keyring
#run dir = /home/y/var/run/ceph
[mon]
debug mon = 10
debug ms = 1
# We found that when disk usage reaches 94% the disk cannot be written to
# any more (no free space), so we lower the full ratio and start
# data migration before it becomes full
mon osd full ratio = 0.9
#mon data = /home/y/var/lib/ceph/mon/$cluster-$id
mon osd down out interval = 172800 # 2 * 24 * 60 * 60 seconds
# Ceph monitors need to be told how many reporters must be seen from different
# OSDs before an OSD can be marked offline; this should be greater than the number of
# OSDs per OSD host
mon osd min down reporters = 12
#keyring = /home/y/conf/ceph/ceph.mon.keyring
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
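One thing that stands out in the log is "did not load config file, using default settings", which suggests the monitor is not picking up /etc/ceph/ceph.conf at all. Running it in the foreground with the config file given explicitly usually shows why; a rough sketch:
    # run the mon in the foreground with an explicit config file, logging to stderr
    sudo ceph-mon -i monitor1 -c /etc/ceph/ceph.conf -d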
[ceph-users] RGW - Can't download complete object
I have another thread going on about truncation of objects, and I believe this is a separate but equally bad issue in civetweb/radosgw. My cluster is completely healthy. I have one (possibly more) object stored in the ceph rados gateway that returns a different size every time I try to download it:
http://pastebin.com/hK1iqXZH --- ceph -s
http://pastebin.com/brmxQRu3 --- radosgw-admin object stat of the object
http://pastebin.com/5TnvgMrX --- python download code
The weird part is that every time I download the file it is a different size. I am grabbing the individual objects of the 14 GB file and will update this email once I have them all statted out. Currently I am getting, on average, 1.5 GB to 2 GB files when the total object should be 14 GB in size.
lacadmin@kh10-9:~$ python corruptpull.py
the download failed. The filesize = 2125988202. The actual size is 14577056082. Attempts = 1
the download failed. The filesize = 2071462250. The actual size is 14577056082. Attempts = 2
the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 3
the download failed. The filesize = 1643643242. The actual size is 14577056082. Attempts = 4
the download failed. The filesize = 1597505898. The actual size is 14577056082. Attempts = 5
the download failed. The filesize = 2075656554. The actual size is 14577056082. Attempts = 6
the download failed. The filesize = 650117482. The actual size is 14577056082. Attempts = 7
the download failed. The filesize = 1987576170. The actual size is 14577056082. Attempts = 8
the download failed. The filesize = 2109210986. The actual size is 14577056082. Attempts = 9
the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 10
the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 11
the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 12
the download failed. The filesize = 2142765418. The actual size is 14577056082. Attempts = 13
the download failed. The filesize = 1467482474. The actual size is 14577056082. Attempts = 14
the download failed. The filesize = 2046296426. The actual size is 14577056082. Attempts = 15
the download failed. The filesize = 2021130602. The actual size is 14577056082. Attempts = 16
the download failed. The filesize = 177366. The actual size is 14577056082. Attempts = 17
the download failed. The filesize = 2146959722. The actual size is 14577056082. Attempts = 18
the download failed. The filesize = 2016936298. The actual size is 14577056082. Attempts = 19
the download failed. The filesize = 1983381866. The actual size is 14577056082. Attempts = 20
the download failed. The filesize = 2134376810. The actual size is 14577056082. Attempts = 21
Notice it is always different. Once the rados -p .rgw.buckets ls | grep finishes I will return the listing of objects as well, but this is quite odd and I think it is a separate issue. Has anyone seen this before? Why wouldn't radosgw return an error, and why am I getting different file sizes? I would post the log from radosgw, but I don't see any err|wrn|fatal mentions in the log and the client completes without issue every time.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
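As a side note, one way to sanity-check this outside the pastebin script is to compare the size radosgw-admin reports with what actually arrives over S3. A rough sketch, assuming s3cmd is configured against this gateway and using placeholder bucket/object names:
    # size the gateway believes the object has
    radosgw-admin object stat --bucket=mybucket --object=bigfile
    # download it a few times and compare the on-disk size
    for i in 1 2 3; do
        s3cmd get --force s3://mybucket/bigfile /tmp/bigfile
        stat -c %s /tmp/bigfile
    done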
Re: [ceph-users] CephFS unexplained writes
It does sound contradictory: why would read operations in cephfs result in writes to disk? But they do. I upgraded to Hammer last week and I am still seeing this. The setup is as follows: an EC pool on hdd's for data, a replicated pool on ssd's for data-cache, and a replicated pool on ssd's for meta-data. Now whenever I start doing heavy reads on cephfs, I see intense bursts of write operations on the hdd's. The reads I'm doing are things like reading a large file (streaming a video), or running a big rsync job with --dry-run (so it just checks meta-data). No clue why that would have any effect on the hdd's, but it does. Now, to further figure out what's going on, I tried using lsof, atop and iotop, but those tools don't provide the necessary information. In lsof I just see a whole bunch of files opened at any time, but it doesn't change much during these tests. In atop and iotop I can clearly see that the hdd's are doing a lot of writes when I'm reading in cephfs, but those tools can't tell me what those writes are. So I tried strace, which can trace file operations and attach to running processes.
# strace -f -e trace=file -p 5076
This gave me an idea of what was going on. 5076 is the process id of the osd for one of the hdd's. I saw mostly stat's and open's, but those are all reads, not writes. Of course btrfs can cause writes when doing reads (atime), but I have the osd mounted with noatime. The only write operations that I saw a lot of are these:
[pid 5350] getxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1Q\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 1024) = 17
[pid 5350] setxattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents", "\1R\0\0\0\0\0\0\0\0\0\0\0\4\0\0", 17, 0) = 0
[pid 5350] removexattr("/var/lib/ceph/osd/ceph-10/current/4.1es1_head/DIR_E/DIR_1/DIR_D/DIR_3", "user.cephos.phash.contents@1") = -1 ENODATA (No data available)
So it appears that the osd's aren't writing actual data to disk, but metadata in the form of xattr's. Can anyone explain what this setting and removing of xattr's could be for? Kind regards, Erik.
On 03/16/2015 10:44 PM, Gregory Farnum wrote: The information you're giving sounds a little contradictory, but my guess is that you're seeing the impacts of object promotion and flushing. You can sample the operations the OSDs are doing at any given time by running the ops_in_progress (or similar, I forget the exact phrasing) command on the OSD admin socket. I'm not sure if rados df is going to report cache movement activity or not. That though would mostly be written to the SSDs, not the hard drives — although the hard drives could still get metadata updates written when objects are flushed. What data exactly are you seeing that's leading you to believe writes are happening against these drives? What is the exact CephFS and cache pool configuration? -Greg
On Mon, Mar 16, 2015 at 2:36 PM, Erik Logtenberg e...@logtenberg.eu wrote: Hi, I forgot to mention: while I am seeing these writes in iotop and /proc/diskstats for the hdd's, I am -not- seeing any writes in rados df for the pool residing on these disks. There is only one pool active on the hdd's and according to rados df it is getting zero writes when I'm just reading big files from cephfs. So apparently the osd's are doing some non-trivial amount of writing on their own behalf. What could it be? Thanks, Erik.
On 03/16/2015 10:26 PM, Erik Logtenberg wrote: Hi, I am getting relatively bad performance from cephfs.
I use a replicated cache pool on ssd in front of an erasure coded pool on rotating media. When reading big files (streaming video), I see a lot of disk i/o, especially writes. I have no clue what could cause these writes. The writes are going to the hdd's and they stop when I stop reading. I mounted everything with noatime and nodiratime so it shouldn't be that. On a related note, the Cephfs metadata is stored on ssd too, so metadata-related changes shouldn't hit the hdd's anyway I think. Any thoughts? How can I get more information about what ceph is doing? Using iotop I only see that the osd processes are busy but it doesn't give many hints as to what they are doing. Thanks, Erik.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
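For the record, the admin socket command Greg is thinking of is presumably dump_ops_in_flight; sampling it (and the recent-history variant) on the busy OSD from the strace above should show which operations those writes belong to. A rough sketch, run on the host of osd.10:
    # operations the OSD is servicing right now
    ceph daemon osd.10 dump_ops_in_flight
    # the most recently completed operations, with their durations
    ceph daemon osd.10 dump_historic_ops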
[ceph-users] export-diff exported only 4kb instead of 200-600gb
Hi all, Something strange occurred. I have ceph version 0.87 and a 2048 GB format 1 image. I decided to make incremental backups between the clusters. I made the initial copy:
time bbcp -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io rbd export-diff --cluster cluster1 --pool RBD-01 --image CEPH_006__01__NA__0003__ESX__ALL_EXT --snap move2db24-20150428 - 1.1.1.1:rbd import-diff - --cluster cluster2 --pool TST-INT-SD-RBD-1DC --image temp
and then decided to move the incremental (it should be about 200-600 GB of changes):
time bbcp -c -x 7M -P 3600 -w 32M -s 6 -Z 5030:5035 -N io rbd --cluster cluster1 --pool RBD-01 --image CEPH_006__01__NA__0003__ESX__ALL_EXT --from-snap move2db24-20150428 --snap 2015-05-05 - 1.1.1.1:rbd import-diff - --cluster cluster2 --pool TST-INT-SD-RBD-1DC --image temp
It took about 30 min (which was too fast, because I have a 7M limit between the clusters), so I decided to check how much data was transferred:
time rbd export-diff --cluster cluster1 --pool RBD-01 --image CEPH_006__01__NA__0003__ESX__ALL_EXT --from-snap move2db24-20150428 --snap 2015-05-05 - | wc -c
4753
Exporting image: 100% complete...done.
I double checked it.. it was really 4753 bytes, so I decided to look at the export-diff file:
000: 7262 6420 6469 2076 310a 6612 rbd diff v1.f...
010: 006d 6f76 6532 6462 3234 2d32 3031 3530 .move2db24-20150
020: 3432 3874 0a00 3230 3135 2d30 352d 428t2015-05-
030: 3035 7300 0200 0077 0080 5501 05sw..U.
040: 0002 02ef cdab
050: 0080 3500 0d00 ..5.
060: 2d58 3aff 5002 bc56 2255 08fc 14a9 -X:.PVU
070: e6c0 e839 351a 942c 01de 4603 0e00 ...95..,..F.
080: 3a00 :...
090:
0a0: ..
0001270:
0001280:
0001290: 65 e
It looks like the correct format.. I made a clone (like a flex clone) from snapshot 2015-05-05 and found that it doesn't have the changes from snap move2db24-20150428. Do you have any ideas what I should check? Why did this happen?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] wrong diff-export format description
You are correct -- it is little endian like the other values. I'll open a ticket to correct the document. -- Jason Dillaman Red Hat dilla...@redhat.com http://www.redhat.com - Original Message - From: Ultral ultral...@gmail.com To: ceph-us...@ceph.com Sent: Thursday, May 7, 2015 5:23:12 AM Subject: [ceph-users] wrong diff-export format description Hi all, It looks like a bit wrong description http://ceph.com/docs/master/dev/rbd-diff/ * u8: ‘s’ * u64: (ending) image size I suppose that instead of u64 should be used something like le64, isn't it? Because of from this description is not clear which bytes order should I use.. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
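If you want to see the byte order for yourself, dumping the start of a diff stream makes the little-endian length fields visible. A rough sketch with placeholder pool/image/snapshot names:
    # the 4-byte name lengths after the 'f'/'t' records and the 8-byte size after 's' come out little-endian
    rbd export-diff --pool mypool --image myimage --from-snap snap1 --snap snap2 - | xxd | head -20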
Re: [ceph-users] How to backup hundreds or thousands of TB
On 05/07/2015 12:10 PM, John Spray wrote: On 06/05/2015 16:58, Scottix wrote: As a point on "someone accidentally removed a thing, and now they need a thing back": I thought MooseFS has an interesting feature that would be good for CephFS and maybe others. Basically a timed trash bin: deleted files are retained for a configurable period of time (a file-system-level trash bin). It's an idea to cover this use case.
Until recently we had a bug where deleted files weren't purged until the next MDS restart, so maybe we should just back out the fix for that :-D Seriously though, I didn't know about that MooseFS feature; it's interesting that they decided to implement it. It would be fairly straightforward to do that in CephFS (we already put deleted files into a 'stray' directory before purging them asynchronously), but I think there might be some debate about whether it's really the role of the underlying filesystem to do that kind of thing.
Aren't snapshots something that should protect you against removal? If snapshots work properly in CephFS you could create a snapshot every hour. With the recursive statistics [0] of CephFS you could easily back up all your data to a different Ceph system, or to anything that isn't Ceph. I've done this with a ~700TB CephFS cluster and it is still working properly.
Wido
[0]: http://blog.widodh.nl/2015/04/playing-with-cephfs-recursive-statistics/
John
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
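For the hourly-snapshot approach Wido describes, CephFS snapshots are created by making a directory under the hidden .snap directory, so a cron entry is enough. A minimal sketch, assuming the filesystem is mounted at /mnt/cephfs and snapshots have been enabled on the MDS (they are off by default in Hammer):
    # /etc/cron.d/cephfs-hourly-snap (sketch): take an hourly snapshot of the whole tree
    0 * * * * root mkdir /mnt/cephfs/.snap/hourly-$(date +\%Y-\%m-\%d-\%H)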
[ceph-users] About Ceph Cache Tier parameters
Hi, among the Cache Tier parameters there is nothing to tell the cache to flush dirty objects to cold storage when the cache is under-utilized (as long as you are under the cache_target_dirty_ratio, it looks like dirty objects can be kept in the cache for years). That is to say that the flush operations will always start during writes, once we have reached the cache_target_dirty_ratio value: this will slow down the current write IO. Are any future changes planned to improve this behavior? Thanks for your response Vince
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Btrfs defragmentation
Hi,
On 05/07/15 12:30, Burkhard Linke wrote: [...] Part of the OSD boot-up process is also the handling of existing snapshots and journal replay. I've also had several btrfs-based OSDs that took up to 20-30 minutes to start, especially after a crash. During journal replay the OSD daemon creates a number of new snapshots for its operations (newly created snap_XYZ directories that vanish after a short time). This snapshotting probably also adds overhead to the OSD startup time. I have disabled snapshots in my setup now, since the stock Ubuntu trusty kernel had some stability problems with btrfs. I also had to establish cron jobs for rebalancing the btrfs partitions. It compacts the extents and may reduce the total amount of space taken.
I'm not sure what you mean by compacting extents. I'm sure balance doesn't defragment or compress files. It moves extents, and before 3.14, according to the btrfs wiki, it was used to reclaim allocated but unused space. This shouldn't affect performance, and with modern kernels it may not be needed to reclaim unused space anymore.
Unfortunately this procedure is not a default in most distributions (it definitely should be!). The problems associated with unbalanced extents should have been solved in kernel 3.18, but I haven't had the time to check it yet.
I don't have any btrfs filesystems running on 3.17 or earlier anymore (with a notable exception, see below), so I can't comment. I have old btrfs filesystems that were created on 3.14 and are now on 3.18.x or 3.19.x (by the way, avoid 3.18.9 to 3.19.4 if you can have any sort of power failure: there's a possibility of a mount deadlock which requires btrfs-zero-log to solve...). btrfs fi usage doesn't show anything suspicious on these old filesystems. I have a Jolla phone which comes with a btrfs filesystem and uses an old, heavily patched 3.4 kernel. It hasn't had any problems yet, but I don't stuff it with data (I've seen discussions about triggering a balance before a SailfishOS upgrade). I assume that you shouldn't have any problems with filesystems that aren't heavily used, which should be the case with Ceph OSDs (for example, our current alert level is at 75% space usage).
As a side note: I had several OSDs with dangling snapshots (more than the two usually handled by the OSD). They are probably due to crashed OSD daemons. You have to remove them manually, otherwise they start to consume disk space.
Thanks a lot, I didn't think that could happen. I'll configure an alert for this case.
Best regards, Lionel
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
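Regarding the dangling snapshots: on a btrfs OSD they show up as extra snap_* subvolumes next to current/ in the OSD data directory, so they are easy to check for and clean up. A rough sketch, assuming a hypothetical OSD data directory /var/lib/ceph/osd/ceph-NN (with the OSD stopped before deleting anything):
    # list the subvolumes/snapshots on the OSD filesystem; normally only current/ and the two newest snap_* should exist
    sudo btrfs subvolume list /var/lib/ceph/osd/ceph-NN
    # delete a stale snapshot left behind by a crashed OSD daemon
    sudo btrfs subvolume delete /var/lib/ceph/osd/ceph-NN/snap_<number>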