Call for jenkins slaves to improve multi operating system support
Hi Ceph, When a contribution is proposed to Ceph [1], a bot compiles and runs tests with it to provide feedback to the developer [2]. When something goes wrong, the failure can be reproduced on the developer's machine [3] for debugging. This also helps the reviewer, who knows the code compiles and does not break anything that would be detected by make check. The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older operating systems (headers, compiler version, etc.) may be detected later, when building packages [4] and after the pull request has been merged into master. This is rare but requires extra attention from the reviewer and needs to be dealt with urgently when it happens. If you can spare a machine to help expand the operating systems on which tests can run, it would be a great help. The minimum hardware configuration to run a slave is:
* x86_64 architecture for CentOS 6, Fedora 21, OpenSUSE 13.2, Debian GNU/Linux Jessie, Ubuntu 14.02: 32 GB RAM, 200 GB SSD, 8 cores at 2.5 GHz
* i386 architecture for CentOS 7, CentOS 6, Fedora 21, Debian GNU/Linux Jessie, Ubuntu 14.04, Ubuntu 14.02: 4 GB RAM, 200 GB disk, 2 cores
* armv7, armv8 architecture for Ubuntu 14.04: 4 GB RAM, 200 GB disk, 2 cores
Note that since the make check bot can run in a docker container, x86_64 machines can be used to run any of the operating systems for which a docker file has been prepared [5]. Cheers
[1] pull requests https://github.com/ceph/ceph/pulls
[2] make check bot feedback https://github.com/ceph/ceph/pull/4296#issuecomment-90812064
[3] run-make-check.sh https://github.com/ceph/ceph/blob/master/run-make-check.sh#L44
[4] gitbuilder http://ceph.com/gitbuilder.cgi
[5] https://ceph.com/git/?p=ceph.git;a=blob;f=src/test/Makefile.am;hb=hammer#l91
-- Loïc Dachary, Artisan Logiciel Libre
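For anyone sizing up a spare machine, the job a slave actually runs is small: clone the repository and execute run-make-check.sh [3]. A rough sketch, assuming a Docker-capable x86_64 host and using the pull request from [2] purely as an example:

    git clone https://github.com/ceph/ceph.git
    cd ceph
    git fetch origin pull/4296/head && git checkout FETCH_HEAD   # example PR only
    ./run-make-check.sh   # builds the tree and runs make check

On an x86_64 slave the same script can run inside a docker container built from the docker files referenced in [5], which is how a single machine can cover several of the operating systems listed above.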
Re: Initial newstore vs filestore results
On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote: On Tue, 7 Apr 2015, Mark Nelson wrote: On 04/07/2015 02:16 PM, Mark Nelson wrote: On 04/07/2015 09:57 AM, Mark Nelson wrote: Hi Guys, I ran some quick tests on Sage's newstore branch. So far given that this is a prototype, things are looking pretty good imho. The 4MB object rados bench read/write and small read performance looks especially good. Keep in mind that this is not using the SSD journals in any way, so 640MB/s sequential writes is actually really good compared to filestore without SSD journals. small write performance appears to be fairly bad, especially in the RBD case where it's small writes to larger objects. I'm going to sit down and see if I can figure out what's going on. It's bad enough that I suspect there's just something odd going on. Mark Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those interested: http://nhm.ceph.com/newstore/ Interestingly small object write/read performance with 4 OSDs was about 1/3-1/4 the speed of the same cluster with 36 OSDs. Note: Thanks Dan for fixing the directory column width! Mark New fio/librbd results using Sage's latest code that attempts to keep small overwrite extents in the db. This is 4 OSD so not directly comparable to the 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s: write readrandw randr 4MB 57.9319.6 55.2285.9 128KB 2.5 230.6 2.4 125.4 4KB 0.4655.65 1.113.56 What would be very interesting would be to see the 4KB performance with the defaults (newstore overlay max = 32) vs overlays disabled (newstore overlay max = 0) and see if/how much it is helping. The latest branch also has open-by-handle. It's on by default (newstore open by handle = true). I think for most workloads it won't be very noticeable... I think there are two questions we need to answer though: 1) Does it have any impact on a creation workload (say, 4kb objects). It shouldn't, but we should confirm. 2) Does it impact small object random reads with a cold cache. I think to see the effect we'll probably need to pile a ton of objects into the store, drop caches, and then do random reads. In the best case the effect will be small, but hopefully noticeable: we should go from a directory lookup (1+ seeks) + inode lookup (1+ seek) + data read, to inode lookup (1+ seek) + data read. So, 3 - 2 seeks best case? I'm not really sure what XFS is doing under the covers here.. WOW, it's really a cool implementation beyond my original mind according to blueprint. Handler, overlay_map and data_map looks so flexible and make small io cheaper in theory. Now we only have 1 element in data_map and I'm not sure your goal about the future's usage. Although I have a unclearly idea that it could enhance the role of NewStore and make local filesystem just as a block space allocator. Let NewStore own a variable of FTL(File Translation Layer), so many cool features could be added. What's your idea about data_map? My concern currently still is WAL after fsync and kv commiting, maybe fsync process is just fine because mostly we won't meet this case in rbd. But submit sync kv transaction isn't a low latency job I think, maybe we could let WAL parallel with kv commiting?(yes, I really concern the latency of one op :-) ) Then from the actual rados write op, it will add setattr and omap_setkeys ops. Current NewStore looks plays badly for setattr. It always encode all xattrs(and other not so tiny fields) and write again (Is this true?) 
though it could batch multi transaction's onode write in short time. NewStore also employ much more workload to KeyValueDB compared to FileStore, so maybe we need to consider the rich workload again compared before. FileStore only use leveldb just for write workload mainly so leveldb could fit into greatly, but currently overlay keys(read) and onode(read) will occur a main latency source in normal IO I think. The default kvdb like leveldb and rocksdb both plays not well for random read workload, it maybe will be problem. Looking for another kv db maybe a choice. And it still doesn't add journal codes for wal? Anyway, NewStore should cover more workloads compared to FileStore. Good job! sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Regards, Wheat -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ceph-deploy with monitor port number?
Hello all, AFAICS, ceph-deploy does not yet support specifying the monitor port number for the "new" subcommand. It would be great if "new" could take another parameter, e.g. ceph-deploy --cluster mycluster --monport 6790 node1 node2 node3 ... Our current workaround is to add the port number to the mon_host line of the generated .conf file by hand, after "new" and before "mon create-initial". With that change, all commands work as expected. Thanks, Amon Ott -- Dr. Amon Ott m-privacy GmbH Tel: +49 30 24342334 Werner-Voß-Damm 62 Fax: +49 30 99296856 12101 Berlin http://www.m-privacy.de Amtsgericht Charlottenburg, HRB 84946 Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky GnuPG-Key-ID: 0x2DD3A649 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
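For reference, the workaround amounts to something like the following; the host names and port come from the example above, while the exact mon_host syntax with ports is an assumption and should be checked against the generated file:

    ceph-deploy --cluster mycluster new node1 node2 node3
    # edit mycluster.conf by hand before creating the monitors, appending the
    # port to each monitor address on the mon_host line, e.g.
    #   mon_host = node1:6790,node2:6790,node3:6790
    ceph-deploy --cluster mycluster mon create-initial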
init script bug with multiple clusters
Hello Ceph! The Ceph init script (src/init-ceph.in) creates pid files without cluster names. This means that only one cluster can run at a time. The solution is simple and works fine here; the patch (against 0.94) is included below. Amon Ott -- Dr. Amon Ott m-privacy GmbH Tel: +49 30 24342334 Werner-Voß-Damm 62 Fax: +49 30 99296856 12101 Berlin http://www.m-privacy.de Amtsgericht Charlottenburg, HRB 84946 Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky GnuPG-Key-ID: 0x2DD3A649

--- ceph-0.93/src/init-ceph.in	2015-02-27 19:47:15.0 +0100
+++ ceph-0.93/src/init-ceph.in.mp	2015-04-07 13:29:47.127067864 +0200
@@ -227,7 +237,7 @@
     get_conf run_dir "/var/run/ceph" "run dir"
-    get_conf pid_file "$run_dir/$type.$id.pid" "pid file"
+    get_conf pid_file "$run_dir/$cluster-$type.$id.pid" "pid file"
     if [ "$command" = "start" ]; then
         if [ -n "$pid_file" ]; then
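To illustrate the collision the patch removes, consider a hypothetical host running two clusters that both have an osd.0; the init script invocation below is only a sketch, while the pid file paths follow from the patched and unpatched get_conf lines:

    /etc/init.d/ceph --cluster ceph start osd.0
    /etc/init.d/ceph --cluster mycluster start osd.0
    # unpatched: both daemons use /var/run/ceph/osd.0.pid, so the second start
    # clobbers the first cluster's pid file and stop/status act on the wrong daemon
    # patched:   /var/run/ceph/ceph-osd.0.pid and /var/run/ceph/mycluster-osd.0.pid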
Re: [ceph-users] OSD auto-mount after server reboot
On 08/04/2015 02:16, shiva rkreddy wrote: We didn't do a upgrade from 0.80.7 to 0.80.9. It was a fresh install with ceph 0.80.9 with ceph-deploy version 1.15.12. May be should have used ceph-deploy version 1.15.16 https://github.com/ceph/ceph-deploy/pull/240/files Ok, mystery solved then, right ? On Tue, Apr 7, 2015 at 4:31 PM, Loic Dachary l...@dachary.org mailto:l...@dachary.org wrote: On 07/04/2015 23:03, shiva rkreddy wrote: It turns out that ceph service is set ceph to off. Following defect talks about it..Once its set to on, everything worked fine after host reboot. http://tracker.ceph.com/issues/9090 Interesting :-) Does that mean that the ceph service was somehow turned off when upgrading from v0.80.7 to v0.80.9 ? On Tue, Apr 7, 2015 at 2:20 PM, Loic Dachary l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org wrote: [cc'ing ceph-devel for archival purposes] Hi, On 07/04/2015 19:55, shiva rkreddy wrote: Hi Loic, I've looked at the way our cluster provisioning partition and label the journal device. Its done using /*parted*/ command with gpt label. Following is the parted output for /dev/sdr that we were looking yesterday. # parted /dev/sdr GNU Parted 2.1 Using /dev/sdr Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: ASR7160 JBOD-R (scsi) Disk /dev/sdr: 960GB Sector size (logical/physical): 512B/512B Partition Table: *gpt* Number Start EndSize File system Name Flags 1 9599MB 240GB 230GB primary 2 250GB 480GB 230GB primary 3 490GB 720GB 230GB primary 4 730GB 960GB 230GB primary (parted) *# sgdisk --info 1 /dev/sdr* Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Microsoft basic data) Partition unique GUID: 0A45987B-DC77-4816-B506-DEE0DC60AE9E First sector: 18747392 (at 8.9 GiB) Last sector: 468707327 (at 223.5 GiB) Partition size: 449959936 sectors (214.6 GiB) Attribute flags: Partition name: 'primary' We stared to put label on the disk based on a blog at http://blog.zhaw.ch/icclab/deploy-ceph-and-start-using-it-end-to-end-tutorial-installation-part-13/ I'm not sure why sgdisk is getting Microsoft basics data. It's not a big deal as long as it's what ceph-disk expects: https://github.com/ceph/ceph/blob/firefly/src/ceph-disk#L76 The more interesting question would be to figure out if ceph-disk-udev is called from https://github.com/ceph/ceph/blob/firefly/udev/95-ceph-osd-alt.rules We have tried udevadm trigger --sysname-match=sd* yesterday and verified it is called when a udev event is sent by the kernel. And you have added that to /etc/rc.local to make sure it is called at least once at boot time. Would you have time to experiment as described at https://wiki.ubuntu.com/DebuggingUdev to get more information about what happens during the boot phase of your machine ? 
Cheers Thanks, Shiva On Mon, Apr 6, 2015 at 2:29 PM, shiva rkreddy shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com mailto:shiva.rkre...@gmail.com wrote: Hi, I'm on IRC as *shivark* # sgdisk --info 1 /dev/sdb Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown) Partition unique GUID: 1618223E-B8C9-4C4A-B5D2-EBFF6D64CB12 First sector: 2048 (at 1024.0 KiB) Last sector: 7811870686 (at 3.6 TiB) Partition size: 7811868639 sectors (3.6 TiB) Attribute flags: Partition name: 'ceph data' # On Mon, Apr 6, 2015 at 2:09 PM, Loic Dachary l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org mailto:l...@dachary.org wrote: Hi, lrwxrwxrwx 1 root root 10 Apr 4 16:27 1618223e-b8c9-4c4a-b5d2-ebff6d64cb12 - ../../sdb1 looks encouraging. Could you
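For the record, the boot-time workaround and one way to collect the udev information being asked for look roughly like this; the rc.local placement and log settings are assumptions, and the DebuggingUdev page linked above has the authoritative procedure:

    # replay block device udev events once at boot so ceph-disk-udev gets a chance
    # to activate the OSDs even if the original add events were missed
    # (in /etc/rc.local the line must come before any final 'exit 0')
    udevadm trigger --sysname-match=sd*

    # before the next reboot, raise udev logging and watch events to confirm
    # whether 95-ceph-osd-alt.rules / ceph-disk-udev actually fire
    udevadm control --log-priority=debug
    udevadm monitor --environment --udev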
Re: Preliminary RDMA vs TCP numbers
On Wed, Apr 8, 2015 at 11:17 AM, Somnath Roy somnath@sandisk.com wrote: Hi, Please find the preliminary performance numbers of TCP Vs RDMA (XIO) implementation (on top of SSDs) in the following link. http://www.slideshare.net/somnathroy7568/ceph-on-rdma The attachment didn't go through it seems, so, I had to use slideshare. Mark, If we have time, I can present it in tomorrow's performance meeting. Thanks Regards Somnath Those numbers are really impressive (for small numbers at least)! What TCP settings are you using? For example, the difference could shrink at scale due to less intensive per-connection ramp-up with CUBIC across a larger number of nodes, though I do not believe that would be the main reason for the observed TCP catch-up on a relatively flat workload such as the one fio generates. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Preliminary RDMA vs TCP numbers
Hi, Please find the preliminary performance numbers of TCP Vs RDMA (XIO) implementation (on top of SSDs) in the following link. http://www.slideshare.net/somnathroy7568/ceph-on-rdma The attachment didn't go through it seems, so, I had to use slideshare. Mark, If we have time, I can present it in tomorrow's performance meeting. Thanks Regards Somnath -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Backporting to Firefly
Hi, I see you have been busy backporting issues to Firefly today, this is great :-) https://github.com/ceph/ceph/pulls/xinxinsh It would be helpful if you could update the pull requests (and the corresponding issues) as explained at http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_backport_commits. Once it's done I propose we move to the next step, as explained at http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO: merging your pull requests in the integration branch ( step 5 http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_populate_the_integration_branch ) and running tests on them ( step 6 http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_run_integration_and_upgrade_tests). Cheers -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature
RE: Preliminary RDMA vs TCP numbers
I used the default TCP setting in Ubuntu 14.04. -Original Message- From: Andrey Korolyov [mailto:and...@xdel.ru] Sent: Wednesday, April 08, 2015 1:28 AM To: Somnath Roy Cc: ceph-us...@lists.ceph.com; ceph-devel Subject: Re: Preliminary RDMA vs TCP numbers On Wed, Apr 8, 2015 at 11:17 AM, Somnath Roy somnath@sandisk.com wrote: Hi, Please find the preliminary performance numbers of TCP Vs RDMA (XIO) implementation (on top of SSDs) in the following link. http://www.slideshare.net/somnathroy7568/ceph-on-rdma The attachment didn't go through it seems, so, I had to use slideshare. Mark, If we have time, I can present it in tomorrow's performance meeting. Thanks Regards Somnath Those numbers are really impressive (for small numbers at least)! What are TCP settings you using? For example, difference can be lowered on scale due to less intensive per-connection acceleration on CUBIC on a larger number of nodes, though I do not believe that it was a main reason for an observed TCP catchup on a relatively flat workload such as fio generates.
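For anyone wanting to compare against the "default TCP setting", these are the knobs usually meant by that phrase on Ubuntu 14.04; reading them is harmless, and the comments give typical defaults rather than values measured on Somnath's nodes:

    sysctl net.ipv4.tcp_congestion_control        # cubic by default
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem    # autotuned socket buffer limits
    # Nagle is a per-socket option; the Ceph messenger sets TCP_NODELAY itself
    # when ms tcp nodelay = true (the default), so it is not a sysctl.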
[ANN] ceph-deploy 1.5.23 released
Hi All, This is a new release of ceph-deploy that includes a new feature for Hammer and bugfixes. ceph-deploy can be installed from the ceph.com hosted repos for Firefly, Giant, Hammer, or testing, and is also available on PyPI. ceph-deploy now defaults to installing the Hammer release. If you need to install a different release, use the --release flag. To go along with the Hammer release, ceph-deploy now includes support for a drastically simplified deployment for RGW. See further details at [1] and [2]. This release also fixes an issue where keyrings pushed to remote nodes ended up with world-readable permissions. The full changelog can be seen at [3]. Please update! Cheers, - Travis [1] http://ceph.com/docs/master/start/quick-ceph-deploy/#add-an-rgw-instance [2] http://ceph.com/ceph-deploy/docs/rgw.html [3] http://ceph.com/ceph-deploy/docs/changelog.html#id2 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
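Two quick usage sketches for the changes called out above; the host names are placeholders and the rgw syntax should be double-checked against [1] and [2]:

    # install a release other than the new default (Hammer)
    ceph-deploy install --release giant node1 node2 node3

    # simplified RGW deployment added in this release
    ceph-deploy rgw create gwnode1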
Re: Initial newstore vs filestore results
On Wed, Apr 8, 2015 at 9:49 AM, Sage Weil s...@newdream.net wrote: On Wed, 8 Apr 2015, Haomai Wang wrote: On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote: On Tue, 7 Apr 2015, Mark Nelson wrote: On 04/07/2015 02:16 PM, Mark Nelson wrote: On 04/07/2015 09:57 AM, Mark Nelson wrote: Hi Guys, I ran some quick tests on Sage's newstore branch. So far given that this is a prototype, things are looking pretty good imho. The 4MB object rados bench read/write and small read performance looks especially good. Keep in mind that this is not using the SSD journals in any way, so 640MB/s sequential writes is actually really good compared to filestore without SSD journals. small write performance appears to be fairly bad, especially in the RBD case where it's small writes to larger objects. I'm going to sit down and see if I can figure out what's going on. It's bad enough that I suspect there's just something odd going on. Mark Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those interested: http://nhm.ceph.com/newstore/ Interestingly small object write/read performance with 4 OSDs was about 1/3-1/4 the speed of the same cluster with 36 OSDs. Note: Thanks Dan for fixing the directory column width! Mark New fio/librbd results using Sage's latest code that attempts to keep small overwrite extents in the db. This is 4 OSD so not directly comparable to the 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s: write readrandw randr 4MB 57.9319.6 55.2285.9 128KB 2.5 230.6 2.4 125.4 4KB 0.4655.65 1.113.56 What would be very interesting would be to see the 4KB performance with the defaults (newstore overlay max = 32) vs overlays disabled (newstore overlay max = 0) and see if/how much it is helping. The latest branch also has open-by-handle. It's on by default (newstore open by handle = true). I think for most workloads it won't be very noticeable... I think there are two questions we need to answer though: 1) Does it have any impact on a creation workload (say, 4kb objects). It shouldn't, but we should confirm. 2) Does it impact small object random reads with a cold cache. I think to see the effect we'll probably need to pile a ton of objects into the store, drop caches, and then do random reads. In the best case the effect will be small, but hopefully noticeable: we should go from a directory lookup (1+ seeks) + inode lookup (1+ seek) + data read, to inode lookup (1+ seek) + data read. So, 3 - 2 seeks best case? I'm not really sure what XFS is doing under the covers here.. WOW, it's really a cool implementation beyond my original mind according to blueprint. Handler, overlay_map and data_map looks so flexible and make small io cheaper in theory. Now we only have 1 element in data_map and I'm not sure your goal about the future's usage. Although I have a unclearly idea that it could enhance the role of NewStore and make local filesystem just as a block space allocator. Let NewStore own a variable of FTL(File Translation Layer), so many cool features could be added. What's your idea about data_map? Exactly, that is one option. The other is that we'd treat the data_map similar to overlay_map with a fixed or max extent size so that a large partial overwrite will mostly go to a new file instead of doing the slow WAL. My concern currently still is WAL after fsync and kv commiting, maybe fsync process is just fine because mostly we won't meet this case in rbd. 
But submit sync kv transaction isn't a low latency job I think, maybe we could let WAL parallel with kv commiting?(yes, I really concern the latency of one op :-) ) The WAL has to come after kv commit. But the fsync after the wal completion sucks, especially since we are always dispatching a single fsync at a time so it's kind of worst-case seek behavior. We could throw these into another parallel fsync queue so that the fs can batch them up, but I'm not sure we will enough parallelism. What would really be nice is a batch fsync syscall, but in leiu of that maybe we wait until we have a bunch of fsyncs pending and then throw them at the kernel together in a bunch of threads? Not sure. These aren't normally time sensitive unless a read comes along (which is pretty rare), but they have to be done for correctness. Couldn't we write both the log entry and the data in parallel and only acknowledge to the client once both commit? If we replay the log without the data we'll know it didn't get committed, and we can collect the data after replay if it's not referenced by the log (I'm speculating, as I haven't looked at the code or how it's actually choosing names). -Greg Then from the actual rados write op, it will add setattr and omap_setkeys ops. Current
Re: Call for jenkins slaves to improve multi operating system support
Hi Sage, On 08/04/2015 18:59, Sage Weil wrote: On Wed, 8 Apr 2015, Loic Dachary wrote: Hi Ceph, When a contribution is proposed to Ceph [1], a bot compiles and run tests with it to provide feedback to the developer [2]. When something goes wrong the failure can be repeated on the developer machine [3] for debug. This also helps the reviewer who knows the code compiles and does not break anything that would be detected by make check. The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older operating systems (headers, compiler version, etc.) may be detected later, when building packages [4] and after the pull request has been merged in master. This is rare but requires extra attention from the reviewer and needs to be dealt with urgently when it happens. Do additional slaves block the message from appearing on the pull request? I.e., what happens if a slave is very slow (e.g., armv7) or broken (network issue)? I will make it so a comment is posted as soon as the first slave succeeds / fails (it currently waits for all to finish which is inconvenient). The first slave will always be a CentOS 7 running on a fast machine so that the worst that can happen is that it's the only one to run. If slow slaves lag behind too much it would be nice to have a jenkins plugin that discards jobs randomly to prevent the queue from growing out of proportion on that specific slave. What are the connectivity requirements? Nothing more than the ability to git pull from a ceph repository. Can slaves exist on other (private) networks? Yes. In that case a ssh -f -n -L tunnel to the jenkins master will be established to allow it to probe the slave when necessary. Thanks! sage Cheers -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature
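For slaves on private networks, the tunnel mentioned above could look roughly like the following; which side initiates, the host names and the port are all assumptions, only the -f -n -L flags come from the message:

    # run on the jenkins master (or wherever it can reach a gateway into the
    # private network): keep a background tunnel exposing the slave's sshd locally
    ssh -f -n -N -L 2222:slave.internal.example.com:22 gateway.example.com
    # the slave is then registered on the master as an ssh node at localhost:2222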
Re: Initial newstore vs filestore results
On Wed, 8 Apr 2015, Gregory Farnum wrote: On Wed, Apr 8, 2015 at 9:49 AM, Sage Weil s...@newdream.net wrote: On Wed, 8 Apr 2015, Haomai Wang wrote: On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote: On Tue, 7 Apr 2015, Mark Nelson wrote: On 04/07/2015 02:16 PM, Mark Nelson wrote: On 04/07/2015 09:57 AM, Mark Nelson wrote: Hi Guys, I ran some quick tests on Sage's newstore branch. So far given that this is a prototype, things are looking pretty good imho. The 4MB object rados bench read/write and small read performance looks especially good. Keep in mind that this is not using the SSD journals in any way, so 640MB/s sequential writes is actually really good compared to filestore without SSD journals. small write performance appears to be fairly bad, especially in the RBD case where it's small writes to larger objects. I'm going to sit down and see if I can figure out what's going on. It's bad enough that I suspect there's just something odd going on. Mark Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those interested: http://nhm.ceph.com/newstore/ Interestingly small object write/read performance with 4 OSDs was about 1/3-1/4 the speed of the same cluster with 36 OSDs. Note: Thanks Dan for fixing the directory column width! Mark New fio/librbd results using Sage's latest code that attempts to keep small overwrite extents in the db. This is 4 OSD so not directly comparable to the 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s: write readrandw randr 4MB 57.9319.6 55.2285.9 128KB 2.5 230.6 2.4 125.4 4KB 0.4655.65 1.113.56 What would be very interesting would be to see the 4KB performance with the defaults (newstore overlay max = 32) vs overlays disabled (newstore overlay max = 0) and see if/how much it is helping. The latest branch also has open-by-handle. It's on by default (newstore open by handle = true). I think for most workloads it won't be very noticeable... I think there are two questions we need to answer though: 1) Does it have any impact on a creation workload (say, 4kb objects). It shouldn't, but we should confirm. 2) Does it impact small object random reads with a cold cache. I think to see the effect we'll probably need to pile a ton of objects into the store, drop caches, and then do random reads. In the best case the effect will be small, but hopefully noticeable: we should go from a directory lookup (1+ seeks) + inode lookup (1+ seek) + data read, to inode lookup (1+ seek) + data read. So, 3 - 2 seeks best case? I'm not really sure what XFS is doing under the covers here.. WOW, it's really a cool implementation beyond my original mind according to blueprint. Handler, overlay_map and data_map looks so flexible and make small io cheaper in theory. Now we only have 1 element in data_map and I'm not sure your goal about the future's usage. Although I have a unclearly idea that it could enhance the role of NewStore and make local filesystem just as a block space allocator. Let NewStore own a variable of FTL(File Translation Layer), so many cool features could be added. What's your idea about data_map? Exactly, that is one option. The other is that we'd treat the data_map similar to overlay_map with a fixed or max extent size so that a large partial overwrite will mostly go to a new file instead of doing the slow WAL. My concern currently still is WAL after fsync and kv commiting, maybe fsync process is just fine because mostly we won't meet this case in rbd. 
But submit sync kv transaction isn't a low latency job I think, maybe we could let WAL parallel with kv commiting?(yes, I really concern the latency of one op :-) ) The WAL has to come after kv commit. But the fsync after the wal completion sucks, especially since we are always dispatching a single fsync at a time so it's kind of worst-case seek behavior. We could throw these into another parallel fsync queue so that the fs can batch them up, but I'm not sure we will enough parallelism. What would really be nice is a batch fsync syscall, but in leiu of that maybe we wait until we have a bunch of fsyncs pending and then throw them at the kernel together in a bunch of threads? Not sure. These aren't normally time sensitive unless a read comes along (which is pretty rare), but they have to be done for correctness. Couldn't we write both the log entry and the data in parallel and only acknowledge to the client once both commit? If we replay the log without the data we'll know it didn't get committed, and we can collect the data after replay if it's not referenced by the log (I'm speculating,
04/08/2015 Weekly Ceph Performance Meeting IS ON!
8AM PST as usual! Please add an agenda item if there is something you want to talk about. So far we'll be discussing Sandisk's xio messenger performance results, newstore performance, and civetweb vs mod-proxy-fastcgi. Be there or be square! Here's the links: Etherpad URL: http://pad.ceph.com/p/performance_weekly To join the Meeting: https://bluejeans.com/268261044 To join via Browser: https://bluejeans.com/268261044/browser To join with Lync: https://bluejeans.com/268261044/lync To join via Room System: Video Conferencing System: bjn.vc -or- 199.48.152.152 Meeting ID: 268261044 To join via Phone: 1) Dial: +1 408 740 7256 +1 888 240 2560(US Toll Free) +1 408 317 9253(Alternate Number) (see all numbers - http://bluejeans.com/numbers) 2) Enter Conference ID: 268261044 Mark -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Call for jenkins slaves to improve multi operating system support
Loric, do you mean we need give the servers to you or we just build the testing inside our own server room to do all the testing? -jiangang -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Loic Dachary Sent: Wednesday, April 08, 2015 6:56 PM To: Ceph Development Subject: Call for jenkins slaves to improve multi operating system support Hi Ceph, When a contribution is proposed to Ceph [1], a bot compiles and run tests with it to provide feedback to the developer [2]. When something goes wrong the failure can be repeated on the developer machine [3] for debug. This also helps the reviewer who knows the code compiles and does not break anything that would be detected by make check. The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older operating systems (headers, compiler version, etc.) may be detected later, when building packages [4] and after the pull request has been merged in master. This is rare but requires extra attention from the reviewer and needs to be dealt with urgently when it happens. If you can spare a machine to help expand the operating systems on which tests can run, it would be a great help. The minimum hardware configuration to run a slave is: * x86_64 architecture for CentOS 6, Fedora 21, OpenSUSE 13.2, Debian GNU/Linux Jessie, Ubuntu 14.02 32 GB RAM 200 GB SSD 8 core 2.5Ghz * i386 architecture for CentOS 7, CentOS 6, Fedora 21, Debian GNU/Linux Jessie, Ubuntu 14.04, Ubuntu 14.02 4 GB RAM 200 GB disk 2 core * armv7, armv8 architecture for Ubuntu 14.04 4 GB RAM 200 GB disk 2 core Note that since the make check bot can run in a docker container, x86_64 machines can be used to run any of the operating systems for which a docker file has been prepared [5]. Cheers [1] pull requests https://github.com/ceph/ceph/pulls [2] make check bot feedback https://github.com/ceph/ceph/pull/4296#issuecomment-90812064 [3] run-make-check.sh https://github.com/ceph/ceph/blob/master/run-make-check.sh#L44 [4] gitbuilder http://ceph.com/gitbuilder.cgi [5] https://ceph.com/git/?p=ceph.git;a=blob;f=src/test/Makefile.am;hb=hammer#l91 -- Loïc Dachary, Artisan Logiciel Libre
Re: Call for jenkins slaves to improve multi operating system support
On 08/04/2015 15:21, Duan, Jiangang wrote: Loric, do you mean we need give the servers to you or we just build the testing inside our own server room to do all the testing? Thanks for asking, I realize that was not clear. The idea is not to donate hardware, because that would require manpower and extra costs to connect to the net. What would be useful is a machine connected to the net and dedicated to running a jenkins slave. It receives a build from the jenkins master (http://jenkins.ceph.dachary.org/) via ssh (possibly with a tunnel if behind a NAT), clone http://github.com/ceph/ceph, execute the run-make-check.sh script that is found at the root of the repository and reports failure / success back to the jenkins master. Does that make sense ? -jiangang -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Loic Dachary Sent: Wednesday, April 08, 2015 6:56 PM To: Ceph Development Subject: Call for jenkins slaves to improve multi operating system support Hi Ceph, When a contribution is proposed to Ceph [1], a bot compiles and run tests with it to provide feedback to the developer [2]. When something goes wrong the failure can be repeated on the developer machine [3] for debug. This also helps the reviewer who knows the code compiles and does not break anything that would be detected by make check. The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older operating systems (headers, compiler version, etc.) may be detected later, when building packages [4] and after the pull request has been merged in master. This is rare but requires extra attention from the reviewer and needs to be dealt with urgently when it happens. If you can spare a machine to help expand the operating systems on which tests can run, it would be a great help. The minimum hardware configuration to run a slave is: * x86_64 architecture for CentOS 6, Fedora 21, OpenSUSE 13.2, Debian GNU/Linux Jessie, Ubuntu 14.02 32 GB RAM 200 GB SSD 8 core 2.5Ghz * i386 architecture for CentOS 7, CentOS 6, Fedora 21, Debian GNU/Linux Jessie, Ubuntu 14.04, Ubuntu 14.02 4 GB RAM 200 GB disk 2 core * armv7, armv8 architecture for Ubuntu 14.04 4 GB RAM 200 GB disk 2 core Note that since the make check bot can run in a docker container, x86_64 machines can be used to run any of the operating systems for which a docker file has been prepared [5]. Cheers [1] pull requests https://github.com/ceph/ceph/pulls [2] make check bot feedback https://github.com/ceph/ceph/pull/4296#issuecomment-90812064 [3] run-make-check.sh https://github.com/ceph/ceph/blob/master/run-make-check.sh#L44 [4] gitbuilder http://ceph.com/gitbuilder.cgi [5] https://ceph.com/git/?p=ceph.git;a=blob;f=src/test/Makefile.am;hb=hammer#l91 -- Loïc Dachary, Artisan Logiciel Libre -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature
Re: Initial newstore vs filestore results
On Wed, Apr 8, 2015 at 12:49 PM, Sage Weil s...@newdream.net wrote: On Wed, 8 Apr 2015, Haomai Wang wrote: On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote: On Tue, 7 Apr 2015, Mark Nelson wrote: On 04/07/2015 02:16 PM, Mark Nelson wrote: On 04/07/2015 09:57 AM, Mark Nelson wrote: Hi Guys, I ran some quick tests on Sage's newstore branch. So far given that this is a prototype, things are looking pretty good imho. The 4MB object rados bench read/write and small read performance looks especially good. Keep in mind that this is not using the SSD journals in any way, so 640MB/s sequential writes is actually really good compared to filestore without SSD journals. small write performance appears to be fairly bad, especially in the RBD case where it's small writes to larger objects. I'm going to sit down and see if I can figure out what's going on. It's bad enough that I suspect there's just something odd going on. Mark Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those interested: http://nhm.ceph.com/newstore/ Interestingly small object write/read performance with 4 OSDs was about 1/3-1/4 the speed of the same cluster with 36 OSDs. Note: Thanks Dan for fixing the directory column width! Mark New fio/librbd results using Sage's latest code that attempts to keep small overwrite extents in the db. This is 4 OSD so not directly comparable to the 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s: write readrandw randr 4MB 57.9319.6 55.2285.9 128KB 2.5 230.6 2.4 125.4 4KB 0.4655.65 1.113.56 What would be very interesting would be to see the 4KB performance with the defaults (newstore overlay max = 32) vs overlays disabled (newstore overlay max = 0) and see if/how much it is helping. The latest branch also has open-by-handle. It's on by default (newstore open by handle = true). I think for most workloads it won't be very noticeable... I think there are two questions we need to answer though: 1) Does it have any impact on a creation workload (say, 4kb objects). It shouldn't, but we should confirm. 2) Does it impact small object random reads with a cold cache. I think to see the effect we'll probably need to pile a ton of objects into the store, drop caches, and then do random reads. In the best case the effect will be small, but hopefully noticeable: we should go from a directory lookup (1+ seeks) + inode lookup (1+ seek) + data read, to inode lookup (1+ seek) + data read. So, 3 - 2 seeks best case? I'm not really sure what XFS is doing under the covers here.. WOW, it's really a cool implementation beyond my original mind according to blueprint. Handler, overlay_map and data_map looks so flexible and make small io cheaper in theory. Now we only have 1 element in data_map and I'm not sure your goal about the future's usage. Although I have a unclearly idea that it could enhance the role of NewStore and make local filesystem just as a block space allocator. Let NewStore own a variable of FTL(File Translation Layer), so many cool features could be added. What's your idea about data_map? Exactly, that is one option. The other is that we'd treat the data_map similar to overlay_map with a fixed or max extent size so that a large partial overwrite will mostly go to a new file instead of doing the slow WAL. My concern currently still is WAL after fsync and kv commiting, maybe fsync process is just fine because mostly we won't meet this case in rbd. 
But submit sync kv transaction isn't a low latency job I think, maybe we could let WAL parallel with kv commiting?(yes, I really concern the latency of one op :-) ) The WAL has to come after kv commit. But the fsync after the wal completion sucks, especially since we are always dispatching a single fsync at a time so it's kind of worst-case seek behavior. We could throw these into another parallel fsync queue so that the fs can batch them up, but I'm not sure we will enough parallelism. What would really be nice is a batch fsync syscall, but in leiu of that maybe we wait until we have a bunch of fsyncs pending and then throw them at the kernel together in a bunch of threads? Not sure. These aren't normally time sensitive unless a read comes along (which is pretty rare), but they have to be done for correctness. Then from the actual rados write op, it will add setattr and omap_setkeys ops. Current NewStore looks plays badly for setattr. It always encode all xattrs(and other not so tiny fields) and write again (Is this true?) though it could batch multi transaction's onode write in short time. Yeah, this could be optimized so that we only unpack and repack the bufferlist, or do a single walk through the buffer to do the updates (similar to what TMAP
Re: Initial newstore vs filestore results
On Wed, 8 Apr 2015, Haomai Wang wrote: On Wed, Apr 8, 2015 at 10:58 AM, Sage Weil s...@newdream.net wrote: On Tue, 7 Apr 2015, Mark Nelson wrote: On 04/07/2015 02:16 PM, Mark Nelson wrote: On 04/07/2015 09:57 AM, Mark Nelson wrote: Hi Guys, I ran some quick tests on Sage's newstore branch. So far given that this is a prototype, things are looking pretty good imho. The 4MB object rados bench read/write and small read performance looks especially good. Keep in mind that this is not using the SSD journals in any way, so 640MB/s sequential writes is actually really good compared to filestore without SSD journals. small write performance appears to be fairly bad, especially in the RBD case where it's small writes to larger objects. I'm going to sit down and see if I can figure out what's going on. It's bad enough that I suspect there's just something odd going on. Mark Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those interested: http://nhm.ceph.com/newstore/ Interestingly small object write/read performance with 4 OSDs was about 1/3-1/4 the speed of the same cluster with 36 OSDs. Note: Thanks Dan for fixing the directory column width! Mark New fio/librbd results using Sage's latest code that attempts to keep small overwrite extents in the db. This is 4 OSD so not directly comparable to the 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s: write readrandw randr 4MB 57.9319.6 55.2285.9 128KB 2.5 230.6 2.4 125.4 4KB 0.4655.65 1.113.56 What would be very interesting would be to see the 4KB performance with the defaults (newstore overlay max = 32) vs overlays disabled (newstore overlay max = 0) and see if/how much it is helping. The latest branch also has open-by-handle. It's on by default (newstore open by handle = true). I think for most workloads it won't be very noticeable... I think there are two questions we need to answer though: 1) Does it have any impact on a creation workload (say, 4kb objects). It shouldn't, but we should confirm. 2) Does it impact small object random reads with a cold cache. I think to see the effect we'll probably need to pile a ton of objects into the store, drop caches, and then do random reads. In the best case the effect will be small, but hopefully noticeable: we should go from a directory lookup (1+ seeks) + inode lookup (1+ seek) + data read, to inode lookup (1+ seek) + data read. So, 3 - 2 seeks best case? I'm not really sure what XFS is doing under the covers here.. WOW, it's really a cool implementation beyond my original mind according to blueprint. Handler, overlay_map and data_map looks so flexible and make small io cheaper in theory. Now we only have 1 element in data_map and I'm not sure your goal about the future's usage. Although I have a unclearly idea that it could enhance the role of NewStore and make local filesystem just as a block space allocator. Let NewStore own a variable of FTL(File Translation Layer), so many cool features could be added. What's your idea about data_map? Exactly, that is one option. The other is that we'd treat the data_map similar to overlay_map with a fixed or max extent size so that a large partial overwrite will mostly go to a new file instead of doing the slow WAL. My concern currently still is WAL after fsync and kv commiting, maybe fsync process is just fine because mostly we won't meet this case in rbd. 
But submit sync kv transaction isn't a low latency job I think, maybe we could let WAL parallel with kv commiting?(yes, I really concern the latency of one op :-) ) The WAL has to come after kv commit. But the fsync after the wal completion sucks, especially since we are always dispatching a single fsync at a time so it's kind of worst-case seek behavior. We could throw these into another parallel fsync queue so that the fs can batch them up, but I'm not sure we will enough parallelism. What would really be nice is a batch fsync syscall, but in leiu of that maybe we wait until we have a bunch of fsyncs pending and then throw them at the kernel together in a bunch of threads? Not sure. These aren't normally time sensitive unless a read comes along (which is pretty rare), but they have to be done for correctness. Then from the actual rados write op, it will add setattr and omap_setkeys ops. Current NewStore looks plays badly for setattr. It always encode all xattrs(and other not so tiny fields) and write again (Is this true?) though it could batch multi transaction's onode write in short time. Yeah, this could be optimized so that we only unpack and repack the bufferlist, or do a single walk through the buffer to do the updates (similar to what TMAP used to do). NewStore also employ much more workload to KeyValueDB compared
Re: Call for jenkins slaves to improve multi operating system support
On Wed, 8 Apr 2015, Loic Dachary wrote: Hi Ceph, When a contribution is proposed to Ceph [1], a bot compiles and run tests with it to provide feedback to the developer [2]. When something goes wrong the failure can be repeated on the developer machine [3] for debug. This also helps the reviewer who knows the code compiles and does not break anything that would be detected by make check. The bot runs on CentOS 7 and Ubuntu 14.04 only, and problems related to older operating systems (headers, compiler version, etc.) may be detected later, when building packages [4] and after the pull request has been merged in master. This is rare but requires extra attention from the reviewer and needs to be dealt with urgently when it happens. Do additional slaves block the message from appearing on the pull request? I.e., what happens if a slave is very slow (e.g., armv7) or broken (network issue)? What are the connectivity requirements? Can slaves exist on other (private) networks? Thanks! sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lttng broken on i386 trusty
Ah, I didn't notice Loic disabled it for the tarball gitbuilder (it makes sense to disable there since it's not needed, and speeds up build + tests that that gitbuilder does). I'm not sure why it's failing --with-lttng yet. Odd that it doesn't fail in the deb gitbuilder, which still has lttng autodetected. Josh On 04/08/2015 04:25 PM, kernel neophyte wrote: Thats because its being run with --without-lttng as per the log in http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582: ./configure --with-debug --with-radosgw --with-libatomic-ops --without-lttng --disable-static --without-cryptopp --with-tcmalloc if you change it to --with-lttng it will fail -Neo On Wed, Apr 8, 2015 at 4:20 PM, Josh Durgin jdur...@redhat.com wrote: Are you still seeing the same error on the latest master? The gitbuilder is building successfully now: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582 On 04/08/2015 04:07 PM, kernel neophyte wrote: Need help! This is still broken! -Neo On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote: Hi, In case you did not notice: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa CXXLD radosgw-admin CXXLD rados /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlclose' /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlsym' /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlopen' error: collect2: ld returned 1 exit status make[3]: *** [ceph_tpbench] Error 1 make[3]: *** Waiting for unfinished jobs CXXLD ceph-objectstore-tool make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make: *** [all-recursive] Error 1 Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
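Not a confirmed diagnosis, but undefined references to dlopen/dlsym/dlclose when liblttng-ust is on the link line are the classic symptom of libdl missing from (or misordered on) the final link. A possible local experiment while the gitbuilder difference is being investigated, with the flag placement being an assumption rather than the known fix:

    ./configure --with-debug --with-radosgw --with-libatomic-ops --with-lttng \
                --disable-static --without-cryptopp --with-tcmalloc LIBS="-ldl"
    make -j4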
ms_crc_data false
With 0.93, I tried ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--ms_crc_header=false' and saw the changes reflected in the ceph admin-daemon output. But having done that, perf top still shows time being spent in crc32 routines. Is there some other parameter that needs changing? -- Tom Deneau -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lttng broken on i386 trusty
Are you still seeing the same error on the latest master? The gitbuilder is building successfully now: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582 On 04/08/2015 04:07 PM, kernel neophyte wrote: Need help! This is still broken! -Neo On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote: Hi, In case you did not notice: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa CXXLD radosgw-admin CXXLD rados /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlclose' /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlsym' /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlopen' error: collect2: ld returned 1 exit status make[3]: *** [ceph_tpbench] Error 1 make[3]: *** Waiting for unfinished jobs CXXLD ceph-objectstore-tool make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make: *** [all-recursive] Error 1 Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lttng broken on i386 trusty
Thats because its being run with --without-lttng as per the log in http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582: ./configure --with-debug --with-radosgw --with-libatomic-ops --without-lttng --disable-static --without-cryptopp --with-tcmalloc if you change it to --with-lttng it will fail -Neo On Wed, Apr 8, 2015 at 4:20 PM, Josh Durgin jdur...@redhat.com wrote: Are you still seeing the same error on the latest master? The gitbuilder is building successfully now: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582 On 04/08/2015 04:07 PM, kernel neophyte wrote: Need help! This is still broken! -Neo On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote: Hi, In case you did not notice: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa CXXLD radosgw-admin CXXLD rados /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlclose' /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlsym' /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlopen' error: collect2: ld returned 1 exit status make[3]: *** [ceph_tpbench] Error 1 make[3]: *** Waiting for unfinished jobs CXXLD ceph-objectstore-tool make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src' make: *** [all-recursive] Error 1 Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ms_crc_data false
On Wed, Apr 8, 2015 at 3:38 PM, Deneau, Tom tom.den...@amd.com wrote: With 0.93, I tried ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--ms_crc_header=false' and saw the changes reflected in ceph admin-daemon But having done that, perf top still shows time being spent in crc32 routines. Is there some other parameter that needs changing? You can change this config value, but unfortunately it won't have any effect on a running daemon. You'll need to specify it in a config and restart. -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
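A sketch of the config-and-restart route; the section and the restart command vary by distro and init system, so treat both as assumptions. The option names themselves are the ones from the injectargs attempt above:

    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd]
            ms_crc_data = false
            ms_crc_header = false
    EOF
    sudo /etc/init.d/ceph restart osd   # recreate the messengers with crc disabled
    # note: some crc32 time may remain in perf top even so, e.g. from journal checksumming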
Re: Initial newstore vs filestore results
On 04/07/2015 09:58 PM, Sage Weil wrote: On Tue, 7 Apr 2015, Mark Nelson wrote: On 04/07/2015 02:16 PM, Mark Nelson wrote: On 04/07/2015 09:57 AM, Mark Nelson wrote: Hi Guys, I ran some quick tests on Sage's newstore branch. So far given that this is a prototype, things are looking pretty good imho. The 4MB object rados bench read/write and small read performance looks especially good. Keep in mind that this is not using the SSD journals in any way, so 640MB/s sequential writes is actually really good compared to filestore without SSD journals. small write performance appears to be fairly bad, especially in the RBD case where it's small writes to larger objects. I'm going to sit down and see if I can figure out what's going on. It's bad enough that I suspect there's just something odd going on. Mark Seekwatcher/blktrace graphs of a 4 OSD cluster using newstore for those interested: http://nhm.ceph.com/newstore/ Interestingly small object write/read performance with 4 OSDs was about 1/3-1/4 the speed of the same cluster with 36 OSDs. Note: Thanks Dan for fixing the directory column width! Mark New fio/librbd results using Sage's latest code that attempts to keep small overwrite extents in the db. This is 4 OSD so not directly comparable to the 36 OSD tests above, but does include seekwatcher graphs. Results in MB/s:

             write    read     randw    randr
    4MB      57.93    319.6    55.2     285.9
    128KB    2.5      230.6    2.4      125.4
    4KB      0.46     55.65    1.1      13.56

What would be very interesting would be to see the 4KB performance with the defaults (newstore overlay max = 32) vs overlays disabled (newstore overlay max = 0) and see if/how much it is helping. And here we go. 1 OSD, 1X replication. 16GB RBD volume.

    4MB                write    read     randw    randr
    default overlay    36.13    106.61   34.49    92.69
    no overlay         36.29    105.61   34.49    93.55

    128KB              write    read     randw    randr
    default overlay    1.71     97.90    1.65     25.79
    no overlay         1.72     97.80    1.66     25.78

    4KB                write    read     randw    randr
    default overlay    0.40     61.88    1.29     1.11
    no overlay         0.05     61.26    0.05     1.10

seekwatcher movies generating now, but I'm going to bed soon so I'll have to wait until tomorrow morning to post them. :) The latest branch also has open-by-handle. It's on by default (newstore open by handle = true). I think for most workloads it won't be very noticeable... I think there are two questions we need to answer though: 1) Does it have any impact on a creation workload (say, 4kb objects). It shouldn't, but we should confirm. 2) Does it impact small object random reads with a cold cache. I think to see the effect we'll probably need to pile a ton of objects into the store, drop caches, and then do random reads. In the best case the effect will be small, but hopefully noticeable: we should go from a directory lookup (1+ seeks) + inode lookup (1+ seek) + data read, to inode lookup (1+ seek) + data read. So, 3 - 2 seeks best case? I'm not really sure what XFS is doing under the covers here... sage -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
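For anyone reproducing the overlay comparison, the two option names come straight from Sage's mail; the section and restart step are just an assumed way of applying them between runs:

    # default behaviour keeps small overwrites as overlay extents in the kv db:
    #   newstore overlay max = 32
    # the "no overlay" rows above correspond to:
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd]
            newstore overlay max = 0
            newstore open by handle = true
    EOF
    # restart the OSDs between runs so the newstore settings take effect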
Re: lttng broken on i386 trusty
Strange! I see the build succeeds for deb -Neo

On Wed, Apr 8, 2015 at 5:00 PM, Josh Durgin jdur...@redhat.com wrote: Ah, I didn't notice Loic disabled it for the tarball gitbuilder (it makes sense to disable there since it's not needed, and speeds up the build + tests that that gitbuilder does). I'm not sure why it's failing --with-lttng yet. Odd that it doesn't fail in the deb gitbuilder, which still has lttng autodetected. Josh

On 04/08/2015 04:25 PM, kernel neophyte wrote: That's because it's being run with --without-lttng, as per the log in http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582:

    ./configure --with-debug --with-radosgw --with-libatomic-ops --without-lttng --disable-static --without-cryptopp --with-tcmalloc

If you change it to --with-lttng it will fail. -Neo

On Wed, Apr 8, 2015 at 4:20 PM, Josh Durgin jdur...@redhat.com wrote: Are you still seeing the same error on the latest master? The gitbuilder is building successfully now: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=21f60a9d26f821ba1cd1db8bb79f8aff2a028582

On 04/08/2015 04:07 PM, kernel neophyte wrote: Need help! This is still broken! -Neo

On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote: Hi, In case you did not notice: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

    CXXLD radosgw-admin
    CXXLD rados
    /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlclose'
    /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlsym'
    /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlopen'
    error: collect2: ld returned 1 exit status
    make[3]: *** [ceph_tpbench] Error 1
    make[3]: *** Waiting for unfinished jobs
    CXXLD ceph-objectstore-tool
    make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
    make[2]: *** [all-recursive] Error 1
    make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
    make[1]: *** [all] Error 2
    make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
    make: *** [all-recursive] Error 1

Cheers -- Loïc Dachary, Artisan Logiciel Libre
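The undefined dlopen/dlsym/dlclose references are symbols normally provided by libdl, so one plausible check (an assumption, not something confirmed in this thread) is to repeat the tarball gitbuilder's configure line with lttng enabled and -ldl forced, and to inspect what liblttng-ust declares as needed:

    # same flags as the failing gitbuilder, but with lttng on and -ldl added explicitly
    ./autogen.sh
    LDFLAGS="-ldl" ./configure --with-debug --with-radosgw --with-libatomic-ops \
        --with-lttng --disable-static --without-cryptopp --with-tcmalloc
    make -j4 2>&1 | tee build.log

    # inspect the DT_NEEDED entries of the library named in the link error
    readelf -d /usr/lib/i386-linux-gnu/liblttng-ust.so | grep NEEDED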
Re: lttng broken on i386 trusty
Need help! This is still broken! -Neo

On Fri, Mar 27, 2015 at 2:55 AM, Loic Dachary l...@dachary.org wrote: Hi, In case you did not notice: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-i386-basic/log.cgi?log=dca722ec7b2a7fc9214844ec92310074b5cb2faa

    CXXLD radosgw-admin
    CXXLD rados
    /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlclose'
    /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlsym'
    /usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/liblttng-ust.so: undefined reference to `dlopen'
    error: collect2: ld returned 1 exit status
    make[3]: *** [ceph_tpbench] Error 1
    make[3]: *** Waiting for unfinished jobs
    CXXLD ceph-objectstore-tool
    make[3]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
    make[2]: *** [all-recursive] Error 1
    make[2]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
    make[1]: *** [all] Error 2
    make[1]: Leaving directory `/srv/autobuild-ceph/gitbuilder.git/build/src'
    make: *** [all-recursive] Error 1

Cheers -- Loïc Dachary, Artisan Logiciel Libre
RE: ms_crc_data false
-----Original Message----- From: Sage Weil [mailto:s...@newdream.net] Sent: Wednesday, April 08, 2015 5:40 PM To: Deneau, Tom Cc: ceph-devel Subject: Re: ms_crc_data false

On Wed, 8 Apr 2015, Deneau, Tom wrote: With 0.93, I tried ceph tell 'osd.*' injectargs '--ms_crc_data=false' '--ms_crc_header=false' and saw the changes reflected in the ceph admin-daemon output. But having done that, perf top still shows time being spent in crc32 routines. Is there some other parameter that needs changing?

The osd still does a CRC for the purposes of write journaling. This can't be disabled currently. You shouldn't see this come up on reads... sage

Hmm, I am doing rados bench seq, so only reads. Is this in 0.93 or do I need 0.94? -- Tom
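A small sketch of the read-only run Tom describes, handy for checking whether crc32 still shows up when only seq reads are issued; the pool name, durations, and the perf invocation are illustrative assumptions:

    # write objects once and keep them around for the read phase
    rados bench -p testpool 60 write --no-cleanup

    # read-only phase: sequential reads of the objects written above
    rados bench -p testpool 60 seq

    # meanwhile, profile one OSD and watch for crc32c symbols
    sudo perf top -p $(pgrep -o ceph-osd)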
Re: tcmalloc issue
On 01/04/15 18:54, Somnath Roy wrote: The tcmalloc bug is reported in the following link: http://code.google.com/p/gperftools/issues/detail?id=585 So, do you want us to give the steps on how we are hitting the following trace, or to reproduce the bug reported in the above link? If you are asking about hitting the stack trace I mentioned below, it is just running ceph with a lot of IOs for a long period of time.

A minimal test case on how to reproduce the bug is sufficient; we can of course complete the second activity as well as soon as we have the update available. Cheers James -- James Page Ubuntu and Debian Developer james.p...@ubuntu.com jamesp...@debian.org