RE: tool for applying 'ceph daemon ' command to all OSDs
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Dan Mick
> Sent: Tuesday, December 22, 2015 7:00 AM
> To: ceph-devel
> Subject: RFC: tool for applying 'ceph daemon ' command to all OSDs
>
> I needed something to fetch current config values from all OSDs (sorta the
> opposite of 'injectargs --key value'), so I hacked it, and then spiffed it
> up a bit. Does this seem like something that would be useful in this form
> in upstream Ceph, or does anyone have any thoughts on its design or
> structure?

You could do it using socat too. Say Node1 has osd.0:

Node1:
  cd /var/run/ceph
  sudo socat TCP-LISTEN:60100,fork unix-connect:ceph-osd.0.asok

Node2:
  cd /var/run/ceph
  sudo socat unix-listen:ceph-osd.0.asok,fork TCP:Node1:60100

Node2:
  sudo ceph daemon osd.0 help | head
  {
      "config diff": "dump diff of current config and default config",
      "config get": "config get : get the config value",

This is more for a development/test setup.

Regards, Igor.

> It requires a locally-installed ceph CLI and a ceph.conf that points to the
> cluster and any required keyrings. You can also provide it with a YAML file
> mapping host to OSDs if you want to save time collecting that info for a
> statically-defined cluster, or if you want just a subset of OSDs.
>
> https://github.com/dmick/tools/blob/master/osd_daemon_cmd.py
>
> Excerpt from usage:
>
> Execute a Ceph osd daemon command on every OSD in a cluster with one
> connection to each OSD host.
>
> Usage:
>   osd_daemon_cmd [-c CONF] [-u USER] [-f FILE] (COMMAND | -k KEY)
>
> Options:
>   -c CONF   ceph.conf file to use [default: ./ceph.conf]
>   -u USER   user to connect with ssh
>   -f FILE   get names and osds from yaml
>   COMMAND   command other than "config get" to execute
>   -k KEY    config key to retrieve with config get
>
> --
> Dan Mick
> Red Hat, Inc.
> Ceph docs: http://ceph.com/docs
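For a purely host-local variant of the same idea, here is a minimal Python sketch (an illustration only, not Dan's osd_daemon_cmd.py; the admin socket glob pattern assumes the default /var/run/ceph naming, so adjust it for your cluster):

#!/usr/bin/env python
# Sketch: run a 'ceph daemon'-style command against every OSD admin
# socket found on the local host. Illustration only; assumes sockets
# live under /var/run/ceph with the default naming scheme.
import glob
import re
import subprocess
import sys

def local_osd_daemon_cmd(args):
    """Run 'ceph --admin-daemon <sock> <args...>' for every local OSD."""
    results = {}
    for sock in sorted(glob.glob('/var/run/ceph/ceph-osd.*.asok')):
        m = re.search(r'ceph-osd\.(\d+)\.asok$', sock)
        if not m:
            continue
        # Talking to the admin socket directly works even when the
        # monitors are unreachable.
        out = subprocess.check_output(
            ['ceph', '--admin-daemon', sock] + list(args))
        results['osd.' + m.group(1)] = out
    return results

if __name__ == '__main__':
    # e.g.: python local_daemon_cmd.py config get osd_op_threads
    for name, out in sorted(local_osd_daemon_cmd(sys.argv[1:]).items()):
        print(name)
        print(out.decode('utf-8', 'replace'))

Running something like that on each OSD host (e.g. over ssh) gives cluster-wide coverage, which is essentially what Dan's tool automates with one connection per host.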
[testing ceph] with gnu stow and NFS
Hi Cephers!

TL;DR: Use an NFS server to share binaries and libs (Ceph and anything else) among your cluster nodes, then link them from the mounted NFS into the root ( / ) directory on every node with GNU Stow. Switching between your custom Ceph builds (or any other package, e.g. tcmalloc) across the whole cluster becomes very fast, easy to automate, and consistent: Stow changes your Ceph version using symbolic links, with only two commands.

Long version:

I want to share one of my concepts that makes my life testing Ceph a little bit easier. Some time ago I wrote a few words about this in another thread, but you probably missed them because of the heavy discussion there. The main idea was an easy mechanism to switch between Ceph versions: binaries, libs, everything. The truth is I'm too lazy to reinstall manually on every host and too ignorant to check whether I've installed the right version ;)

What I have at the moment:
- an NFS server that exports /home/ceph to all of my cluster nodes
- several subfolders with Ceph builds, e.g. /home/ceph/ceph-0.94.1, /home/ceph/git/ceph
- and libraries, e.g. /home/ceph/tcmalloc/gperftools-2.4

In /home/ceph/ceph-0.94.1 and /home/ceph/tcmalloc/gperftools-2.4 I have an additional directory called BIN, and everything is installed into it. Instead of the normal install (or building RPMs):

$ make
$ make install

I run something like:

$ mkdir BIN
$ make
$ make DESTDIR=$PWD/BIN install
$ rm -rf $PWD/BIN/var   # in case of Ceph we don't want to share this directory over NFS, so we must remove it

DESTDIR installs all package-related files into BIN, laid out just as they would be under the root ( / ) directory:

$ tree BIN
BIN
├── etc
│   ├── bash_completion.d
│   │   ├── ceph
│   │   ├── rados
│   │   ├── radosgw-admin
│   │   └── rbd
│   └── ceph
├── sbin
│   ├── mount.ceph
│   └── mount.fuse.ceph
└── usr
    └── bin
        ├── ceph
        ├── ceph-authtool
        ├── ceph_bench_log
        ├── ceph-brag
        ├── ceph-client-debug
        ├── ceph-clsinfo

And now it's time for GNU Stow: https://www.gnu.org/software/stow/

On every node I run, as root:

$ stow -d /home/ceph/ceph-0.94.1 -t/ BIN; ldconfig;

Stow creates symbolic links for every file/directory from BIN into the root ( / ) directory of my Linux, and Ceph works just as if I had installed it the normal way, or from RPMs:

$ type ceph
ceph is hashed (/usr/bin/ceph)
$ ls -al /usr/bin/ceph
lrwxrwxrwx 1 root root 50 Dec 11 14:33 /usr/bin/ceph -> ../../home/ceph/ceph-0.94.1/BIN/usr/bin/ceph

I can do the same for other libraries as well:

$ stow -d /home/ceph/tcmalloc/gperftools-2.4 -t/ BIN; ldconfig;

If I need to check another Ceph/library version, I just stop Ceph on all nodes, then "unstow":

$ stow -D -d /home/ceph/ceph-0.94.1 -t/ BIN; ldconfig;

and "stow" the different version (note: no -D this time, since -D means delete):

$ stow -d /home/ceph/ceph-0.94.1_my_custom_build -t/ BIN; ldconfig;

== Exception ==

/etc/init.d/ceph should be copied into /, because when you "unstow" Ceph, "service ceph start" stops working.

Then I just start Ceph on all nodes and that's all. Quite fast, isn't it?

The NFS+stow concept can be used not only for builds (compilation, make, make install) but for RPMs too (precompiled binaries). Unpack the RPM into the BIN folder and run stow, and it will work just as if you had installed the RPM the standard way into root ( / ).

Placing binaries/libs on NFS does not impact Ceph's performance at runtime; at most it can add some delay during process start, when they are loaded from the file system.
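Since the whole point is that the switch is easy to automate, here is a minimal sketch of driving the stop/unstow/stow/start cycle across all nodes from one place. The node names and build paths below are hypothetical, and it assumes passwordless root ssh and that both builds already sit under the NFS-mounted /home/ceph:

#!/usr/bin/env python
# Sketch: switch every node from one stowed build to another.
# Hypothetical: node names, build paths, passwordless root ssh.
import subprocess

NODES = ['node1', 'node2', 'node3']   # hypothetical node names
OLD = '/home/ceph/ceph-0.94.1'
NEW = '/home/ceph/ceph-0.94.1_my_custom_build'

def run(node, cmd):
    """Run a shell command on a node over ssh; raise on failure."""
    subprocess.check_call(['ssh', 'root@' + node, cmd])

for node in NODES:
    run(node, 'service ceph stop')
    # Unstow the old build, stow the new one, refresh the linker cache.
    run(node, 'stow -D -d %s -t/ BIN; ldconfig' % OLD)
    run(node, 'stow -d %s -t/ BIN; ldconfig' % NEW)
    run(node, 'service ceph start')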
Of course the NFS server is a SPOF, but for the tests I run this doesn't matter: I test only application behavior, and the infrastructure stays untouched. This idea is a time-saver during the day, and it makes automation easy during night tests.

Regards, Igor.
RE: Scaling Ceph reviews and testing
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Dalek, Piotr
> Sent: Thursday, November 26, 2015 9:56 AM
> To: Gregory Farnum; ceph-devel
> Subject: RE: Scaling Ceph reviews and testing
>
> > -----Original Message-----
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Gregory Farnum
> > Sent: Wednesday, November 25, 2015 11:14 PM
> >
> > It has been a long-standing requirement that all code be tested by
> > teuthology before being merged to master. In the past leads have
> > shouldered a lot of this burden through integration and testing
> > branches, but it's become unsustainable in its present form: some PRs
> > which are intended as RFCs are being mistakenly identified as final;
> > some PRs are submitted which pass cursory sniff tests but fail under
> > recovery conditions that the teuthology suites cover. To prevent that,
> > please comment on exactly what testing you've performed when
> > submitting a PR, with a justification for why that is sufficient to
> > promote it to integration testing. [..]
>
> Unless people are convinced that performing their own testing isn't that
> complex (teuthology-openstack is a leapfrog in the right direction), they
> won't do it, either because they simply don't know how to do it, or because
> they don't have the resources (small startups may not be able to afford
> them at all, and large, global corporations might have hardware request
> procedures so complex, and with such long lead times, that it scares the
> devs off).
>
> But correctness and reliability regressions are one thing; performance
> regressions are another. I already see PRs that promise a performance
> increase when at (my) first glance the claim looks contradictory, or the PR
> is just a big, 100+ line change that adds more complexity than performance.
> Not to mention utter nonsense like https://github.com/ceph/ceph/pull/6582
> (excuse my finger-pointing, but this case is so extreme that it needs to be
> pointed out). To put it more bluntly: some folks are spamming with
> performance PRs that in their opinion improve something, while in reality
> those PRs at best increase the complexity of already complex code and add
> (sometimes potential) bugs, often with the added bonus of actually degraded
> performance. So, my proposition is to postpone QA'ing performance pull
> requests until someone unrelated to the PR author (or even the author's
> company) confirms that the claims in that particular PR are true. Providing
> a code snippet that shows the perf difference (or a way to verify those
> claims in a reproducible manner) in the PR should be enough
> (https://github.com/XinzeChi/ceph/commit/2c8a17560a797b316520cb689240d4dcecf3e4cc
> is a particular example). That should help get rid of performance PRs that
> degrade performance, or that improve it only on a particular
> hardware/software configuration and otherwise, at best, don't improve
> anything.
>
> With best regards / Pozdrawiam
> Piotr Dałek

We could also add another label, like "explanation/data needed", which the folks triaging new PRs could apply to make the rule more strict: "Performance enhancements must come with test data and detailed explanations." (https://github.com/ceph/ceph/blob/master/CONTRIBUTING.rst)

Then Piotr's idea becomes easier to carry out: with test data and an explanation in hand, the "PR validator" can decide faster and more easily whether a PR makes sense.

Regards, Igor.
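To illustrate what such a reproducible snippet might look like in practice (the functions below are hypothetical placeholders, not Ceph code), even a few lines of timeit in the PR description would let an unrelated reviewer check the claim on their own machine:

# Sketch of a reproducible perf claim; old_encode/new_encode are
# hypothetical stand-ins for the code paths a PR claims to speed up.
import timeit

payload = b'x' * 4096

def old_encode(buf):
    return bytes(bytearray(buf))   # pretend baseline: extra copy

def new_encode(buf):
    return buf[:]                  # pretend optimized path: slice copy

for fn in (old_encode, new_encode):
    t = timeit.timeit(lambda: fn(payload), number=100000)
    print('%s: %.3f s for 100k calls' % (fn.__name__, t))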
FW: ceph_monitor - monitor your cluster with parallel python
Hey, one more time here; I got a reject from the mail daemon. Regards, Igor.

From: Podoski, Igor
Sent: Thursday, November 19, 2015 8:53 AM
To: ceph-devel; 'ceph-us...@ceph.com'
Subject: ceph_monitor - monitor your cluster with parallel python

Hi Cephers!

I've created a small tool to help track memory/CPU/IO usage. It's useful for me, so I thought I'd share it with you: https://github.com/aiicore/ceph_monitor

In general this is a Python script that uses Parallel Python to run a function on remote hosts. Data is gathered from all hosts and presented on the console or added to an sqlite database, from which it can be plotted with e.g. gnuplot. You can define the OSD ranges you want to monitor, or monitor a certain set of processes, e.g. only the OSDs from a pool that has SSDs. The main concept is that the monitor doesn't know and doesn't care which hosts the OSDs are running on; it treats them as a whole set.

The script uses psutil to get data related to processes (mon/osd/rgw/whatever). In the near future I'd like to add modes that can modify process behavior: psutil has .nice, .ionice, and .cpu_affinity methods that could be useful in some tests. Basically, with Parallel Python you can run any function remotely, so tuning the OS by changing some /proc/* files can be done too. You can add labels to the data to mark when something happens.

Sample plot: https://raw.githubusercontent.com/aiicore/ceph_monitor/master/examples/avg_cpu_mem.png
Simple test: https://github.com/aiicore/ceph_monitor/blob/master/examples/example_test_with_rados.sh
Short readme: https://github.com/aiicore/ceph_monitor
Full readme: https://github.com/aiicore/ceph_monitor/blob/master/readme.txt

I encourage you to use and develop it. If not, please at least read the full readme text; maybe you'll come up with a better idea based on this concept and something interesting will happen.

p.s. This currently works with python 2.6 and psutil 0.6.1 on CentOS 6.6. If you find any bug, report it on my github as an issue.

!!! Security notice !!!
Parallel Python supports SHA authentication. My version currently runs WITHOUT it, so in certain environments it could be dangerous (anyone could run any function from an untrusted client). For now, use it only in isolated test/dev clusters.

Regards, Igor.
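As a rough illustration of the kind of data such a script collects (this is not the actual ceph_monitor code; the process-name filter and sampled fields are my own assumptions, and it is written against modern psutil, which uses methods where 0.6.x used properties, e.g. p.name() vs p.name):

# Sketch of per-process CPU/memory sampling with psutil, in the
# spirit of ceph_monitor but not its actual code.
import time
import psutil

def sample(name_prefix='ceph-osd'):
    """Return {pid: (cpu_percent, rss_bytes)} for matching processes."""
    procs = []
    for p in psutil.process_iter():
        try:
            if p.name().startswith(name_prefix):
                procs.append(p)
        except psutil.NoSuchProcess:
            pass
    for p in procs:
        try:
            p.cpu_percent(None)    # prime the per-process CPU counter
        except psutil.NoSuchProcess:
            pass
    time.sleep(1.0)                # measurement interval
    out = {}
    for p in procs:
        try:
            out[p.pid] = (p.cpu_percent(None), p.memory_info().rss)
        except psutil.NoSuchProcess:
            pass                   # process exited during the interval
    return out

if __name__ == '__main__':
    for pid, (cpu, rss) in sorted(sample().items()):
        print('pid %d: cpu %.1f%%, rss %.1f MiB' % (pid, cpu, rss / 2.0 ** 20))

Run one sampler like this per host (which is what Parallel Python handles in ceph_monitor), collect the dicts centrally, and you have the raw series to feed sqlite/gnuplot.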