This may be somewhat controversial, so I’ll try to tread lightly. Might we infer that your OSDs are on spinners? And at 500 GB each, it seems likely that both the drives and the servers are fairly old. Please share hardware details and OS; a couple of commands that can help confirm the drive types follow below.
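Just as a rough sketch, assuming Luminous and BlueStore (the OSD ID below is a placeholder and the grep pattern is from memory, so adjust for your cluster), something like this will show whether an OSD sits on rotational media:

    # Luminous prints a CLASS column (hdd/ssd/nvme) per OSD
    ceph osd tree

    # Ask an OSD about its own devices; "rotational": "1" indicates a spinner
    ceph osd metadata 0 | grep -E 'rotational|objectstore|bdev_type'

    # On the OSD host itself, ROTA=1 means rotational
    lsblk -d -o NAME,ROTA,SIZE,MODEL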
Having suffered an “enterprise” dogfood deployment in which I had to attempt to support thousands of RBD clients on spinners with colo journals (and a serious design flaw that some of you are familiar with), my knee-jerk thought is that spinners are antithetical to “heavy use of block storage”. I understand, though, that in an education setting you may not have a choice. How highly utilized are your OSD drives?

Depending on your workload you *might* benefit from more PGs. But since you describe your OSDs as being 500 GB on average, I have to ask: do their sizes vary considerably? If so, larger OSDs are going to have more PGs (and thus receive more workload) than smaller ones; “ceph osd df” will show the number of PGs on each. If you do have a significant disparity of drive sizes, careful enabling and tweaking of primary affinity can have measurable results in read performance (a quick command sketch is at the end of this message).

Is the number of PGs a power of 2? If not, some of your PGs will be much larger than others. Do you have OSD fillage reasonably well balanced? If “ceph osd df” shows a wide variance, that can also hamper performance, as the workload will not be spread evenly (again, see the commands below).

With all due respect to those who have tighter constraints than I enjoy in my current corporate setting, heavy RBD usage on spinners can be Sisyphean. Granted, I’ve never run with a cache tier myself, or with separate WAL/DB devices. In a corporate setting the additional cost of SSD OSDs can easily be offset by reduced administrative hassle and a better user experience. If that isn’t an option for you anytime soon, then by all means I’d stick with the cache tier, and maybe with Luminous indefinitely.
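For the PG questions above, a minimal set of checks (pool name is a placeholder) might look like:

    # pg_num should ideally be a power of two
    ceph osd pool get <poolname> pg_num

    # Per-OSD view: the PGS column shows PG count, %USE and VAR show fill;
    # the summary line reports MIN/MAX VAR and STDDEV across OSDs
    ceph osd df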
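And if sizes or fill levels do vary a lot, this is roughly the direction I’d look. OSD IDs and values here are purely illustrative, and I’m going from memory, so verify against the docs for your release; older releases may also need “mon osd allow primary affinity = true” before the first command takes effect:

    # Make a small/slow OSD less likely to act as primary (and thus serve reads);
    # values range 0.0-1.0, default 1.0
    ceph osd primary-affinity osd.12 0.5

    # Luminous ships a balancer mgr module that can even out PG placement
    ceph mgr module enable balancer
    ceph balancer mode crush-compat    # 'upmap' needs Luminous-or-newer clients
    ceph balancer on

    # The older approach also still works:
    ceph osd reweight-by-utilization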