Questions about journals, performance and disk utilization.

martin Tue, 22 Jan 2013 12:08:59 -0800

Hi list,

In a mixed SSD & SATA setup (5 or 8 nodes, each holding 8x SATA and 4x SSD), would it make sense to skip putting the journals on SSD, or is the advantage of doing so just too great? We're looking into having 2 pools, SATA and SSD, and will create guests in either pool depending on whether they require heavy I/O.

Also, we are currently leaning towards a very simple setup: a server board with 8x onboard RAID slots (LSI 2308) and 6x onboard SATA slots, with all disks attached to the onboard controller and the onboard slots (for cost and simplicity) and passed through as JBOD.


Any suggestions/input about:

- Would it make sense to drop the onboard controller and aim for a better controller (a cache/battery-backed 12-16 port one)?
- Or attach another cheap JBOD card like an SAS2008/LSI 2308, etc.?
- Or just go with this setup (to keep it simpler and cheaper)?

Journals:

- Would it make sense to drop, say, 1 SSD and 1 SATA disk per node and attach 2 fast SSDs for journals instead? Or would that be redundant in our case, since we already have separate SATA and SSD pools (we do not expect heavy I/O in the SATA pool)?


RBD striping:

- Performance: as far as I know, RBD is striped over objects. If one were to create, say, a 20 GB RBD image, would it mostly be striped over very few objects/PGs (say ~3 nodes, which would be the minimum in our setup)? Or would one expect it to be striped in smaller objects over pretty much all of the nodes (5 or 8 in our case), or even across all OSDs?
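Here is a back-of-the-envelope sketch of the striping question in Python. It assumes RBD's default object size of 4 MiB and no custom striping, so a fully written 20 GB image (read here as 20 GiB) is backed by 5,120 objects. The sketch imitates object-to-PG placement by hashing hypothetical object names with SHA-256 over a hypothetical pool of 256 PGs. Ceph's real hash and CRUSH placement differ, so this only illustrates how widely that many objects spread.

```python
import hashlib

# Assumptions (not taken from the question): RBD's default 4 MiB object size
# with no custom striping, and a hypothetical pool with 256 PGs.
OBJECT_SIZE = 4 * 1024 ** 2
IMAGE_SIZE = 20 * 1024 ** 3      # the 20 GB image from the question, read as GiB
PG_NUM = 256


def object_count(image_size: int, object_size: int) -> int:
    """Backing objects for a fully written image (ceiling division)."""
    return -(-image_size // object_size)


def pg_for(name: str, pg_num: int) -> int:
    """Stand-in placement: SHA-256 of the object name modulo pg_num.

    Ceph itself uses a different hash plus CRUSH; this only mimics a uniform spread.
    """
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:4], "little") % pg_num


n_objects = object_count(IMAGE_SIZE, OBJECT_SIZE)
# Hypothetical object names; real RBD object names follow their own scheme.
touched = {pg_for(f"hypothetical-image.{i:016x}", PG_NUM) for i in range(n_objects)}
print(f"{n_objects} objects land in {len(touched)} of {PG_NUM} PGs")
```

When thousands of objects are hashed over a few hundred PGs, practically every PG is hit. Under these assumptions, the image would therefore not be confined to ~3 nodes.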


Disks:

- Any advice on SATA disks? I know a vendor like Seagate has its 'normal' enterprise disks (ES.3 models) and also sells its cloud-oriented disks (CS models). Any suggestions or experience on what to look at or aim for? Or what are people using in general?


Disk utilization:

- I've noticed in our test setup that several PGs each hold more than 300 GB of data. Is this normal? This leads to some odd situations where disk usage varies by up to 15-20% (on 2 TB disks). If we adjust the weight, one of these PGs eventually moves to another disk, and 300 GB of data has to be copied. We're using 0.56.1.


Some output from 'ceph pg dump':

pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
4.5 90772 0 0 0 379301388412 150969 150969 active+clean 2013-01-22 00:07:13.384272 2827'412414 2795'3317565 [1,2] [1,2] 2827'397587 2013-01-22 00:07:13.384225 2744'299767 2013-01-17 05:40:40.737279
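The short Python sketch below puts the quoted row in perspective by parsing its leading columns and converting the `bytes` value. It uses only the numbers shown. pg 4.5 holds about 353 GiB in 90,772 objects, which is just under 4 MiB per object on average. That would be consistent with mostly full objects at RBD's default 4 MiB object size, although this is an assumption about how the pool is used.

```python
# Header of the 'ceph pg dump' output quoted above. Only the first nine
# columns are parsed, because later columns (timestamps) contain spaces.
HEADER = ("pg_stat objects mip degr unf bytes log disklog state "
          "state_stamp v reported up acting last_scrub scrub_stamp "
          "last_deep_scrub deep_scrub_stamp").split()

# Leading columns of the row for pg 4.5, copied from the dump above.
ROW = "4.5 90772 0 0 0 379301388412 150969 150969 active+clean"

MiB = 1024 ** 2
GiB = 1024 ** 3


def parse_leading_columns(header: list[str], row: str, n: int = 9) -> dict[str, str]:
    """Pair the first n header names with the first n whitespace-separated values."""
    return dict(zip(header[:n], row.split()[:n]))


fields = parse_leading_columns(HEADER, ROW)
objects = int(fields["objects"])
size_bytes = int(fields["bytes"])
avg_object_mib = size_bytes / objects / MiB

print(f"pg {fields['pg_stat']}: {size_bytes / GiB:.1f} GiB in {objects} objects, "
      f"{avg_object_mib:.2f} MiB per object on average")
```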


Results in disk usage like:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdd1   1.9T  1.4T   446G   77%  /srv/ceph/osd5
/dev/sdb1   1.4T  1.1T   331G   77%  /srv/ceph/osd0
/dev/sda1   1.9T  1.4T   442G   77%  /srv/ceph/osd1
/dev/sdc1   1.9T  1.8T    84G   96%  /srv/ceph/osd2

If we reweight sdc down (even by 0.00X % at a time), one of those big PGs eventually moves to one of the other disks. The picture then looks exactly the same, except that a different disk is at 96% usage instead (I've raised the cluster full threshold to 98% in this setup).
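The size of a single PG relative to a disk helps explain the step-like behaviour described above. This minimal Python sketch assumes that "1.9T" in the df output means 1.9 TiB (df -h uses powers of 1024) and uses the `bytes` value of pg 4.5. On that basis, one such PG accounts for roughly 18 percentage points of a disk. That figure fits both the 15-20% spread and, roughly, the gap between 77% and 96% on sdc1.

```python
TiB = 1024 ** 4

# 'bytes' column for pg 4.5 from the 'ceph pg dump' output above.
PG_BYTES = 379_301_388_412
# Assumption: df -h reports binary units, so the "1.9T" OSD disks are 1.9 TiB.
DISK_TIB = 1.9


def usage_shift_points(pg_bytes: int, disk_tib: float) -> float:
    """Percentage points of a disk's capacity occupied by a single PG."""
    return 100 * pg_bytes / (disk_tib * TiB)


shift = usage_shift_points(PG_BYTES, DISK_TIB)
print(f"one PG is about {shift:.1f}% of a {DISK_TIB} TiB disk")
```

A PG is placed as a whole, so reweighting can only shift usage in steps of about this size. If we simplify by assuming the pool's data splits evenly across PGs, a pool with k times as many PGs would have per-PG steps roughly k times smaller.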

Apologies up front if questions like these are not supposed to go to this mailing list.


Any advice/ideas/suggestions are very welcome!

Cheers,
Martin Nielsen
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
