Hi Thomas,
Thanks for the information and suggestions. Robinhood v3 sounds quite exciting
- I will give some thought as to how we might be able to participate in an
early testing phase. I think our timeline probably points toward moving into
production with v2, though. I am by no means a MySQL expert, so I'd very much
appreciate it if you could point me to tuned my.cnf examples.
If I may ask for comment on high-availability using the current Robinhood
releases and our lhsm+tmpfs setup - my intuition is that dual controller,
direct-attached storage and two hosts each running a mysqld and an rbh manager
- one for lhsm and one for tmpfs - would work well; the storage can be two
separate LUNs (probably made from SSDs), one for each database. Then with a
pretty conventional corosync/pacemaker setup, we can have one node ready to
take over the services of the other as required but be running in an
active-active mode in normal operation. Currently, our file system has 220M
inodes in use (growing), which suggests each host having at least 256GB of
memory. Having each LUN be a couple TB or so should allow for future growth
(which we expect). Is this the kind of setup which is generally recommended if
one needs a highly available Robinhood setup? I suspect that alternatives like
mysql master-slave replication or DRBD will impact performance and/or not work
quite as well - is my concern unfounded? FWIW, our cluster currently has ~22k
cores, and we'll be growing to ~50k next year. Our combined
create/open/close/rename/unlink activity is recently averaging in the few
thousand ops/sec, though being in the 10s of thousands ops/sec isn't uncommon.
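As a rough check of the memory figure above, here is a minimal Python sketch that applies the "1k/entry" rule of thumb from the quoted message below to the failover case, where one host temporarily runs both databases. It assumes that "1k" means 1000 bytes and that each database tracks all 220M inodes; the function name and parameters are hypothetical and are not part of Robinhood.

```python
# Assumption: "1k/entry" from the quoted sizing advice, read as 1000 bytes.
BYTES_PER_ENTRY = 1000


def fits_after_failover(entries_per_db, host_memory_gb, databases=2):
    """Return (needed_gb, fits) for one host holding `databases` DBs in memory.

    Assumes every DB tracks `entries_per_db` entries (hypothetical helper).
    """
    needed_gb = entries_per_db * BYTES_PER_ENTRY * databases / 1e9
    return needed_gb, needed_gb <= host_memory_gb


if __name__ == "__main__":
    # Normal operation: one DB per host.
    print(fits_after_failover(220_000_000, 256, databases=1))
    # After failover: both DBs on the surviving host.
    print(fits_after_failover(220_000_000, 256, databases=2))
```

Under these assumptions, 256GB holds one database in memory but not both, so after a failover the surviving node would rely more heavily on the SSD LUNs until the other node returns.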
Thanks again,
Craig Prescott
UF Research Computing
From: LEIBOVICI Thomas [mailto:[email protected]]
Sent: Monday, December 01, 2014 5:13 AM
To: Prescott,Craig P; [email protected]
Subject: Re: [robinhood-support] lhsm custom purge command and DB server sizing
Hi Craig,
It sounds like Robinhood v3 would perfectly match your need to manage these multiple
use-cases in a single robinhood instance. This is basically what it is designed
for. This major version is currently under development, and it will likely be
available in 2H2015. Not sure how it fits in your planning...
Depending on how soon you need these use-cases in full
production, you could get an early version of robinhood in early 2015 that does not
implement all planned features of rbhv3, but at least implements lhsm archiving
and pool-to-pool migration in a single instance.
If you agree to such an early testing phase, just let us know.
I believe this answers your question 1).
2) lhsm and tmpfs can run on the same client, as long as they are registered as
different changelog readers, and access distinct databases.
I'd be concerned about a fight between the 2 instances for releasing disk
space in OST pools: one will want to run "lhsm release", and the other "lfs
migrate". However, I understood you just want to replicate data with lhsm, but
don't plan to "release" it. So it will be OK.
3) A few recommendations:
- Keep your robinhood client as close to the MDS as possible: no access
through LNET routers that would introduce an extra latency for Lustre RPCs.
- I'm not sure running the DB on a different host is a good idea because it
introduces a network latency for robinhood DB requests, whereas they must fire
at tens of thousands per second to sustain the filesystem workload.
The robinhood daemon itself doesn't require much memory or CPU, so it can
perfectly live with a DB engine on the same host.
- It is not expected that the database size changes a lot between Lustre 1.8
and 2.5. Most of the space is consumed by namespace management and storage of
stripe information, which are about the same between the 2 versions. There are only one or
2 more table fields for Lustre/HSM, so not a lot of additional space.
- Tune your /etc/my.cnf (we can provide you examples).
- For the HW:
* It is nice to have most of the DB in memory: 1k/entry is a good sizing
(e.g. 128GB of memory for 128M entries). If you need to run 2
robinhood DBs on the same host, you'll have to double it.
* Of course, a fast DB backend like a SSD is better than a spinning disk
(some benchmarks here:
https://github.com/cea-hpc/robinhood/wiki/tmpfs_admin_guide#entry-processor-pipeline-options)
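
The memory rule of thumb above can be sketched as a small Python calculation. This is only an illustration: it assumes "1k/entry" means 1000 bytes (so that 128M entries give 128GB, as in the example above), and the function name and parameters are hypothetical, not part of Robinhood.

```python
# Assumption: "1k/entry" read as 1000 bytes, matching "128GB for 128M entries".
BYTES_PER_ENTRY = 1000


def db_memory_gb(entries: int, databases_on_host: int = 1) -> float:
    """Estimate RAM (in GB) needed to keep the robinhood DB(s) mostly in memory."""
    return entries * BYTES_PER_ENTRY * databases_on_host / 1e9


if __name__ == "__main__":
    print(db_memory_gb(128_000_000))     # one DB on the host
    print(db_memory_gb(128_000_000, 2))  # two DBs on the same host: doubled
```

This is an estimate of the working set only; the actual my.cnf settings would still need to be tuned separately.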
Regards,
Thomas
On 11/25/14 20:39, Prescott,Craig P wrote:
Hello,
A few months back we took a look at Robinhood Policy Engine on a Lustre 2.5.x
testbed with eyes toward accomplishing two main goals: a) automatic migration
between OST pools based upon atime without changing the path, and b)
replication of particular data to an external file system. It seemed like
current Robinhood software releases could do this by running a tmpfs manager
with a custom purge command (lfs migrate) to handle the migration between
pools, and an lhsm manager for the replication.
With the upgrade and expansion of our production file system from 1.8.9 to 2.5 being
planned for the new year (new system), we are considering how to best bring the
above two goals into production. We want to have the Robinhood components we need
in place when the new system goes into production, and we want to avoid ever
having to rescan the file system. So I have the following questions I hope I
can get some input on:
1) At the time I was looking, the lhsm manager could not have a custom
purge command (appended). If it could, then we would only need to run the lhsm
manager, which would be ideal - it could handle both of our use cases and we
could have a single changelog reader and a single database. Is this possible
with current software releases?
2) If we need to run both lhsm and tmpfs managers, can they be run
simultaneously from the same client?
3) We intend to use a database running on a different node than the
Robinhood managers. I am not sure how to wisely choose HW for mysqld to keep
up with the changelogs. We can use stats from our current 1.8.9-based file
system and cluster activity to size this, but I'm not sure what to look at.
Any pointers here?
Thanks,
Craig Prescott
UF Research Computing
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support