hi Justin,

how many vblades are you exporting here? Does each of the 15 clients have
a "blade" of its own? More than one each? Or just one big one shared?

You mentioned partitioning and aio so I'm assuming more than one, but
it'd be good to know.

+1 for the RAID6 support :)

John.


On Wed, 2008-05-14 at 01:01 -0500, Justin C. Darby wrote:
> Hi list,
> 
> Does anyone have interest in my spending time fully documenting our
> vblade-based AoE configuration? Here are the basics:
> 
> Drives: 48 Seagate ES.2 7200-rpm disks on two 3ware 9650SE-24M8
> controllers, two RAID 6 volumes, four hot spares, and LVM to
> partition/stripe it up.
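> 
> For the curious, the LVM layer is along these lines (device names and
> sizes here are examples, not our actual layout):
> 
> ```shell
> # Each 3ware controller exposes one RAID 6 volume to Linux.
> pvcreate /dev/sda /dev/sdb
> vgcreate aoe /dev/sda /dev/sdb
> # Stripe each exported LV across both controllers for bandwidth.
> lvcreate -i 2 -I 256 -L 2T -n export0 aoe
> ```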
> 
> Network: Nortel 10GbE switch for IBM bladecenter.
> 
> Server: Four dual-core 2.0 GHz AMD Opterons, 16 GB RAM. NUMA and
> realtime kernel options (2.6.24.x). 2x Intel 10GBase-SR adapters w/
> multiqueue kernel support.
> 
> Clients: 15 10GbE clients, mixed Intel (stand-alone, MTU of 9000) and
> NetXen (bladecenter, max supported MTU of 8000).
> 
> Performance (total, on extremely optimized kernel, server): Up to
> 16Gb/sec (2GB/sec) at about 80% cpu utilization.
> 
> Performance (NetXen clients, packaged distribution kernel w/ latest
> aoe driver): Up to ~3Gb/sec (~380MB/sec) peak average, with sporadic
> bursts of 3.5Gb/sec (~450MB/sec).
> 
> Performance (Intel clients w/o multiqueue, packaged distribution
> kernel w/ latest aoe driver): Up to ~5Gb/sec (~630MB/sec) peak.
> 
> Performance (Intel client, testing config: an exact mirror of the
> server OS install w/ latest aoe driver): Hits about 8Gb/sec, no
> problem, no effort involved. About ~280,000 packets per second at MTU
> 9000. Seems to be the limit of the Intel NICs and/or switch (per-port)
> at that MTU size.
> 
> Security layer: VLAN trunking, with the REORDER_HDR flag set on the
> server and clients. (Note: Both these network cards offload the
> rewriting of vlan tags, so this is a pretty low-cpu consuming task for
> us.)
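> 
> For reference, the per-VLAN setup on a 2.6 kernel of this vintage
> looks roughly like this (interface name and VLAN ID are made up):
> 
> ```shell
> # Create a VLAN interface on the trunk port, then turn on REORDER_HDR
> # (flag 1) so the kernel presents untagged frames to the aoe driver;
> # both of our NICs offload the actual tag rewriting.
> vconfig add eth2 100
> vconfig set_flag eth2.100 1 1
> ifconfig eth2.100 up
> ```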
> 
> Some random comments:
> 
> We had a number of frustrating issues with our configuration
> initially, but we've been in production for a couple months now.
> 
> Of note, though, is that we did have to bump the AoE buffer count
> considerably in vblade to get the NetXen clients performing well at
> all, so that the servers involved would load up the Ethernet port
> queues on our bladecenter's 10GbE switch (all of these cards support
> traffic congestion control features, and the switch has large queues
> as well). The NetXen clients would also do a lot better if not for
> ext3: for some reason, when ext3 updates filesystem metadata/journal,
> it refuses to work very well with the odd block/MTU size.
> 
> We also had to tune the heck out of the read-ahead and page cache
> system on the server, but that's more to do with 3ware controllers and
> the small read requests vblade generates than with vblade itself.
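> 
> The knobs involved are roughly these (values illustrative, not our
> production settings):
> 
> ```shell
> # Bigger read-ahead so the 3ware controllers see larger sequential
> # reads instead of vblade's small requests.
> blockdev --setra 16384 /dev/sda
> # Page cache writeback tuning: allow more dirty pages to accumulate
> # before throttling, and start background writeback earlier.
> sysctl -w vm.dirty_ratio=40
> sysctl -w vm.dirty_background_ratio=5
> ```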
> 
> The largest problem our config has is generating client I/O load. As
> it turns out, it's a lot of work to ask for more than a couple hundred
> megabytes a second of I/O performance in a way the early Linux 2.6 I/O
> schedulers will care about.
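> 
> A quick way to see the problem: a single reader rarely keeps the
> elevator busy, so you need several in parallel (the AoE device path
> here is just an example):
> 
> ```shell
> # Four concurrent sequential readers against one AoE target; the
> # scheduler only starts merging and sorting usefully once there is
> # real queue depth to work with.
> for i in 1 2 3 4; do
>     dd if=/dev/etherd/e0.0 of=/dev/null bs=4M count=256 &
> done
> wait
> ```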
> 
> Other notes: Latency has been great. CPU usage has been low on
> clients. The "sporadic bursts" part above is just that: we have no
> idea why these cards sometimes perform better, since the type of work
> they are doing does not change when it happens.
> 
> The new AIO and socketfilter patches will make our environment a
> little more sane, even though the VLAN isolation stops vblade from
> seeing other VLANs' AoE broadcasts (multiple exports on one VLAN
> become less painful with the socketfilter patch, and AIO lets me relax
> the vm.dirty[_background]_ratio tuning a bit), so I'm back into the
> mode of thinking about vblade. I'll probably be testing this soon.
> 
> Also, is there any interest in people using vblade on 10GbE to add a
> command line switch to set the buffer count? I can't imagine we're the
> only people who have run into this. We're probably going to write up a
> patch to do this, since we're going to export to some clients over
> 1GbE after we get the N7K up and running.
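> 
> The invocation we have in mind would be something like this (the -b
> switch is the option we'd be adding, not something stock vblade has):
> 
> ```shell
> # Export shelf 0, slot 0 on the VLAN interface, with a buffer count
> # sized for the 10GbE clients rather than the compiled-in default.
> vblade -b 512 0 0 eth2.100 /dev/aoe/export0
> ```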
> 
> Offtopic pipe dream/note to the guys at Coraid: If you had one device
> like the SR2461 that could do RAID 6 + hot spares for the ultra
> paranoid, had a web interface (internal or external to the box) for
> configuration and disk management somewhere near as feature-complete
> as 3dm2, did 10GBase-SR (not CX4), etherchannel/802.3ad for
> redundancy, LVM (or LVM-like) partitioning, and VLANs with VLAN
> trunking, and could be covered under a support contract, I'd be
> suggesting we buy one or two next fiscal year. Far-fetched, I know.
> But we're ramping up to deploy a Cisco Nexus 7000 series switch over
> the next few months, in part to deal with 10GbE SAN traffic, and I'm
> not sold on FCoE given our awesome AoE setup. We've got that kind of
> solution working locally, and if I could buy it off the shelf it
> would save me a lot of time, so... :)
> 
> Justin



_______________________________________________
Aoetools-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss
