> The design involves a technology called "Express Ether" though it is
> typically written as "ExpEther," and it is basically a way to run a
> PCIe bus over ethernet. Though this might be the first you've heard of
> it, ExpEther has been in development at NEC for the last five years,
> and yes, I'm currently working on getting the documentation released for
> the existing silicon.

Getting these docs would be kick ass.  I was vaguely aware that this was
happening, but I don't know how the silicon works or what it looks like.
Got any free docs on that?

> 
> http://www.nec.co.jp/press/en/0702/0801.html
> http://www.expether.org/
> 
> In short, you can think of ExpEther as something between a bus extender
> and a bridge (PCIe<->ethernet), so basically anything you can plug into
> a PCIe slot can be made available to a remote machine. Yep, you can
> even partition attached devices into VLANs and basically "build" a
> computer on the fly out of available parts attached to the network. For
> example if your VPN or secure website is running a little slow, you
> would usually halt the machine and add a crypto accelerator, but with
> ExpEther, you just export a crypto accelerator device on another system
> to the system that needs it and the recipient system assumes the device
> is attached to its local PCIe bus.

So this is where all the work comes in.  We need a new pci bridge (or
bus) device that does all the magic.  Once this is in place, one could
trivially hook hardware up and make it work regardless of distance
(latency would obviously have to be considered).

I am a little confused here though; if this is done right it should be
transparent to the OS and no code would have to be written at all (minus
management obviously).  Why do we need code?

> One of the first applications I'm working on is exporting a softraid
> volume over ExpEther. I was asked if it was possible to build a shim
> that makes a block device like a softraid sd0a look like an ATA device
> sitting on a (fictitious) ATA controller on the PCIe bus?

Sure, it could easily be used for that; however, if you want to make
this much more usable, see my previous paragraph.  You really want to
solve the problem only once, not multiple times.

> Though it's certainly an uncommon thing to try to do, there's just
> something about this approach that makes me wonder if it's a
> crazy/stupid idea, or absolutely brilliant?

A fine hack to prove a concept; however, a pci bridge (or bus) device
is what you really need and should write.

> To *me* (complete idiot), I'm wondering if this is being approached at
> the wrong level, namely shimming a block device like sd0a to be seen as
> an ata/scsi device on a fictitious controller, versus shimming something
> below it, i.e. 
>       scsibus0 at softraid0 
>       sd0 at scsibus0

Softraid is nothing but a virtual HBA.  Or a shim or a
$insert_fancy_name_here.

> 
> The *consumer* of the resource is expecting to see a disk attached to
> a (fictitious) scsi/ata controller on its local PCIe bus (which is
> imported via ExpEther).
> 
> The *provider* of the resource needs to take a softraid volume and make
> it look like just a (fictitious) disk attached to a (fictitious)
> scsi/ata controller on a (fictitious) PCIe bus (which is exported via
> ExpEther).

Sure, all this is done in softraid today.  See the disabled AOE code
as an example.

> Whether or not the shimming is done below partitioning on the provider
> side is yet to be determined. If it is done above partitioning on the
> provider side (i.e. block devices like sd0a), the result will be two
> layers of partitioning (both provider and consumer) since sd0a on the
> provider-system would become the (fictitious) sd0 on the
> consuming-system.
> 
> The thing to remember is we're talking *below* the file system, so well
> intended suggestions of NFS, ZFS, or file-system-de-jour are not at all
> relevant.

Correct.
