Re: NUMA and SCI [was Re: bigphysarea support in 2.2.19 and 2.4.0 kernels]

2000-12-22 Thread Jeff V. Merkey

On Fri, Dec 22, 2000 at 11:37:29AM -0800, Tim Wright wrote:

I have been working with SCI since 1994.  The people who own 
Dolphin and the SCI chipsets also own TRG.  We dropped work in 
the P6 ccNUMA cards several years back because Intel was 
convinced that shared-nothing was the way to go (and it is).
However, SCI's ability to create explicit sharing makes it
the fastest shared nothing interface around for message passing
(go figure).

I think we do need some bettr APIs.  Grab the source at my FTP server,
and I'd love any input you could provide.

Thanks,

:-)

Jeff


> Hi Jeff,
> 
> On Fri, Dec 22, 2000 at 11:11:05AM -0700, Jeff V. Merkey wrote:
> [...]
> > SCI allows machines to create windows of shared memory across a cluster
> > of nodes, and at 1 Gigabyte-per-second (Gigabyte not gigabit).  I am
> > putting a sockets interface into the drivers so Apache, LVS, and 
> > Pirahna can use these very high speed adapters for a clustered web 
> > server.  Our M2FS clustered file system also is being architected 
> > to use these cards.  
> 
> You're probably aware of this, but SCI allows a lot more then the creation
> of windows of shared memory. The IBM NUMA-Q machines (what was Sequent), use
> the SCI interconnect to build a single-system image machine with all memory
> visible from all "nodes". In fact, all the commercial NUMA machines of which
> I am aware have this property (all nodes see and can address all memory). The
> non-uniform part of NUMA comes from the potentially differing latency and
> speed of different parts of memory (local vs remote in this case).
> AFAIK, the work that Kanoj Sarcar has been doing is to enable such machines.
> 
> It sounds like you have a different requirement of very high-speed shared
> memory between different nodes that can be mapped and unmapped as required.
> Do I understand this correctly ? That would make your requirements somewhat
> orthogonal to the requirements those of us with NUMA architectures have.
> 
> > I will post the source code for the SCI cards at vger.timpanogas.org 
> > and if you have time, please download this code and take a look at 
> > how we are using the bigphysarea APIs to create these windows accros
> > machines.  The current NUMA support in Linux is somewhat slim, and 
> > I would like to use established APIs to do this if possible.
> 
> See above. It may be that you need different APIs anyway.
> 
> Regards,
> 
> Tim
> 
> -- 
> Tim Wright - [EMAIL PROTECTED] or [EMAIL PROTECTED] or [EMAIL PROTECTED]
> IBM Linux Technology Center, Beaverton, Oregon
> "Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



NUMA and SCI [was Re: bigphysarea support in 2.2.19 and 2.4.0 kernels]

2000-12-22 Thread Tim Wright

Hi Jeff,

On Fri, Dec 22, 2000 at 11:11:05AM -0700, Jeff V. Merkey wrote:
[...]
> SCI allows machines to create windows of shared memory across a cluster
> of nodes, and at 1 Gigabyte-per-second (Gigabyte not gigabit).  I am
> putting a sockets interface into the drivers so Apache, LVS, and 
> Pirahna can use these very high speed adapters for a clustered web 
> server.  Our M2FS clustered file system also is being architected 
> to use these cards.  

You're probably aware of this, but SCI allows a lot more then the creation
of windows of shared memory. The IBM NUMA-Q machines (what was Sequent), use
the SCI interconnect to build a single-system image machine with all memory
visible from all "nodes". In fact, all the commercial NUMA machines of which
I am aware have this property (all nodes see and can address all memory). The
non-uniform part of NUMA comes from the potentially differing latency and
speed of different parts of memory (local vs remote in this case).
AFAIK, the work that Kanoj Sarcar has been doing is to enable such machines.

It sounds like you have a different requirement of very high-speed shared
memory between different nodes that can be mapped and unmapped as required.
Do I understand this correctly ? That would make your requirements somewhat
orthogonal to the requirements those of us with NUMA architectures have.

> I will post the source code for the SCI cards at vger.timpanogas.org 
> and if you have time, please download this code and take a look at 
> how we are using the bigphysarea APIs to create these windows accros
> machines.  The current NUMA support in Linux is somewhat slim, and 
> I would like to use established APIs to do this if possible.

See above. It may be that you need different APIs anyway.

Regards,

Tim

-- 
Tim Wright - [EMAIL PROTECTED] or [EMAIL PROTECTED] or [EMAIL PROTECTED]
IBM Linux Technology Center, Beaverton, Oregon
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/