Re: [RFC] Direct Sockets Support??
[EMAIL PROTECTED] said:
> > But in the case of an application which fits in main memory, and
> > has been running for a while (so all pages are present and
> > dirty), all you'd really have to do is verify the page tables are
> > in the proper state and skip the TLB flush, right?
>
> We really cannot assume this. There are two cases
>
> a. when a user app wants to receive some data, it allocates
> memory(using malloc) and waits for the hw to do zero-copy read. The kernel
> does not allocate physical page frames for the entire memory region
> allocated. We need to lock the memory (and locking is expensive due to
> costly TLB flushes) to do this
>
> b. when a user app wants to send data, he fills the buffer
> and waits for the hw to transmit data, but under heavy physical memory
> pressure, the swapper might swap the pages we want to transmit. So we need
> to lock the memory to be 100% sure.

You're right, of course. But I suspect that the fast path of re-locking
memory which is happily in core will go much faster by removing the
multi-processor TLB purge. And it can't hurt, unless I'm missing something.

-- Pete

--- linux-2.4.4-stock/mm/mlock.c	Tue May  8 17:26:34 2001
+++ linux/mm/mlock.c	Tue May  8 17:24:13 2001
@@ -114,6 +114,10 @@
 	return 0;
 }
 
+/* implemented in mm/memory.c */
+extern int mlock_make_pages_present(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end);
+
 static int mlock_fixup(struct vm_area_struct * vma,
 	unsigned long start, unsigned long end, unsigned int newflags)
 {
@@ -138,7 +142,7 @@
 		pages = (end - start) >> PAGE_SHIFT;
 		if (newflags & VM_LOCKED) {
 			pages = -pages;
-			make_pages_present(start, end);
+			mlock_make_pages_present(vma, start, end);
 		}
 		vma->vm_mm->locked_vm -= pages;
 	}
--- linux-2.4.4-stock/mm/memory.c	Tue May  8 17:25:36 2001
+++ linux/mm/memory.c	Tue May  8 17:24:40 2001
@@ -1438,3 +1438,80 @@
 	} while (addr < end);
 	return 0;
 }
+
+/*
+ * Specialized version of make_pages_present which does not require
+ * a multi-processor TLB purge for every page if nothing about the PTE
+ * was modified.
+ */
+int mlock_make_pages_present(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long end)
+{
+	int ret, write;
+	struct mm_struct *mm = current->mm;
+
+	write = (vma->vm_flags & VM_WRITE) != 0;
+
+	/*
+	 * We need the page table lock to synchronize with kswapd
+	 * and the SMP-safe atomic PTE updates.
+	 */
+	spin_lock(&mm->page_table_lock);
+
+	ret = 0;
+	for (ret = 0; !ret && addr < end; addr += PAGE_SIZE) {
+		pgd_t *pgd;
+		pmd_t *pmd;
+		pte_t *pte, entry;
+		int modified;
+
+		current->state = TASK_RUNNING;
+		pgd = pgd_offset(mm, addr);
+		pmd = pmd_alloc(mm, pgd, addr);
+		if (!pmd) {
+			ret = -1;
+			break;
+		}
+		pte = pte_alloc(mm, pmd, addr);
+		if (!pte) {
+			ret = -1;
+			break;
+		}
+		entry = *pte;
+		if (!pte_present(entry)) {
+			/*
+			 * If it truly wasn't present, we know that kswapd
+			 * and the PTE updates will not touch it later. So
+			 * drop the lock.
+			 */
+			if (pte_none(entry)) {
+				ret = do_no_page(mm, vma, addr, write, pte);
+				continue;
+			}
+			ret = do_swap_page(mm, vma, addr, pte,
+					pte_to_swp_entry(entry), write);
+			continue;
+		}
+
+		modified = 0;
+		if (write) {
+			if (!pte_write(entry)) {
+				ret = do_wp_page(mm, vma, addr, pte, entry);
+				continue;
+			}
+			if (!pte_dirty(entry)) {
+				entry = pte_mkdirty(entry);
+				modified = 1;
+			}
+		}
+		if (!pte_young(entry)) {
+			entry = pte_mkyoung(entry);
+			modified = 1;
+		}
+		if (modified)
+			establish_pte(vma, addr, pte, entry);
+	}
+
+	spin_unlock(&mm->page_table_lock);
+	return ret;
+}
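For reference, a minimal userspace sketch of the re-lock fast path under
discussion, assuming an arbitrary 4 MB buffer and gettimeofday()-based
timing: fault the buffer in once, then time repeated mlock()/munlock()
calls on memory that is already resident (mlock() may require root or a
raised RLIMIT_MEMLOCK).

/* Illustrative microbenchmark only: times mlock()/munlock() of a buffer
 * that is already present and dirty.  Buffer size and iteration count
 * are arbitrary. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

#define BUF_SIZE	(4 * 1024 * 1024)
#define ITERATIONS	1000

int main(void)
{
	char *buf;
	struct timeval t0, t1;
	int i;

	buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	/* Touch every page so it is present and dirty before timing. */
	memset(buf, 0xaa, BUF_SIZE);

	gettimeofday(&t0, NULL);
	for (i = 0; i < ITERATIONS; i++) {
		mlock(buf, BUF_SIZE);
		munlock(buf, BUF_SIZE);
	}
	gettimeofday(&t1, NULL);

	printf("%.1f us per mlock/munlock pair\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e6 +
		(t1.tv_usec - t0.tv_usec)) / ITERATIONS);

	munmap(buf, BUF_SIZE);
	return 0;
}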
Re: [RFC] Direct Sockets Support??
> a. when a user app wants to receive some data, it allocates
> memory(using malloc) and waits for the hw to do zero-copy read. The kernel
> does not allocate physical page frames for the entire memory region
> allocated. We need to lock the memory (and locking is expensive due to
> costly TLB flushes) to do this
>
> b. when a user app wants to send data, he fills the buffer
> and waits for the hw to transmit data, but under heavy physical memory
> pressure, the swapper might swap the pages we want to transmit. So we need
> to lock the memory to be 100% sure.

Or c) you prealloc two ring buffers.
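A minimal sketch of option (c), assuming anonymous mmap()ed buffers and a
purely illustrative head/tail ring layout: both rings are allocated and
pinned once with mlock() at startup, so the per-transfer pinning cost
discussed above is paid only once.

/* Sketch only: preallocate and pin two ring buffers at startup.
 * Sizes and the ring layout are illustrative; raising RLIMIT_MEMLOCK
 * (or root privilege) may be needed for mlock(). */
#include <stdlib.h>
#include <sys/mman.h>

#define RING_SIZE	(1 * 1024 * 1024)

struct ring {
	char	*base;		/* pinned buffer */
	size_t	head, tail;	/* producer/consumer offsets */
};

static int ring_init(struct ring *r)
{
	r->base = mmap(NULL, RING_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (r->base == MAP_FAILED)
		return -1;
	/* Pay the pinning cost exactly once, at startup. */
	if (mlock(r->base, RING_SIZE) != 0)
		return -1;
	r->head = r->tail = 0;
	return 0;
}

int main(void)
{
	struct ring tx_ring, rx_ring;

	if (ring_init(&tx_ring) || ring_init(&rx_ring))
		return 1;
	/* The NIC (or a zero-copy socket layer) would then DMA directly
	 * out of tx_ring.base and into rx_ring.base. */
	return 0;
}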
RE: [RFC] Direct Sockets Support??
> But in the case of an application which fits in main memory, and
> has been running for a while (so all pages are present and
> dirty), all you'd really have to do is verify the page tables are
> in the proper state and skip the TLB flush, right?

We really cannot assume this. There are two cases

a. when a user app wants to receive some data, it allocates
memory(using malloc) and waits for the hw to do zero-copy read. The kernel
does not allocate physical page frames for the entire memory region
allocated. We need to lock the memory (and locking is expensive due to
costly TLB flushes) to do this

b. when a user app wants to send data, he fills the buffer
and waits for the hw to transmit data, but under heavy physical memory
pressure, the swapper might swap the pages we want to transmit. So we need
to lock the memory to be 100% sure.
Re: [RFC] Direct Sockets Support??
[EMAIL PROTECTED] said:
> > A couple of concerns I have:
> > * How to pin or pagelock the application buffer without
> > making a kernel transition.
>
> You need to pin them in advance. And pinning pages is _expensive_ so you
> don't want to keep pinning/unpinning pages

I can't convince myself why this has to be so expensive. The current
implementation does this for mlock:

 1. Split vma if only a subset of the pages are being locked.
 2. Mark bit in vma.
 3. Make sure the pages are in core.

That third step has the potential of being the most expensive, as changing
the page tables requires invalidating the TLBs on all processors. Currently
make_pages_present() does the work for 3.

But in the case of an application which fits in main memory, and has been
running for a while (so all pages are present and dirty), all you'd really
have to do is verify the page tables are in the proper state and skip the
TLB flush, right? Then 3 turns into a single spin_lock pair for the
page_table_lock, and walking down the page table.

The VMA splitting can be nasty, as it might require a couple of slab
allocations, and doing an AVL insertion. (More nastiness in the case of
shared memory or file mapping, too.) But nothing like playing with TLBs.

Any reason why make_pages_present() is not the really oversized hammer it
seems to be?

-- Pete
Re: [RFC] Direct Sockets Support??
> A couple of concerns I have:
> * How to pin or pagelock the application buffer without
> making a kernel transition.

You need to pin them in advance. And pinning pages is _expensive_ so you
don't want to keep pinning/unpinning pages.

> * Assuming the memory can be locked down, how can a list
> of physical memory ranges be obtained (necessary to support
> scatter/gather DMA)? Is kiobufs suitable with its page-alignment
> constraints? If kiobufs will work, how can the kernel transition be
> avoided?

kiovecs will do that. It might be a little heavyweight but that should
improve in 2.5 as we move to a slightly lighter model.

> WinSock Direct seems to address these concerns. These issues
> become important at 1Gb and 10Gb speeds.

1Gbit - not really, 10Gbit yes.
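To make the kiovec suggestion concrete, here is a rough sketch assuming the
2.4 kiobuf interface (alloc_kiovec(), map_user_kiobuf(), unmap_kiobuf());
the helper name pin_user_buffer() is made up, and exact signatures vary
between 2.4 releases.

/* Sketch only: pin a user buffer with the 2.4 kiobuf interface and walk
 * the resulting page list for scatter/gather DMA.  Treat this as an
 * outline, not a drop-in implementation. */
#include <linux/fs.h>
#include <linux/iobuf.h>
#include <linux/mm.h>

int pin_user_buffer(unsigned long uaddr, size_t len)
{
	struct kiobuf *iobuf;
	int i, err;

	err = alloc_kiovec(1, &iobuf);
	if (err)
		return err;

	/* Faults in and locks the user pages backing [uaddr, uaddr+len). */
	err = map_user_kiobuf(READ, iobuf, uaddr, len);
	if (err)
		goto out_free;

	/* iobuf->maplist[] now holds the pinned struct page pointers. */
	for (i = 0; i < iobuf->nr_pages; i++) {
		struct page *page = iobuf->maplist[i];
		/* A driver would build a scatter/gather descriptor from
		 * each pinned page here, e.g. via page_to_phys(). */
		(void) page;
	}

	unmap_kiobuf(iobuf);
out_free:
	free_kiovec(1, &iobuf);
	return err;
}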
Re: [RFC] Direct Sockets Support??
> That's exactly my point, we need to define a new protocol family to
> support it. This means that all applications using PF_INET need to be
> changed and recompiled. My basic argument goes like this: if hardware can

Thanks to the magic of shared libraries and LD_PRELOAD, a library hook can
actually make the decision underneath the application.
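A minimal sketch of such a hook, assuming a hypothetical PF_INFINIBAND
family number and a purely illustrative fallback policy: the shim overrides
socket() via dlsym(RTLD_NEXT, ...) and substitutes the SAN family whenever
the application asks for a PF_INET stream socket.

/* Sketch of an LD_PRELOAD shim that redirects PF_INET stream sockets to a
 * hypothetical PF_INFINIBAND family.  The family number and the policy
 * below are illustrative only. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/socket.h>

#ifndef PF_INFINIBAND
#define PF_INFINIBAND 27	/* hypothetical value for illustration */
#endif

int socket(int domain, int type, int protocol)
{
	static int (*real_socket)(int, int, int);

	if (!real_socket)
		real_socket = (int (*)(int, int, int))
				dlsym(RTLD_NEXT, "socket");

	/* Divert TCP-style sockets to the SAN family; everything else
	 * goes to the normal stack unchanged. */
	if (domain == PF_INET && type == SOCK_STREAM) {
		int fd = real_socket(PF_INFINIBAND, type, protocol);
		if (fd >= 0)
			return fd;
		/* Fall back to in-kernel TCP/IP if the fabric is absent. */
	}
	return real_socket(domain, type, protocol);
}

Built with something like gcc -shared -fPIC -o libdsock.so shim.c -ldl and
loaded with LD_PRELOAD=./libdsock.so, an unmodified PF_INET application
picks the hook up without recompilation, which is the point being made
above.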
RE: [RFC] Direct Sockets Support??
> > technology is Infiniband. In Infiniband, the hardware supports IPv6. For
> > this type of devices there is no need for software TCP/IP. But for
> > networking applications, which mostly use sockets, there is a performance
> > penalty with using software TCP/IP over this hardware.
>
> IPv6 is only the bottom layer of the stack. TCP does a lot lot more.

Sorry to have confused you. IB supports the notion of connection over IPv6,
not exactly TCP. I just interchanged TCP and the notion of connection
provided by Infiniband. Infiniband is a cluster of technologies like VI, IP,
etc. So I felt that we can take advantage of this to do networking. Because
the speed of IB ranges from 2.5Gbps to 30Gbps, even a slight overhead in
software will affect performance very badly.
RE: [RFC] Direct Sockets Support??
> For the case where the routing will be external. That's conveniently
> something you can deduce in advance. In theory nothing stops you
> implementing this. Conventionally you would do that with BSD sockets by
> implementing a new socket family PF_INFINIBAND. You might then choose to
> make the selection of that either done by the application or under it by
> C library overrides.

That's exactly my point, we need to define a new protocol family to support
it. This means that all applications using PF_INET need to be changed and
recompiled. My basic argument goes like this: if hardware can support the
notion of connection, the sockets layer should be aware of this and send
all requests to the hw. I can assign an IPv4 address (for the sake of
backward compatibility) and get away without software TCP/IP. I get the
performance benefit of hardware TCP/IP (notion of connection).

The windoze 2000 DDK has an interesting section about WinSock Direct(r)
that lets the SAN hardware (like IB) still use traditional PF_INET. Also
one interesting whitepaper:
http://servernet.himalaya.compaq.com/snet2/whitepapers/WSD_Perf_White_Paper_3-21-01.doc
Re: [RFC] Direct Sockets Support??
> different topology subnets. Fabrics like Infiniband provide security on
> hardware, so there is no need to worry about it. The simple point is that
> hw supports TCP/IP, then why do we need a software TCP/IP over it?

For the case where the routing will be external. That's conveniently
something you can deduce in advance. In theory nothing stops you
implementing this. Conventionally you would do that with BSD sockets by
implementing a new socket family PF_INFINIBAND. You might then choose to
make the selection of that either done by the application or under it by C
library overrides.

A network protocol stack is also not required to use sk_buffs, or to use
conventional dev_queue_foo() models, so you can write a fairly thin layer.

What I am not sure about would be the best way to implement read/write
operations if the hardware can support these without kernel calls - ie via
mmap and secure page access. That bit is an interesting problem.

Alan
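As a rough sketch of what registering such a family involves in a 2.4-era
kernel, assuming a hypothetical PF_INFINIBAND number and a stub
ib_create_socket(); the real work of attaching a proto_ops table whose
sendmsg/recvmsg drive the fabric hardware is left out.

/* Sketch only: registering a new BSD socket family in a 2.4-era kernel.
 * PF_INFINIBAND and ib_create_socket() are hypothetical names. */
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/net.h>
#include <linux/socket.h>

#define PF_INFINIBAND 27	/* hypothetical, unassigned family number */

static int ib_create_socket(struct socket *sock, int protocol)
{
	/* Allocate a struct sock, attach a proto_ops table whose
	 * sendmsg/recvmsg talk to the hardware queues directly, etc. */
	return -ENOSYS;		/* placeholder */
}

static struct net_proto_family ib_family_ops = {
	family:	PF_INFINIBAND,
	create:	ib_create_socket,
};

static int __init ib_sockets_init(void)
{
	return sock_register(&ib_family_ops);
}

static void __exit ib_sockets_exit(void)
{
	sock_unregister(PF_INFINIBAND);
}

module_init(ib_sockets_init);
module_exit(ib_sockets_exit);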
RE: [RFC] Direct Sockets Support??
> > Doesn't this bypass all of the network security controls? Granted - it
> > is completely reasonable in a dedicated environment, but I would think
> > the security loss would prevent it from being used for most usage.
>
> Direct Sockets makes sense only in clustering (server farms) to
> reduce intra-farm communication. It is *not* supposed to be used for the
> regular internet. Direct Sockets is also tough to implement across subnets
> with different topologies. Fabrics like Infiniband provide security in
> hardware, so there is no need to worry about it. The simple point is: if
> the hw supports TCP/IP, then why do we need a software TCP/IP over it?

Because the hardware doesn't have the user's security context. All it can
see are addresses, socket numbers and protocol. Neither can it be extended
with that information (IPSec). Authentication of the connections is not
possible.

Now... If the server farm only runs one job at a time, it is irrelevant...

- Jesse I Pollard, II
  Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: [RFC] Direct Sockets Support??
> technology is Infiniband. In Infiniband, the hardware supports IPv6. For
> this type of devices there is no need for software TCP/IP. But for
> networking applications, which mostly use sockets, there is a performance
> penalty with using software TCP/IP over this hardware.

IPv6 is only the bottom layer of the stack. TCP does a lot lot more.

> > access setup is actually needed.
>
> My point is that if the hardware is capable of doing TCP/IP, we
> should let the sockets layer talk directly to it (direct sockets). Thereby
> the application which uses the sockets will get better performance.

That depends on where your overheads are. Remember that for every direct
access you make you trade off kernel syscall overhead against userspace
scheduling and locking overhead. The VI architecture tries to design well
to handle this; I've not seen enough about Infiniband to know that.

The 'better performance' is an assumption that isn't always as simple as it
seems - especially with high mtu values and real world applications.
RE: [RFC] Direct Sockets Support??
> Doesn't this bypass all of the network security controls? Granted - it is
> completely reasonable in a dedicated environment, but I would think the
> security loss would prevent it from being used for most usage.

Direct Sockets makes sense only in clustering (server farms) to reduce
intra-farm communication. It is *not* supposed to be used for the regular
internet. Direct Sockets is also tough to implement across subnets with
different topologies. Fabrics like Infiniband provide security in hardware,
so there is no need to worry about it. The simple point is: if the hw
supports TCP/IP, then why do we need a software TCP/IP over it?
RE: [RFC] Direct Sockets Support??
> > Define 'direct sockets' firstly.
>
> Direct Sockets is the ability by which the application (using sockets)
> can use the hardware's features to provide connection, flow control,
> etc., instead of the TCP and IP software modules. A typical hardware
> technology is Infiniband. In Infiniband, the hardware supports IPv6. For
> this type of devices there is no need for software TCP/IP. But for
> networking applications, which mostly use sockets, there is a performance
> penalty with using software TCP/IP over this hardware.
>
> > I have seen several lines of attack on very high bandwidth devices.
> > Firstly the linux projects a while ago doing usermode message passing
> > directly over network cards for ultra low latency. Secondly there was a
> > VI based project that was mostly driven from userspace.
>
> The application needs to be rewritten to use VIPL, but if we could
> provide sockets over VI (or sockets over IB), then the existing
> applications can run with a known environment.
>
> > One thing that remains unresolved is the question as to whether the
> > very low cost Linux syscalls and zero copy are enough to achieve this
> > using a conventional socket API and the kernel space, or whether a
> > hybrid direct access setup is actually needed.
>
> My point is that if the hardware is capable of doing TCP/IP, we
> should let the sockets layer talk directly to it (direct sockets). Thereby
> the application which uses the sockets will get better performance.

Doesn't this bypass all of the network security controls? Granted - it is
completely reasonable in a dedicated environment, but I would think the
security loss would prevent it from being used for most usage.

- Jesse I Pollard, II
  Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
RE: [RFC] Direct Sockets Support??
> Define 'direct sockets' firstly.

Direct Sockets is the ability by which the application (using sockets) can
use the hardware's features to provide connection, flow control, etc.,
instead of the TCP and IP software modules. A typical hardware technology
is Infiniband. In Infiniband, the hardware supports IPv6. For this type of
devices there is no need for software TCP/IP. But for networking
applications, which mostly use sockets, there is a performance penalty with
using software TCP/IP over this hardware.

> I have seen several lines of attack on very high bandwidth devices.
> Firstly the linux projects a while ago doing usermode message passing
> directly over network cards for ultra low latency. Secondly there was a
> VI based project that was mostly driven from userspace.

The application needs to be rewritten to use VIPL, but if we could provide
sockets over VI (or sockets over IB), then the existing applications can
run with a known environment.

> One thing that remains unresolved is the question as to whether the very
> low cost Linux syscalls and zero copy are enough to achieve this using a
> conventional socket API and the kernel space, or whether a hybrid direct
> access setup is actually needed.

My point is that if the hardware is capable of doing TCP/IP, we should let
the sockets layer talk directly to it (direct sockets). Thereby the
application which uses the sockets will get better performance.
Re: [RFC] Direct Sockets Support??
> With the advent of VI and Infiniband, there is a growing need to support
> Sockets over such new technologies. I studied recent performance
> analysis of sockets vs direct sockets and found that there is a 250%
> performance hike and 30% decrease in latency time. Also CPU bandwidth is
> significantly reduced.

Define 'direct sockets' firstly.

I have seen several lines of attack on very high bandwidth devices. Firstly
the linux projects a while ago doing usermode message passing directly over
network cards for ultra low latency. Secondly there was a VI based project
that was mostly driven from userspace.

One thing that remains unresolved is the question as to whether the very
low cost Linux syscalls and zero copy are enough to achieve this using a
conventional socket API and the kernel space, or whether a hybrid direct
access setup is actually needed.