Re: 32-bit memory limits IN DETAIL (Was: perspectives on 32 bit vs 64 bit)
Martin Kuball wrote:
> On Tuesday, 25 October 2005 02:31, [EMAIL PROTECTED] wrote:
> [snip]
>> Because the kernel address space has to hold more than just RAM (in
>> particular, it also has to hold memory-mapped PCI devices like video
>> cards), if you have 1G of physical memory, the kernel will by default
>> only use 896M of it, leaving 128M of kernel address space for PCI
>> devices. A different user/kernel split can help there. I use
>> 2.75/1.25G on 1G RAM machines, but if you use PAE or NX, the split
>> has to be on a 1G boundary. But these are all workarounds. The real
>> solution is to use a larger virtual address space so that the
>> original, efficient technique of mapping both the user's virtual
>> address space and the kernel's address space (basically a copy of
>> physical memory) will both fit.
>
> And what about 64bit systems? How is the splitting done there? Do I
> have to worry?

The problem is exactly the same, but on a larger scale. For 32-bit
processors, you get into trouble when your programs need close to 2^32
bytes (i.e. 4 GB) or more. For a true 64-bit processor, you get the
same trouble the day you need close to 2^64 bytes or more per process.
Nobody is anywhere near this limit yet.

Helge Hafting
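To make the per-process limit concrete, here is a minimal user-space sketch, assuming nothing beyond mmap(2); the exact figure depends on the kernel's user/kernel split, ulimits and whatever is already mapped. It binary-searches for the largest single anonymous reservation, which comes out a little under 3G on a default 32-bit kernel and vastly larger on a 64-bit one.

/* vas_probe.c -- rough probe of how much contiguous virtual address space
 * one process can reserve.  A sketch only: the exact number varies with
 * the kernel's user/kernel split, existing mappings and ulimits.
 * Build: cc -O2 vas_probe.c -o vas_probe
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t lo = 0, hi = (size_t)-1;            /* binary-search the mapping size */
    while (hi - lo > (1UL << 20)) {            /* stop once within 1 MiB */
        size_t mid = lo + (hi - lo) / 2;
        void *p = mmap(NULL, mid, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            hi = mid;                          /* too big: shrink */
        } else {
            munmap(p, mid);
            lo = mid;                          /* fits: try bigger */
        }
    }
    printf("pointer size               : %zu bits\n", sizeof(void *) * 8);
    printf("largest single reservation : %zu MiB\n", lo >> 20);
    return 0;
}

On a 3G/1G kernel the answer is bounded by the 3G of user linear address space; on a 64-bit machine it is bounded by however much of the 64-bit space the architecture actually implements for user mappings.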
Re: 32-bit memory limits IN DETAIL (Was: perspectives on 32 bit vs 64 bit)
On Thursday, 3 November 2005, 15:42:30 CET, Helge Hafting wrote:
[...]
> Nobody is anywhere near this limit yet.
> Helge Hafting

Sure, 640kB^W 16EB ought to be enough. ;o)

-- 
Sylvain Sauvage
Re: 32-bit memory limits IN DETAIL (Was: perspectives on 32 bit vs 64 bit)
Hi,

I'm now worried as I nearly understand this!! I need to play more guitar!! Many thanks for a well-written mail.

cheers
Bob

[EMAIL PROTECTED] wrote:
> This seems to come up every now and then, so let me explain.
[snip]
Re: 32-bit memory limits IN DETAIL (Was: perspectives on 32 bit vs 64 bit)
On Tuesday, 25 October 2005 02:31, [EMAIL PROTECTED] wrote:
[snip]
> Because the kernel address space has to hold more than just RAM (in
> particular, it also has to hold memory-mapped PCI devices like video
> cards), if you have 1G of physical memory, the kernel will by default
> only use 896M of it, leaving 128M of kernel address space for PCI
> devices. A different user/kernel split can help there. I use
> 2.75/1.25G on 1G RAM machines, but if you use PAE or NX, the split
> has to be on a 1G boundary. But these are all workarounds. The real
> solution is to use a larger virtual address space so that the
> original, efficient technique of mapping both the user's virtual
> address space and the kernel's address space (basically a copy of
> physical memory) will both fit.

And what about 64bit systems? How is the splitting done there? Do I
have to worry?

Martin
32-bit memory limits IN DETAIL (Was: perspectives on 32 bit vs 64 bit)
This seems to come up every now and then, so let me explain. None of this is new information, but it can be a bit confusing.

First, i386 memory addressing. The i386 is unlike all other processors in that there are two levels of address translation that take place. First, we have a 16-bit segment + 32-bit offset VIRTUAL address. Now, 3 bits of that segment are sort of taken (2 bits of RPL and 1 local/global bit), so you really only get 8192 segments per process. This VIRTUAL address is then translated into a 32-bit LINEAR address by checking the offset against the segment limit and adding the segment base. Then this 32-bit LINEAR address is fed to a standard page-based MMU, producing a 32- or 36-bit PHYSICAL address.

Most processors go VIRTUAL --page tables-- PHYSICAL. The i386 goes VIRTUAL --segments-- LINEAR --page tables-- PHYSICAL.

The bottleneck is the 32-bit LINEAR address space. A process can have at most 2^32 bytes addressable at any one time without the operating system rewriting the page tables.

Note first of all that, if you actively use more than one segment at a time (such as for code, stack and data), this limits your maximum segment size to less than 2^32 bytes each, since the TOTAL of the simultaneously accessible segments has to fit within 2^32 bytes. So, for example, if you had two segments of 4G, you could not have them both resident at the same time, and so you could not get a MOV instruction from one to the other to complete. (And the MOV instruction itself would have to go somewhere.) Thus, you cannot actually reach the 2^45-byte addressing limit that up to 2^13 segments of up to 2^32 bytes each implies.

Secondly, even if you do demand segmentation, bringing segments into and out of the 32-bit LINEAR address space, this still requires that the operating system rewrite the page tables (and invalidate the TLB entries) in response to segment faults in order to access the relevant bits of PHYSICAL memory. This is exactly the SAME operating system and hardware overhead as using mmap or mremap to remap bits of a linear address space. The only difference would be if it were much easier for the user program to deal with segments than to deal with explicit dynamic mmaps. And it's not at all clear that it is.

For these reasons, 32-bit x86 operating systems tend to ignore the segmentation features and just use paging. It just isn't worth the complexity, and for multi-platform operating systems like Linux, it isn't worth the portability hassles. In fact, this has in turn led to x86 designers de-emphasizing segment register loading speed, so large-model programs that use multiple segments take a significant speed hit.
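To make the two stages concrete, here is a small self-contained C sketch of the VIRTUAL -> LINEAR -> PHYSICAL path. It is only a toy model: the segment descriptor and the page-table lookup are invented for illustration, and privilege levels, protection bits, PAE and the TLB are all ignored.

/* toy_i386_translate.c -- a deliberately simplified model of the two-stage
 * i386 translation (VIRTUAL -> LINEAR -> PHYSICAL).  The segment values and
 * the "page table" below are made up; this is illustration, not hardware.
 * Build: cc -O2 toy_i386_translate.c -o toy_i386_translate
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

struct segment { uint32_t base; uint32_t limit; };   /* from a GDT/LDT descriptor */

/* Stage 1: segmentation.  selector:offset (VIRTUAL) -> 32-bit LINEAR. */
static uint32_t to_linear(const struct segment *seg, uint32_t offset)
{
    if (offset > seg->limit) {                 /* a general-protection fault on real hardware */
        fprintf(stderr, "segment limit exceeded\n");
        exit(1);
    }
    return seg->base + offset;                 /* wraps mod 2^32, as on the CPU */
}

/* Stage 2: paging.  32-bit LINEAR -> PHYSICAL via a two-level walk:
 * bits 31-22 index the page directory, 21-12 the page table, 11-0 the page. */
static uint64_t to_physical(uint64_t (*lookup)(uint32_t dir, uint32_t table),
                            uint32_t linear)
{
    uint32_t dir    = linear >> 22;
    uint32_t table  = (linear >> 12) & 0x3ff;
    uint32_t offset = linear & 0xfff;
    return lookup(dir, table) + offset;        /* frame base + offset within the page */
}

/* A made-up "page table": map every page to some frame above 4G to show
 * that the PHYSICAL result can be wider than 32 bits.                  */
static uint64_t demo_lookup(uint32_t dir, uint32_t table)
{
    return ((uint64_t)0x100000 + ((uint64_t)dir << 10) + table) << 12;
}

int main(void)
{
    struct segment data = { .base = 0x10000000, .limit = 0x0fffffff };
    uint32_t linear = to_linear(&data, 0x1234);
    printf("linear   : 0x%08x\n", linear);
    printf("physical : 0x%09llx\n",
           (unsigned long long)to_physical(demo_lookup, linear));
    return 0;
}

The point is that each stage is just a bounds check, an add and a couple of table lookups; the interesting part is that the value in the middle (the LINEAR address) can never be wider than 32 bits.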
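And here is what "explicit dynamic mmaps" look like in practice: keep one window of a large object mapped and slide it as needed. A sketch only; the file name "bigfile" and the 256M window are arbitrary, and error handling is minimal.

/* window.c -- slide a fixed-size mapping window across a file that may be
 * far larger than the 32-bit address space (needs large-file support on
 * 32-bit, hence _FILE_OFFSET_BITS).
 * Build: cc -O2 window.c -o window
 */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define WINDOW (256UL << 20)          /* 256 MB mapped at any one time */

int main(void)
{
    int fd = open("bigfile", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t size = lseek(fd, 0, SEEK_END);
    unsigned long sum = 0;

    for (off_t pos = 0; pos < size; pos += WINDOW) {
        size_t len = (size - pos < (off_t)WINDOW) ? (size_t)(size - pos) : WINDOW;
        /* Map only the current window; the kernel rewrites the page tables
         * here, just as it would have to on a segment fault. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, pos);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        for (size_t i = 0; i < len; i++)
            sum += p[i];              /* touch the data through the window */

        munmap(p, len);               /* slide the window */
    }
    printf("checksum: %lu\n", sum);
    close(fd);
    return 0;
}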
Now, for why the Linux kernel takes 1 GB of virtual address space...

Every time a user-space program does a read() or write() call, or makes any similar system call that moves a buffer of data, the kernel has to copy between the user buffers and its own private file cache. For this to be possible, the source and destination buffers must be in the same VIRTUAL address space. And for it to be remotely efficient, they have to be in the same LINEAR address space as well.

Now, it is possible to have a separate kernel address space, and demand-map user-space buffers into it to do the copying. That's what the 4G+4G patches do. But that means that on EVERY system call, you have to change the page tables around, which results in flushing the TLB and a lot of overhead.

The default Linux config arranges for the kernel's address space and the user's address space to both be present at the same time. Page table entries have a permission bit that lets them be inaccessible to user mode but accessible from kernel mode without having to reload the TLB. This is very fast. But it results in the classic split between 3G of user address space and 1G of kernel address space. It could be done in different ways, but *any alternative would be much slower* for typical programs that don't need more than 3G of address space.

The thing that's causing a real problem is that common physical memory sizes are approaching the 4G address space. Thus, it's no longer guaranteed that the 1G of kernel space is big enough to hold all of physical memory, so kernel access to some parts of it has to be bank-switched (the CONFIG_HIGHMEM options). By careful design, this has been kept reasonably fast, but there is overhead.

Because the kernel address space has to hold more than just RAM (in particular, it also has to hold memory-mapped PCI devices like video cards), if you have 1G of physical memory, the kernel will by default only use 896M of it, leaving 128M of kernel address space for PCI devices. A different user/kernel split can help there. I use 2.75/1.25G on 1G RAM machines, but if you use PAE or NX, the split has to be on a 1G boundary.

But these are all workarounds. The real solution is to use a larger virtual address space so that the original, efficient technique of mapping both the user's virtual address space and the kernel's address space (basically a copy of physical memory) will both fit.
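Where does the 896M figure come from? It is just the default constants of the 3G/1G layout: a 1G kernel window minus a 128M reserve for vmalloc, ioremap and the PCI mappings mentioned above. The sketch below only does that arithmetic; PAGE_OFFSET = 0xC0000000 and the 128M reserve are the usual 32-bit x86 defaults, everything else is illustration.

/* lowmem_math.c -- back-of-the-envelope version of the 896 MB figure,
 * using the constants of a default 32-bit x86 kernel with the 3G/1G split.
 * Build: cc -O2 lowmem_math.c -o lowmem_math
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t PAGE_OFFSET     = 0xC0000000;        /* user/kernel split: 3 GB */
    const uint64_t ADDRESS_SPACE   = 1ULL << 32;        /* 4 GB of linear addresses */
    const uint64_t KERNEL_WINDOW   = ADDRESS_SPACE - PAGE_OFFSET;   /* 1 GB */
    const uint64_t VMALLOC_RESERVE = 128ULL << 20;      /* vmalloc/ioremap/PCI space */
    const uint64_t LOWMEM_LIMIT    = KERNEL_WINDOW - VMALLOC_RESERVE;

    printf("kernel window : %llu MB\n", (unsigned long long)(KERNEL_WINDOW >> 20));
    printf("PCI/vmalloc   : %llu MB\n", (unsigned long long)(VMALLOC_RESERVE >> 20));
    printf("direct-mapped : %llu MB\n", (unsigned long long)(LOWMEM_LIMIT >> 20));
    /* Physical RAM above LOWMEM_LIMIT is "highmem": it is not permanently
     * visible to the kernel and has to be mapped on demand instead. */
    return 0;
}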