Because it is interesting, I am simply posting the interview here.

Andrea's Achievements
By Moshe Bar, Byte.com
Feb 29, 2000 (10:36 AM)
URL: http://www.byte.com/column/BYT20000229S0009 

The following few thousand words could loosely be defined as a transcript
of an interview with world-famous kernel hacker Andrea
Arcangeli. The interview took place at his home in Imola, Italy, on January
12th, 2000. The most striking thing about Andrea's lab is the
total lack of books. Usually, computer geeks like to have their reference
books on hand. References on things like system calls and regular
expressions are never missing in a geek's lab -- at least not in mine.

Not so with Andrea. There was not a single book in his lab. Everything was
neat and ordered. Andrea is a 24-year-old kernel wizard on the
payroll of Linux distributor SuSE.

We are used to hackers living according to this simple program (compiles
correctly under all Linux versions):

/* warning: works best without a girlfriend */
/* compile without girlfriend.h */
main()
{
        while (alive) {
                work();
                eat();
                smp_lock();     /* avoid SMP races */
                sleep();
                smp_unlock();
        }
}

Again, Andrea is different from the standard cliché of an honest-to-God
kernel hacker. He does have a girlfriend and cares a lot
about other things, too. In his own words, if he did not happen to make
money from writing tomorrow's Linux kernels, he could just as
well be playing guitar from morning to evening. Andrea lives with his
parents in the small town of Imola in central Italy. Formula 1 fans
among you might know this city from its famous racetrack, which regularly
hosts F1 world championship races.

Andrea was even nice enough to come to pick me up at the highway exit. His
lab has the following equipment:

    A dual-CPU Alpha-based Compaq server with 512-MB RAM
    A dual PII no-name server with 128-MB
    An elderly P133 test-bed
    A modest printer

That's all, folks.

Notice the lack of a router, which indicates that Andrea connects to the
Internet like most people in Italy: with a 56Kbit modem. No
frame relay, no ISDN, no DSL. He does, however, manage the Linux boxes on
his ISP's network, and that lets him download stuff to the
ISP's server and then simply run rsync to update things on his
home server. This way, he manages to reduce huge downloads to a mere
five minutes.

A VM Man

Andrea Arcangeli is primarily known for his
contributions to the Linux 2.2 and 2.3 virtual memory
managers, as well as for some world-class hacks on the scheduler,
interrupt handlers, and the ext2 file system. His main hacks include the
one for 2.2 that makes it possible for a user-space program to address 4
GB of RAM. He did this together with fellow hacker Wichert
from Siemens in Germany. Asked how he technically achieved the feat,
Andrea answered:

"Yes that's what we did and that's the basic design that allowed IA32 to
break the old memory limit in 2.3.x (and in 2.2.x with my
additional patch with production quality). Basically you have 4giga of
physical RAM and 4giga of virtual ram:

------------------------------------------------------------
|       userspace virtual 0-3g        | kernelvirtual 3-4g |
------------------------------------------------------------

You know that in order to access physical memory we first have to set up
a proper virt-to-phys mapping.

The kernelvirtual maps the 3-4g range to the 0-1 giga phys range as shown
below:

------------------------------------------------------------
| kernel-phys 0-1g |      unused-phys-memory 1-4giga       |
------------------------------------------------------------

and the userspace _virtual_ addresses (0-3 giga area) maps with 4-kbyte
(PAGE_SIZE) granularity all over the phys memory that the
kernel can address all depending on user page tables setting. Normal. 

Basically the limitation of the old 2.2.x design, which we broke cleanly
with our new design, is the kernel wasn't able to deal with memory
that couldn't be accessed via the ident mapping (the identity mapping is
the 3-4 giga virtual area (where the kernel runs) that points to
the 0-1giga physical memory). So, basically the kernel was always doing: 

*(virtual_address) = 0;

to write to the physical memory at address "virtual_address - 3 giga".

------------------------------------------------------------
|             VIRTUAL SPACE             | kernel virt 3-4g |
------------------------------------------------------------
                                                | virtual write here
        -----------------------------------------
        |
       \|/ physical write happens here
------------------------------------------------------------
|  kernel phys   |             PHYSICAL SPACE              |
------------------------------------------------------------
"
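
The identity mapping Andrea describes comes down to a single subtraction.
The following is a minimal sketch of that arithmetic, not kernel code: it
assumes the 3 giga / 1 giga split from the pictures above, and the names
KERNEL_VIRT_BASE, IDENT_MAP_SIZE, ident_virt_to_phys() and
ident_reachable() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for the 3 giga / 1 giga split described in the text. */
#define KERNEL_VIRT_BASE 0xC0000000u  /* 3 giga: start of the kernel virtual area */
#define IDENT_MAP_SIZE   0x40000000u  /* 1 giga: size of the identity mapping */

/* Hypothetical helper: physical address behind an identity-mapped kernel
 * virtual address, i.e. "virtual_address - 3 giga". */
static uint32_t ident_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KERNEL_VIRT_BASE);  /* must lie in the 3-4 giga window */
    return vaddr - KERNEL_VIRT_BASE;
}

/* Hypothetical helper: can this physical address be reached through the
 * identity mapping at all? Only 0 .. 1 giga-1 qualifies. */
static int ident_reachable(uint64_t paddr)
{
    return paddr < IDENT_MAP_SIZE;
}
```

Calling ident_virt_to_phys() on the last 32-bit virtual address (4 giga-1)
gives 1 giga-1, and ident_reachable() is false for every physical address
from 1 giga upward, which is the limit the rest of the interview is about.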

"Basically, the problem was that the kernel wasn't able to access the
physical memory between 1 giga and 4 giga. As you can see from the
picture, the last address the kernel could access writing to the ident
mapping happens to be 4 giga-1 and it points to the physical address
1 giga-1. Then the virtual addresses wrap around (due to the 32-bit virtual
address limitation that belongs to all the IA32 family of CPUs,
PAE doesn't help virtual addresses at all). NOTE: we can't use the
userspace virtual space to write after the 1giga physical. This is because
the userspace page tables belong to the process memory layout and we
can't change them without clobbering the userspace mappings and
so without flushing away user tlb and so on.

What we did with our new design has been to let the kernel access the phys
memory after the kernel-phys area (so after 1 giga). To do
this cleanly, we reserved a pool of virtual pages at the end of the virtual
memory (at around address 4 giga) and we put page tables on it, in
order to map them to the physical space after the 2 giga range. This
is exactly how the 2.3.41 kernel supports more than 1 giga of RAM on
IA32.

With the pool of virtual pages placed at around virtual address 4 giga, we
can now address all the physical space available from the kernel.
Actually in the above picture the physical space is as large as the virtual
space (4g). With the new PAE support the code works exactly in the
same way, with the difference that with the 3-level-pagetables of the PAE
mode the physical space is larger than 4 giga and it is instead
64 giga. So, the above picture should be changed moving the end of the
physical space to address 64 giga and not 4 giga, that's the only
difference between PAE mode and non-PAE mode.

The userspace continues to point all over the physical space. But now the
physical space that the userspace can address via its page tables
is not limited anymore to 0-1giga of physical RAM, but it can address all
the physical memory exactly like our pool of virtual kernel pages
can do. This is because the kernel is now able to deal with the physical
range after 1giga and so it can allow userspace to use it, too.

Of course, as you can see from the picture, there's a performance penalty
in accessing the memory after 1 giga. This is because we have to
set up a virt-to-phys mapping and to flush away old tlb entries, but it's
very reasonable to have such a performance penalty to handle more
memory. We measured that the performance hit in the worst case (with
maximal page-fault rate) is 2 percent. In real life, it's not
measurable. To map/unmap the reserved virtual pages to access the memory
we created two helper functions called kmap() and
kunmap() that deal 'automagically' with the virt-to-phys mapping. The
code in 2.3.x that uses the reserved pool to access the memory
over 1 giga looks like this:

page = get_page();
virtual_address = kmap(page);
*(virtual_address) = 0;
kunmap(page);"
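
To make the idea of a small reserved pool more concrete, here is a toy
user-space model of it. It is only a sketch under stated assumptions and
not the kernel's kmap()/kunmap() implementation: the names sim_kmap() and
sim_kunmap(), the page counts, and the choice to return NULL when the pool
is full are all invented for illustration. A real kernel also has to update
page tables and flush TLB entries, which this model leaves out.

```c
#include <stddef.h>

#define SIM_PAGE_SIZE  4096
#define SIM_PHYS_PAGES 8   /* pretend physical memory: 8 pages */
#define SIM_POOL_SLOTS 2   /* reserved virtual pages in the pool */

/* Pretend physical memory that has no permanent mapping. */
static unsigned char sim_phys[SIM_PHYS_PAGES][SIM_PAGE_SIZE];

/* Each slot stands for one reserved virtual page. A slot with zero
 * users is free; otherwise sim_slot_page[] names the page it maps. */
static int sim_slot_page[SIM_POOL_SLOTS];
static int sim_slot_users[SIM_POOL_SLOTS];

/* Map a page through the pool. Returns an address to use for the page,
 * or NULL if the page number is invalid or every slot is busy. */
static unsigned char *sim_kmap(int page)
{
    int i, free_slot = -1;

    if (page < 0 || page >= SIM_PHYS_PAGES)
        return NULL;
    for (i = 0; i < SIM_POOL_SLOTS; i++) {
        if (sim_slot_users[i] > 0 && sim_slot_page[i] == page) {
            sim_slot_users[i]++;        /* already mapped: share the slot */
            return sim_phys[page];
        }
        if (sim_slot_users[i] == 0 && free_slot < 0)
            free_slot = i;              /* remember the first free slot */
    }
    if (free_slot < 0)
        return NULL;                    /* pool exhausted (simplification) */
    sim_slot_page[free_slot] = page;
    sim_slot_users[free_slot] = 1;
    return sim_phys[page];
}

/* Drop one user of the page; the slot becomes free at zero users. */
static void sim_kunmap(int page)
{
    int i;

    for (i = 0; i < SIM_POOL_SLOTS; i++) {
        if (sim_slot_users[i] > 0 && sim_slot_page[i] == page) {
            sim_slot_users[i]--;
            return;
        }
    }
}
```

The usage mirrors the snippet in the quote: call sim_kmap(page), write
through the returned pointer, then call sim_kunmap(page) so the slot can be
reused for another page. The point of the model is that a handful of
reserved slots is enough to reach any page, at the cost of a map and unmap
step around each access.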

It is no exaggeration to say that this hack is worth millions. It will make
it possible for enterprise-wide servers, especially database
servers, to manage huge amounts of data. Especially after the recent
announcement of IBM's journaling file system for Linux, there is
really no reason to buy expensive proprietary hardware and software from
companies like Sun or Hewlett-Packard. Clearly, Linux is
going to dominate the server market.

During the interview, Andrea proved
more than once that he has an encyclopedic knowledge of the
kernel's data structures and algorithms. Whenever asked about a certain
part of the Linux kernel, he could always produce the
corresponding code section within the more than 50 source-code files. Andrea
was more than happy to implement a new kernel feature
(CPU affinity for SMP kernels) with me. Making use of his vast knowledge
of the kernel's data structures and algorithms, he had the feature
implemented in 20 minutes.

There are, however, tons of other contributions that made it to the Linux
kernel, passing Linus Torvalds's close scrutiny and harsh
criticism. Here are but the most important:

    important bugfixes for VM, SMP races, TCPv4, filesystems (ext2, vfat),
        buffer cache, IO-subsystem, timer, ipc-shm, IA32 architecture,
        alpha architecture, scheduler, parport, lp, ppa, etc.
    broken the memory limit of IA32 and alpha, to grow beyond 4GB of RAM
    SMP threaded per-page LRU on 2.3.x
    VM improvements and research
    scheduler improvements and research
    alpha architecture hacking
    maintaining and developing IKD (Integrated Kernel Debugging) patch
    schedule_timeout() implementation
    jiffy wrap fixes for robustness of 32bit architectures
    parport sharing
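
The jiffy wrap fixes in the list above deal with a 32-bit tick counter that
eventually overflows and starts again from zero. The following is a minimal
sketch of the usual wrap-safe comparison, the same idea as the kernel's
time_after() macro, written here as a stand-alone illustration. The names
tick_after() and tick_after_naive() are hypothetical, and the cast assumes
the common two's-complement conversion to a signed type.

```c
#include <stdint.h>

/* Hypothetical helper: true if tick value a is later than b, even when
 * the 32-bit counter wrapped around between the two readings. The result
 * is only meaningful while a and b are less than 2^31 ticks apart. */
static int tick_after(uint32_t a, uint32_t b)
{
    /* b - a is computed modulo 2^32; viewed as a signed value it is
     * negative exactly when a lies ahead of b. */
    return (int32_t)(b - a) < 0;
}

/* Naive comparison, shown for contrast: it gives the wrong answer as
 * soon as the counter wraps. */
static int tick_after_naive(uint32_t a, uint32_t b)
{
    return a > b;
}
```

With b taken shortly before the wrap (for example 0xFFFFFFF0) and a taken
shortly after it (for example 5), tick_after(a, b) is true while
tick_after_naive(a, b) is false, which is the kind of bug such fixes are
meant to remove.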

Andrea, at one point, said, "I can say that I spend most of my time fixing
bugs even if I have lots of new features to implement in mind,
but I give bugs more priority." This, of course, should be the attitude of
any developer, but more so for a kernel hacker.

After a few hours
into the interview, Andrea's mother shows up and brings coffee and cookies
(Mmmmmm). Mrs. Arcangeli is very proud of her son, but
remains modest about it. More than once I noticed how he starts to speak
faster when he talks kernel, but then seems more careful when
the subject shifts to more general issues.

Just like Linus, Andrea, too,
rejects many of my suggestions for improvements of the VM, even
if the feature is to be found in successful OSes like Solaris. Asked about
hint-based schedulers (a feature of Solaris 7), Andrea said:

"I generally never look at the code of other OSs. If I believe in a
feature, first I think it out for a while and test it against all possible
scenarios, then I just sit down and code it. I am not interested in what
the others do."

The fact that his contributions have always proved to increase the
reliability and performance of Linux excuses this attitude. The open
source approach is clearly less academic than proprietary development. A
feature is introduced and if it serves well, it remains. If not, it
evolves naturally until it is good -- a sort of trial-and-error method.

When asked about his plans for future development, Andrea replied: "I want
to look at page coloring in the VM (an advanced paging
algorithm) and journaling file system. But I also still have lots of bug
hunting to do."

Andrea codes from around 10 a.m. to 6 p.m. Then, he either plays guitar or
goes out with his girlfriend and his other friends. His coding
style matches that of the Grand Master of kernel programming, Linus
Torvalds: elegant, terse code that is at times unconventional for
the sake of speed. And there are many gotos. In fact, in the Linux kernel
there is one goto for about 80 lines of code. Although all OS
kernels have to use gotos for the sake of efficiency, Linux has by far the
biggest share of gotos in the source code. As long as geniuses like
Andrea understand it, it's OK. 

You can reach Andrea by email at: [EMAIL PROTECTED] 

Moshe Bar is an Israeli system administrator and OS researcher, who
started learning Unix on a PDP-11 with AT&T Unix Release 6
back in 1981. He holds an M.Sc in computer science. Visit Moshe's website
at http://www.moelabs.com/



===========================================================================
I Made Wiryana (0521-106 5328)            Universitas Gunadarma - Indonesia
Rechnernetze und Verteilte Systeme  http://nakula.rvs.uni-bielefeld.de/made
Universitaet Bielefeld                                    Check my e-zine :
[EMAIL PROTECTED]    http://nakula.rvs.uni-bielefeld.de/majalah
Supporter of the Open Source Campus Agreement - legal, smart, independent and economical
===========================================================================

