Re: Need help in understanding x86 syscall
> zach-dev2:~ $ ldd /bin/ls
> linux-gate.so.1 => (0xe000)
>
> This is the vsyscall entry point, which gets linked by ld into all processes.

Just a clarification... not GNU ld (the binutils thing), but /lib/ld-linux.so

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Zwane Mwaikambo wrote:
> On Thu, 11 Aug 2005, Steven Rostedt wrote:
> > On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:

> > int is a call to either an interrupt or exception procedure. 0x80 is
> > setup in Linux to be a trap and not an interrupt vector. So it does
> > _not_ turn off interrupts.
>
> It's actually a vector, that's all you can install in the IDT.

It's a vector + metadata, most noticeably a privilege level and a
descriptor type.

http://www.acm.uiuc.edu/sigops/roll_your_own/i386/idt.html
--
Top 100 things you don't want the sysadmin to say:
47. Say, What does "Superblock Error" mean, anyhow?
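The "vector + metadata" layout can be sketched in C. This is a minimal illustration of how a 32-bit protected-mode IDT entry packs the handler offset, the code segment selector, the privilege level (DPL) and the descriptor type; it is not the kernel's own code. The function names, the handler address and the selector value 0x60 used in the usage note are hypothetical, chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor types for 32-bit gates (low 4 bits of the access byte). */
enum { GATE_INTERRUPT32 = 0xE, GATE_TRAP32 = 0xF };

/*
 * Pack one 8-byte IDT entry:
 *   bits  0..15  handler offset, low half
 *   bits 16..31  code segment selector
 *   bits 40..47  access byte: P (present), DPL, descriptor type
 *   bits 48..63  handler offset, high half
 */
uint64_t make_gate(uint32_t offset, uint16_t selector,
                   unsigned type, unsigned dpl)
{
    assert(dpl <= 3 && type <= 0xF);
    uint64_t access = 0x80u | (dpl << 5) | type;   /* P = 1 */
    return (uint64_t)(offset & 0xFFFFu)
         | ((uint64_t)selector << 16)
         | (access << 40)
         | ((uint64_t)(offset >> 16) << 48);
}

/* Extract the metadata again. */
unsigned gate_type(uint64_t gate) { return (unsigned)(gate >> 40) & 0xFu; }
unsigned gate_dpl(uint64_t gate)  { return (unsigned)(gate >> 45) & 0x3u; }
```

For example, make_gate(0xC0103ABC, 0x60, GATE_TRAP32, 3) describes a trap gate that ring 3 code may enter with an int instruction; with a DPL of 0 the same int issued from user space would raise a general protection fault instead.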
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 22:04 -0700, Jeff Carr wrote:
> [EMAIL PROTECTED]:~# dpkg -s libc6-i686
> ...

OK, this explains it :-)

# dpkg -s libc-i686
Package `libc-i686' is not installed and no info is available.

# dpkg -s libc6
Package: libc6
Status: install ok installed
Priority: required
Section: base
Installed-Size: 16336
Maintainer: GNU Libc Maintainers <debian-glibc@lists.debian.org>
Architecture: i386
Source: glibc
Version: 2.3.5-3
Replaces: ldso (<= 1.9.11-9), timezone, timezones, gconv-modules,
 libtricks, libc6-bin, netkit-rpc, netbase (<< 4.0)
Provides: glibc-2.3.5-3
Suggests: locales, glibc-doc
Conflicts: strace (<< 4.0-0), libnss-db (<= 2.2-6.1.1), timezone,
 timezones, gconv-modules, libtricks, libc6-doc, libc5 (<< 5.4.33-7),
 libpthread0 (<< 0.7-10), libc6-bin, libwcsmbs, apt (<< 0.3.0),
 libglib1.2 (<< 1.2.1-2), netkit-rpc, wine (<< 0.0.20031118-1),
 cyrus-imapd (<< 1.5.19-15), initrd-tools (<< 0.1.79)
Description: GNU C Library: Shared libraries and Timezone data
 Contains the standard libraries that are used by nearly all programs on
 the system. This package includes shared versions of the standard C library
 and the standard math library, as well as many others.
 Timezone data is also included.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 22:04 -0700, Jeff Carr wrote:
> But are you using libc6-i686? That enables NPTL. Perhaps the behavior
> difference is there? I'm surprised int 80 doesn't really cause an
> interrupt; it doesn't jump to the appropriate place in the x86 vector
> table? Interesting.

int 0x80 does jump to the appropriate place in the vector table. In
arch/i386/kernel/traps.c, in trap_init(), we have the line:

	set_system_gate(SYSCALL_VECTOR,system_call);

This sets up a trap gate in the vector table that jumps to system_call
upon an "int 0x80", and this is exactly what I see. It does not,
however, jump to sysenter_entry. That would happen when sysenter is used
instead of "int 0x80".

When I used to work with a bunch of hardware folks, they would get mad at
me when I said a system call was initiated with an interrupt. They
always told me that an interrupt comes from an external source. Anything
that the CPU causes itself (system call, page fault, etc.) is called an
exception, or trap. So I have tried to use those definitions ever since.
As a software guy, though, I thought of them as the same thing.

-- Steve
Re: Need help in understanding x86 syscall
On 8/12/05, Jeff Carr <[EMAIL PROTECTED]> wrote:
> On 08/11/2005 10:18 AM, Steven Rostedt wrote:
>
> > It's vanilla 2.6.12-rc3 + Ingo's RT V0.7.46-02-rs-0.4 + some of my own
> > customizations. But I never touched the sysentry stuff and with a few
> > printks I see it is being initialized.
> >
> >>Also glibc support.
> >
> > I'm using Debian unstable with a recent (last week) update.
> >
> > -- Steve
>
> But are you using libc6-i686? That enables NPTL. Perhaps the behavior
> difference is there? I'm surprised int 80 doesn't really cause an
> interrupt; it doesn't jump to the appropriate place in the x86 vector
> table? Interesting.
>
> Jeff
>
>
> [EMAIL PROTECTED]:~# dpkg -s libc6-i686
> ...
> This set of libraries is optimized for i686 machines, and will only be
> used if you are running a 2.6 kernel on an i686 class CPU (check the
> output of `uname -m'). This includes Pentium Pro, Pentium II/III/IV,
> Celeron CPU's and similar class CPU's (including clones such as AMD
> Athlon/Opteron, VIA C3 Nehemiah, but not VIA C3 Ezla).
> .
> This package includes support for NPTL.
> .

Even with libc6-i686 installed, I can't see sysenter being used.

libc6-i686 provides /lib/tls/i686/cmov/libc.so.6, not /lib/libc-2.3.5.so.

mozilla gets: Illegal instruction
I've added ud2 in both entry.S and vsyscall-sysenter.S.

Any ideas?
--
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Re: Need help in understanding x86 syscall
On 08/11/2005 10:18 AM, Steven Rostedt wrote:

> It's vanilla 2.6.12-rc3 + Ingo's RT V0.7.46-02-rs-0.4 + some of my own
> customizations. But I never touched the sysentry stuff and with a few
> printks I see it is being initialized.
>
>>Also glibc support.
>
> I'm using Debian unstable with a recent (last week) update.
>
> -- Steve

But are you using libc6-i686? That enables NPTL. Perhaps the behavior
difference is there? I'm surprised int 80 doesn't really cause an
interrupt; it doesn't jump to the appropriate place in the x86 vector
table? Interesting.

Jeff


[EMAIL PROTECTED]:~# dpkg -s libc6-i686
...
 This set of libraries is optimized for i686 machines, and will only be
 used if you are running a 2.6 kernel on an i686 class CPU (check the
 output of `uname -m'). This includes Pentium Pro, Pentium II/III/IV,
 Celeron CPU's and similar class CPU's (including clones such as AMD
 Athlon/Opteron, VIA C3 Nehemiah, but not VIA C3 Ezla).
 .
 This package includes support for NPTL.
 .
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:
> On Thu, 2005-08-11 at 15:41 +0200, Bodo Eggert wrote:

> > According to my documentation it isn't. A software interrupt is a far call
> > with an extra pushf, and a hardware interrupt is protected against recursion
> > by the PIC, not by an interrupt flag.
>
> I disagree with your definition of a system call. The "int 0x80"
> changes from user mode to kernel mode so it is much more powerful than a
> "far call".

Far calls and jumps can change to an inner ring. This is done with a
selector that refers to a call gate, which holds the target segment _and_
the offset to jump to (the offset given in the call instruction is
ignored).

> Also the CPU does protect against recursion and more than
> one interrupt coming in at the same time. The PIC also works with the
> CPU in this regard, but as I shown in my previous email, the interrupt
> flag _does_ protect against it.

Showing == claiming? However, my documentation was wrong.

http://www.baldwin.cx/386htm/INT.htm
--
Top 100 things you don't want the sysadmin to say:
99. Shit!!
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:

> On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:
> >
> > I was talking about the one who had the glibc support to use
> > the newer system-call entry (who's name can confuse).
> >
> > You are looking at code that uses int 0x80. It's an interrupt,
> > therefore, in the kernel, once the stack is set up, interrupts
> > need to be (re)enabled.
>
> int is a call to either an interrupt or exception procedure. 0x80 is
> setup in Linux to be a trap and not an interrupt vector. So it does
> _not_ turn off interrupts.

It's actually a vector, that's all you can install in the IDT. Also a
trap doesn't advance the instruction pointer, so you resume at the
trapping instruction (e.g. vector 14/page fault); 0x80 is an interrupt
gate. One of the distinguishing differences is that 0x80 may be entered
via int 0x80 from all ring levels. The reason why int 0x80 doesn't
disable interrupts is that issuing int 0x80 directly is similar to
doing a far call and therefore doesn't have the same effect as a real
interrupt being issued.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 12:58 -0700, Zachary Amsden wrote:

> If you're feeling really masochistic, I've added a demonstration of how
> you can call sysenter from userspace without glibc.

Thanks Zach, this will give me something to play around with when I have
a little more spare time >8-}

-- Steve
Re: Need help in understanding x86 syscall
Steven Rostedt wrote:

> OK, I get the same on my machine.
>
>> On a machine that does not support sysenter, this will give you:
>>
>> int $0x80
>> ret
>>
>> The int $0x80 system calls are still fully supported by a sysenter
>> capable kernel, since it must run older binaries and potentially support
>> syscalls during early boot up before it is known that sysenter is supported.
>
> Now is the latest glibc using this. Since I put in a ud2 op in my
> sysenter_entry code, which is not triggered, as well as an objdump of
> libc.so shows a bunch of int 0x80 calls.

The NPTL version of glibc (the TLS library) uses this.

zach-dev2:~ $ ldd /bin/ls
        linux-gate.so.1 => (0xe000)
        librt.so.1 => /lib/tls/librt.so.1 (0x4002e000)
        libacl.so.1 => /lib/libacl.so.1 (0x40038000)
        libselinux.so.1 => /lib/libselinux.so.1 (0x4003e000)
   -->  libc.so.6 => /lib/tls/libc.so.6 (0x4004c000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40162000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x4000)
        libattr.so.1 => /lib/libattr.so.1 (0x40174000)

You'll find getpid much faster with TLS libraries (it's cached, no
longer a system call):

With TLS:

zach-dev2:Micro-bench $ time ./getpid

real    0m0.080s
user    0m0.080s
sys     0m0.000s

Without TLS:

zach-dev:Micro-bench $ time ./getpid

real    0m5.041s
user    0m2.520s
sys     0m2.520s

If you're feeling really masochistic, I've added a demonstration of how
you can call sysenter from userspace without glibc. The code verifies
that there is no way to exploit the kernel to achieve reading arbitrary
memory through a non-flat data segment. It deliberately segfaults at
the end.

Let me point out this is a very wrong way to do things - you should
always use the vsyscall page, and in fact, this code actually depends on
the vsyscall page even if it is not apparent. I fake the same frame
structure that the vsyscall page would have pushed to simulate a
vsyscall entry, but the kernel will always return to the vsyscall page,
which then returns back to us. Fun stuff.

If you leave the kernel hack for ud2 in your kernel, I would expect it
to blow up in amazing fashion when running the code below.

zach-dev2:~ $ gcc sysenter.S sysenter.c -o sys
sysenter.c: In function `main':
sysenter.c:34: warning: passing arg 2 of `signal' from incompatible pointer type
sysenter.c:49: warning: passing arg 3 of `sysenter_call_2' makes pointer from integer without a cast
sysenter.c:22: warning: return type of `main' is not `int'
zach-dev2:~ $ ./sys
interrupted %ebp = 0xbaadf00d
phew
Segmentation fault (core dumped)
zach-dev2:~ $ gdb sys core
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Core was generated by `./sys'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0xe410 in ?? ()
(gdb) print $eax
$1 = -14
(gdb)

#define EFAULT 14 /* Bad address */

int main(int argc, char *argv[])
{
        int j;
        for (j = 0; j < 100; j++) {
                getpid();
                getpid();
                getpid();
                getpid();
                getpid();
                getpid();
                getpid();
                getpid();
                getpid();
                getpid();
        }
}

#include
        .text
        .global sysenter_call
        .global sysenter_call_2

/* void sysenter_call(pid_t pid, int signo, short ds, void *addr) */
sysenter_call:
        push %ebx
        push %edi
        push %ebp
        push %ds
        movl %esp, %edi
        movl 20(%esp), %ebx     /* pid */
        movl 24(%esp), %ecx     /* signo */
        movl 28(%esp), %ds      /* exploit DS */
        movl 32(%esp), %ebp
        movl %ebp, %esp
        push $sysenter_return
        push %ecx
        push %edx
        subl $16, %ebp
        push $0xbaadf00d
        movl $SYS_kill, %eax
        sysenter

        /* vsyscall page will ret to us here */
sysenter_return:
        mov %edi, %esp
        pop %ds
        pop %ebp
        pop %edi
        pop %ebx
        ret

sysenter_call_2:
        push %ebx
        push %ebp
        movl 12(%esp), %ebx     /* pid */
        movl 16(%esp), %ecx     /* signo */
        movl 20(%esp), %ebp
        movl $SYS_kill, %eax
        sysenter

        .data
test:   .long 0

#include
#include
#include
#include
#include
#include
#include
#define __KERNEL__
#include

extern void sysenter_call(pid_t pid, int signo, short ds, void *addr);
extern void sysenter_call_2(pid_t pid, int signo, void *addr);

void catch_sig(int signo, struct sigcontext ctx)
{
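The -14 that gdb prints for $eax in the session above is the raw kernel return convention: a value in the range -4095..-1 is a negated errno, and the libc wrapper turns it into -1 plus errno. A minimal sketch of that translation, assuming nothing beyond the convention itself (the helper names raw_is_error and decode_raw are hypothetical, not glibc functions):

```c
#include <errno.h>

/*
 * A raw system call return value is an error exactly when, viewed as an
 * unsigned number, it lies above (unsigned long)-4096. On a 32-bit
 * machine that bound is 0xfffff000.
 */
int raw_is_error(long raw)
{
    return (unsigned long)raw > (unsigned long)-4096L;
}

/* Translate a raw return value the way a libc wrapper would. */
long decode_raw(long raw)
{
    if (raw_is_error(raw)) {
        errno = (int)-raw;      /* e.g. -14 becomes EFAULT */
        return -1;
    }
    return raw;
}
```

Calling decode_raw(-14) returns -1 and leaves errno set to EFAULT, which matches the "#define EFAULT 14" shown next to the gdb output.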
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 14:21 -0400, linux-os (Dick Johnson) wrote:

> I'm not sure you can stop the CPU from clearing the interrupt
> bit in EFLAGS if you execute an interrupt. The interrupt handler
> may be supported by a trap-gate, but the event has already
> occurred. The documentation I have isn't clear on this at all.

From the Intel document 25366513 "IA32 Intel Architecture's Software
Developer's Manual" Volume 1, page 145 (or 6-11), Section "Call and Return
Operation for Interrupt or Exception Handling Procedures":

  A call to an interrupt or exception handler procedure is similar to a
  procedure call to another protection level (see Section 6.3.6., "CALL
  and RET Operation Between Privilege Levels"). Here, the interrupt
  vector references one of two kinds of gates: an interrupt gate or a
  trap gate. Interrupt and trap gates are similar to call gates in that
  they provide the following information:

  - access rights information
  - the segment selector for the code segment that contains the handler
    procedure
  - an offset into the code segment to the first instruction of the
    handler procedure

  The difference between an interrupt gate and a trap gate is as
  follows. If an interrupt or exception handler is called through an
  interrupt gate, the processor clears the interrupt enable (IF) flag in
  the EFLAGS register to prevent subsequent interrupts from interfering
  with the execution of the handler. When a handler is called through a
  trap gate, the state of the IF flag is not changed.

In Linux, the system call vector is handled with a trap gate, which is
why system_call in entry.S does not call sti. You are right, though,
that if I used sysenter, it would enter sysenter_entry, which would need
to enable interrupts again.

To prove my point: all the libc syscalls seem to use "int 0x80", and
looking at entry.S, that calls system_call directly. Now to see what
sysenter would do I made the following changes:

Index: arch/i386/kernel/entry.S
===================================================================
--- arch/i386/kernel/entry.S	(revision 274)
+++ arch/i386/kernel/entry.S	(working copy)
@@ -196,6 +196,8 @@
  * Careful about security.
  */
 	cmpl $__PAGE_OFFSET-3,%ebp
+	call sdr_func
+	jmp syscall_fault
 	jae syscall_fault
 1:	movl (%ebp),%ebp
 .section __ex_table,"a"
Index: arch/i386/kernel/traps.c
===================================================================
--- arch/i386/kernel/traps.c	(revision 274)
+++ arch/i386/kernel/traps.c	(working copy)
@@ -1092,6 +1092,10 @@
 } while (0)


+void sdr_func(void)
+{
+	printk("hello from sdr_func\n");
+}

So my sysenter_entry in entry.S would call my function sdr_func, which is
defined in traps.c as above.

Then I ran the following program:

int main()
{
	unsigned long a = 0x14;
	asm("push %%ecx;\npush %%edx;\nmov %%esp,%%ebp;\nsysenter"
	    ::"a"(a):"cx","dx","sp","bp");
	return 0;
}

And I did get my print in the console. So it seems that my system does
not use sysenter (even though the linux-gate.so seems to set this up),
but instead uses "int 0x80", which in Linux does _not_ disable
interrupts.

I hope this clears things up for everyone. :-)

-- Steve
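The gate rule quoted from the Intel manual can be written down as a small model. This is only a sketch of the documented flag behaviour on handler entry, not anything the kernel runs; the enum and function names are hypothetical. It models IF and TF only: the processor clears TF for both gate kinds, and clears IF only when the handler is reached through an interrupt gate.

```c
#include <stdint.h>

#define EFLAGS_TF 0x00000100u   /* trap (single-step) flag, bit 8 */
#define EFLAGS_IF 0x00000200u   /* interrupt enable flag, bit 9 */

enum gate_kind { INTERRUPT_GATE, TRAP_GATE };

/*
 * Given the EFLAGS value at the moment of the int instruction, return
 * the value the handler starts running with (IF and TF only).
 */
uint32_t eflags_in_handler(uint32_t eflags, enum gate_kind kind)
{
    eflags &= ~EFLAGS_TF;               /* cleared for both gate kinds */
    if (kind == INTERRUPT_GATE)
        eflags &= ~EFLAGS_IF;           /* only interrupt gates mask IRQs */
    return eflags;
}
```

With eflags = 0x202 (IF set), the trap gate case returns 0x202 and the interrupt gate case returns 0x002. The sysenter instruction is a separate path that clears IF by itself, which fits the remark that sysenter_entry has to enable interrupts again.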
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:

> On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:
>
>>
>> I was talking about the one who had the glibc support to use
>> the newer system-call entry (who's name can confuse).
>>
>> You are looking at code that uses int 0x80. It's an interrupt,
>> therefore, in the kernel, once the stack is set up, interrupts
>> need to be (re)enabled.
>
> int is a call to either an interrupt or exception procedure. 0x80 is
> setup in Linux to be a trap and not an interrupt vector. So it does
> _not_ turn off interrupts.
>

I'm not sure you can stop the CPU from clearing the interrupt
bit in EFLAGS if you execute an interrupt. The interrupt handler
may be supported by a trap-gate, but the event has already
occurred. The documentation I have isn't clear on this at all.

> I'm looking at the sysenter code which is suppose to be the fast entry
> into the system, and it looks like it is suppose to call the
> sysenter_entry when used. I'm trying to write something to test this
> out, since I still have the ud2 op in my sysentry code. So if I do get
> this to work, I can cause a bug.
>
> -- Steve
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 10:59 -0700, Zachary Amsden wrote:
>
> zach-dev2:~ $ ldd /bin/ls
> linux-gate.so.1 => (0xe000)

OHHH! So THAT is what linux-gate is used for! Thanks, I've been really
confused by that.

>
> This is the vsyscall entry point, which gets linked by ld into all
> processes. It is a kernel page which is visible to user space, and is
> rewritten to support sysenter if indeed that instruction is available.
> Glibc has fixed entry points to this page. Here is a view of the system
> call entry point on a machine which supports sysenter:
>
> (gdb) break _init
> Breakpoint 1 at 0x8049522
> (gdb) run
> Starting program: /bin/ls
> (no debugging symbols found)...[Thread debugging using libthread_db enabled]
> [New Thread 1075283616 (LWP 5328)]
> [Switching to Thread 1075283616 (LWP 5328)]
>
> Breakpoint 1, 0x08049522 in _init ()
> (gdb) x/10i 0xe400
> 0xe400: push   %ecx
> 0xe401: push   %edx
> 0xe402: push   %ebp
> 0xe403: mov    %esp,%ebp
> 0xe405: sysenter
> 0xe407: nop
> 0xe408: nop
> 0xe409: nop
> 0xe40a: nop
> 0xe40b: nop
>

OK, I get the same on my machine.

> On a machine that does not support sysenter, this will give you:
>
> int $0x80
> ret
>
> The int $0x80 system calls are still fully supported by a sysenter
> capable kernel, since it must run older binaries and potentially support
> syscalls during early boot up before it is known that sysenter is supported.

So is the latest glibc using this? I put a ud2 op in my sysenter_entry
code and it is not triggered, and an objdump of libc.so shows a bunch of
int 0x80 calls.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:
>
> I was talking about the one who had the glibc support to use
> the newer system-call entry (who's name can confuse).
>
> You are looking at code that uses int 0x80. It's an interrupt,
> therefore, in the kernel, once the stack is set up, interrupts
> need to be (re)enabled.

int is a call to either an interrupt or exception procedure. 0x80 is
set up in Linux to be a trap and not an interrupt vector. So it does
_not_ turn off interrupts.

I'm looking at the sysenter code, which is supposed to be the fast entry
into the system, and it looks like it is supposed to call
sysenter_entry when used. I'm trying to write something to test this
out, since I still have the ud2 op in my sysentry code. So if I do get
this to work, I can cause a bug.

-- Steve
Re: Need help in understanding x86 syscall
Steven Rostedt wrote:

> I expect that if I had a Gentoo system that I compiled for my machine,
> this would be different. But I suspect that Debian still wants to run on
> my old Pentium 75MHz laptop. How would libc know to use sysenter
> instead of int 0x80. It could do a test of the system, but would there
> be an if statement for every system call then? I guess that libc needs
> to be compiled either to use it or not. Since there are still several
> machines out there that don't have this feature, it would be safer to
> not use it.

zach-dev2:~ $ ldd /bin/ls
        linux-gate.so.1 => (0xe000)

This is the vsyscall entry point, which gets linked by ld into all
processes. It is a kernel page which is visible to user space, and is
rewritten to support sysenter if indeed that instruction is available.
Glibc has fixed entry points to this page. Here is a view of the system
call entry point on a machine which supports sysenter:

(gdb) break _init
Breakpoint 1 at 0x8049522
(gdb) run
Starting program: /bin/ls
(no debugging symbols found)...[Thread debugging using libthread_db enabled]
[New Thread 1075283616 (LWP 5328)]
[Switching to Thread 1075283616 (LWP 5328)]

Breakpoint 1, 0x08049522 in _init ()
(gdb) x/10i 0xe400
0xe400: push   %ecx
0xe401: push   %edx
0xe402: push   %ebp
0xe403: mov    %esp,%ebp
0xe405: sysenter
0xe407: nop
0xe408: nop
0xe409: nop
0xe40a: nop
0xe40b: nop

On a machine that does not support sysenter, this will give you:

int $0x80
ret

The int $0x80 system calls are still fully supported by a sysenter
capable kernel, since it must run older binaries and potentially support
syscalls during early boot up before it is known that sysenter is supported.

Zach
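On a current system, a process can ask where the kernel mapped this page without hard-coding an address. The sketch below uses getauxval() with AT_SYSINFO_EHDR; note that getauxval() was added to glibc well after this thread, so this illustrates the same mechanism on a modern glibc and is not what the 2.3.5 library discussed here does. The helper names vdso_base and looks_like_elf are hypothetical.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <string.h>     /* memcmp */
#include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */

/*
 * Address at which the kernel mapped its user-visible page (shown by ldd
 * as linux-gate.so.1 on i386) into this process, or 0 if the kernel did
 * not provide one.
 */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* The mapping is a small ELF image, so it starts with the ELF magic. */
int looks_like_elf(unsigned long base)
{
    return base != 0 && memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

A program can print vdso_base() in hex and compare it with the address ldd reports for the kernel-provided object. Because the kernel publishes the entry point at run time, libc does not need a per-call test or a separate build to choose between sysenter and int $0x80.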
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:

> On Thu, 2005-08-11 at 13:26 -0400, Steven Rostedt wrote:
>
>> 288fb seems to use "int 0x80" and so do all the other system calls that
>> I inspected.
>
> I expect that if I had a Gentoo system that I compiled for my machine,
> this would be different. But I suspect that Debian still wants to run on
> my old Pentium 75MHz laptop. How would libc know to use sysenter
> instead of int 0x80. It could do a test of the system, but would there
> be an if statement for every system call then? I guess that libc needs
> to be compiled either to use it or not. Since there are still several
> machines out there that don't have this feature, it would be safer to
> not use it.
>
> -- Steve
>

Well, I have a small C runtime library that I put together for
embedded systems. Once somebody heard that I was using the
"obsolete" int 0x80, they insisted that I redo everything
to use the new interface. Since I wasn't getting paid to
think on that project, I did what I was told.

Benchmarks of 'getpid()' showed the 0x80 interrupt faster
by a few cycles, so the "suits" claimed that I must have done
something wrong. So we had a "code review". Finally it was
decided: "The CPU must be handling things differently...",
i.e., go back to the simpler int 0x80 interface.

It was obvious to me that any difference in speed was simply
noise. Both ways are essentially the same for performance, so
I wouldn't lose any sleep over an "older" C runtime library.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
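A getpid() micro-benchmark of the kind mentioned in this thread can be sketched as follows. This is not the benchmark anyone here actually ran; the function name time_getpid and its raw flag are hypothetical. It times the libc getpid() wrapper against a forced kernel entry through syscall(SYS_getpid). Which entry instruction libc uses underneath depends on the platform, and whether the wrapper caches the pid depends on the glibc version, so the numbers only make sense relative to each other on one machine.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes syscall() under strict -std= modes */
#endif
#include <sys/syscall.h>        /* SYS_getpid */
#include <sys/types.h>          /* pid_t */
#include <time.h>               /* clock_gettime */
#include <unistd.h>             /* getpid, syscall */

/* Elapsed time between two timestamps, in nanoseconds. */
static long long elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000000LL + (b->tv_nsec - a->tv_nsec);
}

/*
 * Time `iters` calls. raw != 0 forces a real kernel entry via syscall();
 * raw == 0 goes through the libc getpid() wrapper.
 */
long long time_getpid(long iters, int raw)
{
    struct timespec t0, t1;
    volatile pid_t sink = 0;    /* keeps the calls from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        sink = raw ? (pid_t)syscall(SYS_getpid) : getpid();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return elapsed_ns(&t0, &t1);
}
```

Calling time_getpid(n, 0) and time_getpid(n, 1) several times with a large n and comparing the spread between runs is a quick way to judge whether a difference of "a few cycles" is distinguishable from noise.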
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:
> On Thu, 2005-08-11 at 13:10 -0400, linux-os (Dick Johnson) wrote:
>> On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
>
>>>
>>> Also glibc support.
>>>
>>> --
>>> Coywolf Qi Hunt
>>> http://ahbl.org/~coywolf/
>>
>> Probably doesn't use int 0x80 at all.
>
> $ objdump -Dhalpr /lib/libc.so.6 | egrep 'int *\$0x80' | wc
>     448    2240   20160
>
> And a little snapshot:
>
> 000288d0 <__libc_sigsuspend>:
>    288d0: 55                   push   %ebp
>    288d1: 89 e5                mov    %esp,%ebp
>    288d3: 57                   push   %edi
>    288d4: 56                   push   %esi
>    288d5: 53                   push   %ebx
>    288d6: e8 00 00 00 00       call   288db <__libc_sigsuspend+0xb>
>    288db: 5b                   pop    %ebx
>    288dc: 81 c3 19 c7 0e 00    add    $0xec719,%ebx
>    288e2: 8b 83 b4 32 00 00    mov    0x32b4(%ebx),%eax
>    288e8: 85 c0                test   %eax,%eax
>    288ea: 75 23                jne    2890f <__libc_sigsuspend+0x3f>
>    288ec: b9 08 00 00 00       mov    $0x8,%ecx
>    288f1: 8b 55 08             mov    0x8(%ebp),%edx
>    288f4: 87 d3                xchg   %edx,%ebx
>    288f6: b8 b3 00 00 00       mov    $0xb3,%eax
>    288fb: cd 80                int    $0x80
>    288fd: 87 d3                xchg   %edx,%ebx
>    288ff: 89 c6                mov    %eax,%esi
>    28901: 3d 00 f0 ff ff       cmp    $0xfffff000,%eax
>    28906: 77 33                ja     2893b <__libc_sigsuspend+0x6b>
>    28908: 89 f0                mov    %esi,%eax
>    2890a: 5b                   pop    %ebx
>    2890b: 5e                   pop    %esi
>    2890c: 5f                   pop    %edi
>    2890d: 5d                   pop    %ebp
>    2890e: c3                   ret
>
> 288fb seems to use "int 0x80" and so do all the other system calls that
> I inspected.
>
> $ ls -l /lib/libc.so.6
> lrwxrwxrwx 1 root root 13 2005-08-09 22:28 /lib/libc.so.6 -> libc-2.3.5.so
>
>
> -- Steve
>

I was talking about the one who had the glibc support to use the newer
system-call entry (whose name can confuse). You are looking at code
that uses int 0x80. It's an interrupt; therefore, in the kernel, once
the stack is set up, interrupts need to be (re)enabled.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 13:26 -0400, Steven Rostedt wrote:

> 288fb seems to use "int 0x80" and so do all the other system calls that
> I inspected.

I expect that if I had a Gentoo system that I compiled for my machine,
this would be different. But I suspect that Debian still wants to run on
my old Pentium 75MHz laptop. How would libc know to use sysenter
instead of int 0x80? It could do a test of the system, but would there
be an if statement for every system call then? I guess that libc needs
to be compiled either to use it or not. Since there are still several
machines out there that don't have this feature, it would be safer to
not use it.

-- Steve
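One possible answer to the "if statement for every system call" worry, sketched in C: the choice can be made once at startup and stored in a function pointer, so each call site makes a single indirect call instead of testing the CPU every time. This is only a toy model. The names `entry_int80`, `entry_sysenter`, `select_syscall_entry` and `do_syscall` are hypothetical, and the two entry functions are plain C stand-ins that just record which path was taken; they do not execute `int $0x80` or `sysenter`.

```c
#include <assert.h>
#include <stddef.h>

/* Which (simulated) kernel entry the last call went through. */
enum entry_kind { ENTRY_NONE, ENTRY_INT80, ENTRY_SYSENTER };
static enum entry_kind last_entry = ENTRY_NONE;

typedef long (*syscall_entry_fn)(long nr);

/* Stand-in for a stub that would execute "int $0x80" (assumption: toy only). */
static long entry_int80(long nr)
{
    last_entry = ENTRY_INT80;
    return nr; /* toy result: echo the syscall number */
}

/* Stand-in for a stub that would execute "sysenter" (assumption: toy only). */
static long entry_sysenter(long nr)
{
    last_entry = ENTRY_SYSENTER;
    return nr; /* toy result: echo the syscall number */
}

/* Default to the entry that works on every x86 CPU. */
static syscall_entry_fn syscall_entry = entry_int80;

/* Run once at startup; cpu_has_sep mirrors the "sep" flag in /proc/cpuinfo. */
void select_syscall_entry(int cpu_has_sep)
{
    syscall_entry = cpu_has_sep ? entry_sysenter : entry_int80;
}

/* Every system call wrapper goes through the same indirect call. */
long do_syscall(long nr)
{
    assert(syscall_entry != NULL);
    return syscall_entry(nr);
}
```

With this shape the per-call cost is one indirect call, and a library built this way would still run on a CPU without sep because the default entry never changes unless the startup check says otherwise.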
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 13:10 -0400, linux-os (Dick Johnson) wrote:
> On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
>
>> >
> >
> > Also glibc support.
> >
> > --
> > Coywolf Qi Hunt
> > http://ahbl.org/~coywolf/
>
> Probably doesn't use int 0x80 at all.

$ objdump -Dhalpr /lib/libc.so.6 | egrep 'int *\$0x80' | wc
    448    2240   20160

And a little snapshot:

000288d0 <__libc_sigsuspend>:
   288d0: 55                   push   %ebp
   288d1: 89 e5                mov    %esp,%ebp
   288d3: 57                   push   %edi
   288d4: 56                   push   %esi
   288d5: 53                   push   %ebx
   288d6: e8 00 00 00 00       call   288db <__libc_sigsuspend+0xb>
   288db: 5b                   pop    %ebx
   288dc: 81 c3 19 c7 0e 00    add    $0xec719,%ebx
   288e2: 8b 83 b4 32 00 00    mov    0x32b4(%ebx),%eax
   288e8: 85 c0                test   %eax,%eax
   288ea: 75 23                jne    2890f <__libc_sigsuspend+0x3f>
   288ec: b9 08 00 00 00       mov    $0x8,%ecx
   288f1: 8b 55 08             mov    0x8(%ebp),%edx
   288f4: 87 d3                xchg   %edx,%ebx
   288f6: b8 b3 00 00 00       mov    $0xb3,%eax
   288fb: cd 80                int    $0x80
   288fd: 87 d3                xchg   %edx,%ebx
   288ff: 89 c6                mov    %eax,%esi
   28901: 3d 00 f0 ff ff       cmp    $0xfffff000,%eax
   28906: 77 33                ja     2893b <__libc_sigsuspend+0x6b>
   28908: 89 f0                mov    %esi,%eax
   2890a: 5b                   pop    %ebx
   2890b: 5e                   pop    %esi
   2890c: 5f                   pop    %edi
   2890d: 5d                   pop    %ebp
   2890e: c3                   ret

288fb seems to use "int 0x80" and so do all the other system calls that
I inspected.

$ ls -l /lib/libc.so.6
lrwxrwxrwx 1 root root 13 2005-08-09 22:28 /lib/libc.so.6 -> libc-2.3.5.so

-- Steve
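For readers following the listing above: the instructions right after the `int $0x80` compare the value returned in %eax against 0xfffff000 (the bytes `3d 00 f0 ff ff`) and branch with `ja` when it is larger. A rough C model of that check is sketched below, on the assumption that values in that top range are negated error numbers, which is the usual Linux convention. The function name `decode_i386_syscall_return` is made up for illustration and is not a glibc symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the check made after "int $0x80" on i386: raw values above
 * 0xfffff000 (that is, -4095..-1 when read as signed 32-bit) are treated
 * as negated error numbers.
 *
 * Returns 0 and stores the result in *value on success; otherwise
 * returns the positive error number and leaves *value untouched.
 */
int decode_i386_syscall_return(uint32_t eax, uint32_t *value)
{
    assert(value != NULL);
    if (eax > 0xfffff000u)        /* same test as: cmp $0xfffff000,%eax ; ja */
        return (int)(0u - eax);   /* negate in unsigned arithmetic, no overflow */
    *value = eax;
    return 0;
}
```

The unsigned comparison is the reason large but valid results, such as high addresses, are not mistaken for errors: only the last 4095 values of the 32-bit range are reserved.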
Re: Need help in understanding x86 syscall
On Fri, 2005-08-12 at 00:59 +0800, Coywolf Qi Hunt wrote:
> On 8/12/05, Coywolf Qi Hunt <[EMAIL PROTECTED]> wrote:
> > On 8/12/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> > >
> > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> >
> > The cpu does have sep. Is it vanilla kernel?
> >

It's vanilla 2.6.12-rc3 + Ingo's RT V0.7.46-02-rs-0.4 + some of my own
customizations. But I never touched the sysenter stuff and with a few
printks I see it is being initialized.

> Also glibc support.
>

I'm using Debian unstable with a recent (last week) update.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
> On 8/12/05, Coywolf Qi Hunt <[EMAIL PROTECTED]> wrote:
>> On 8/12/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
>>> On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
>>>>
>>>> And booted it. The system is up and running, so I really don't think
>>>> that the sysenter_entry is used for system calls.
>>>>
>>>> Not so "Clear as day"!
>>>
>>> And so, looking into sysenter_entry, it seems that my configurations
>>> don't seem to use it. This jumps straight to system_call without ever
>>> having to turn interrupts on.
>>>
>>> # cat /proc/cpuinfo
>>> processor       : 0
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 8
>>> model name      : Pentium III (Coppermine)
>>> stepping        : 3
>>> cpu MHz         : 367.939
>>> cache size      : 256 KB
>>> fdiv_bug        : no
>>> hlt_bug         : no
>>> f00f_bug        : no
>>> coma_bug        : no
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 2
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>> mca cmov pat pse36 mmx fxsr sse
>>> bogomips        : 722.94
>>>
>>>
>>> -- Steve
>>>
>>
>> The cpu does have sep. Is it vanilla kernel?
>>
>
> Also glibc support.
>
> --
> Coywolf Qi Hunt
> http://ahbl.org/~coywolf/

Probably doesn't use int 0x80 at all.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On 8/12/05, Coywolf Qi Hunt <[EMAIL PROTECTED]> wrote:
> On 8/12/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> > On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
> > >
> > > And booted it. The system is up and running, so I really don't think
> > > that the sysenter_entry is used for system calls.
> > >
> > > Not so "Clear as day"!
> >
> > And so, looking into sysenter_entry, it seems that my configurations
> > don't seem to use it. This jumps straight to system_call without ever
> > having to turn interrupts on.
> >
> > # cat /proc/cpuinfo
> > processor       : 0
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 8
> > model name      : Pentium III (Coppermine)
> > stepping        : 3
> > cpu MHz         : 367.939
> > cache size      : 256 KB
> > fdiv_bug        : no
> > hlt_bug         : no
> > f00f_bug        : no
> > coma_bug        : no
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 2
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> > mca cmov pat pse36 mmx fxsr sse
> > bogomips        : 722.94
> >
> >
> > -- Steve
> >
>
> The cpu does have sep. Is it vanilla kernel?
>

Also glibc support.

--
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Re: Need help in understanding x86 syscall
On 8/12/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
> >
> > And booted it. The system is up and running, so I really don't think
> > that the sysenter_entry is used for system calls.
> >
> > Not so "Clear as day"!
>
> And so, looking into sysenter_entry, it seems that my configurations
> don't seem to use it. This jumps straight to system_call without ever
> having to turn interrupts on.
>
> # cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 8
> model name      : Pentium III (Coppermine)
> stepping        : 3
> cpu MHz         : 367.939
> cache size      : 256 KB
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 2
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 mmx fxsr sse
> bogomips        : 722.94
>
>
> -- Steve
>

The cpu does have sep. Is it vanilla kernel?

--
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
>
> And booted it. The system is up and running, so I really don't think
> that the sysenter_entry is used for system calls.
>
> Not so "Clear as day"!

And so, looking into sysenter_entry, it seems that my configurations
don't seem to use it. This jumps straight to system_call without ever
having to turn interrupts on.

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 367.939
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips        : 722.94

-- Steve
Re: Need help in understanding x86 syscall
Ukil a wrote:

> I had this question. As per my understanding, in the
> Linux system call implementation on x86 architecture
> the call flows like this int 0x80 -> syscall ->
> sys_call_vector(taken from the table)-> return from
> interrupt service routine.

Almost. There are two entry points, the one you describe above, and the
sysenter entry point.

> Now I had the doubt that if the the syscall
> implementation is very large will the scheduling and
> other interrupts be blocked for the whole time till
> the process returns from the ISR (because in an ISR by
> default the interrupts are disabled unless “sti” is
> called explicitly)? That’s appears to be too long for
> the scheduling or other interrupts to be blocked?
> Am I missing something here?

There are 3 types of gates you can use to service interrupts / faults on
i386.

Task gates are used where complex state changes are required, and an
assured state is needed, such as doublefault and NMI handlers.

Interrupt gates are used where interrupts must be disabled during
initial processing, such as the page fault gate.

Trap gates are used when interrupts may be allowed, and do not clear the
interrupt flag.

On Linux, syscall vector int 0x80 is a trap gate, which means interrupts
are not disabled. The sysenter handler is very special; SYSENTER does
disable interrupts, so if you look at sysenter_entry, one of the first
things it will do is re-enable interrupts as soon as the stack is sane.

Thus, interrupts are enabled by default during system call processing
unless explicitly disabled. Your analysis of what would happen otherwise
is quite correct.

Zach
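The gate distinction described above can be put into a few lines of C. This is a toy model rather than kernel code: it tracks only the IF bit of EFLAGS (the CPU also adjusts other flags on entry, which are ignored here), it leaves task gates out because they load a whole new task state, and the names `gate_type` and `eflags_on_handler_entry` are invented for illustration. The numeric type codes are the standard 32-bit IDT gate types.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF 0x200u   /* interrupt enable flag, bit 9 of EFLAGS */

/* 32-bit gate type codes as stored in an IDT descriptor. */
enum gate_type { GATE_TASK = 0x5, GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

/*
 * Toy model of the IF bit as seen by the handler on entry.
 * Interrupt gate: the CPU clears IF. Trap gate: IF is left unchanged.
 * Task gates are not modelled by this sketch.
 */
uint32_t eflags_on_handler_entry(uint32_t eflags, enum gate_type type)
{
    assert(type == GATE_INTERRUPT || type == GATE_TRAP);
    if (type == GATE_INTERRUPT)
        return eflags & ~EFLAGS_IF;
    return eflags;
}
```

Under this model, a system call arriving through a trap gate from user mode, where IF is set, starts its handler with interrupts still enabled, which matches the conclusion drawn above.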
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 11:28 -0400, linux-os (Dick Johnson) wrote:
> On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
>
> > On 8/11/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> >> On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:
> >>> Every interrupt software, or hardware, results in the branched
> >>> procedure being executed with the interrupts OFF. That's why
> >>> one of the first instructions in the kernel entry for a syscall
> >>> is 'sti' to turn them back on. Look at entry.S, line 182. This
> >>> occurs any time a trap occurs as well (Page 26-168, i486
> >>> Programmer's reference manual). FYI, this is helpful when
> >>> designing/debugging complex interrupt-service routines since
> >>> you can execute the interrupt with a software 'INT' instruction
> >>> (with the correct offset from the IRQ you are using). The software
> >>> doesn't 'know' where the interrupt came from, HW or SW.
> >>
> >> I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it.
> >> Must be a different kernel.
> >>
> >> According to the documentation that I was looking at, a trap in x86 does
> >> _not_ turn off interrupts.
> >>
> > ...
> >>
> >> I don't see a sti here.
> >
>
> Search for sysenter_entry. This is where the stack is switched
> to the kernel stack. Then the code falls through past the
> next label, sysenter_past_esp. The very next instruction
> after the kernel stack has been set is 'sti'. Clear as day.

I just applied the following to one of my kernels:

--- arch/i386/kernel/entry.S    (revision 274)
+++ arch/i386/kernel/entry.S    (working copy)
@@ -184,6 +184,7 @@
 ENTRY(sysenter_entry)
        movl TSS_sysenter_esp0(%esp),%esp
 sysenter_past_esp:
+       ud2
        sti
        pushl $(__USER_DS)
        pushl %ebp

And booted it. The system is up and running, so I really don't think
that the sysenter_entry is used for system calls.

Not so "Clear as day"!

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
> On 8/11/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
>> On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:
>>> Every interrupt software, or hardware, results in the branched
>>> procedure being executed with the interrupts OFF. That's why
>>> one of the first instructions in the kernel entry for a syscall
>>> is 'sti' to turn them back on. Look at entry.S, line 182. This
>>> occurs any time a trap occurs as well (Page 26-168, i486
>>> Programmer's reference manual). FYI, this is helpful when
>>> designing/debugging complex interrupt-service routines since
>>> you can execute the interrupt with a software 'INT' instruction
>>> (with the correct offset from the IRQ you are using). The software
>>> doesn't 'know' where the interrupt came from, HW or SW.
>>
>> I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it.
>> Must be a different kernel.
>>
>> According to the documentation that I was looking at, a trap in x86 does
>> _not_ turn off interrupts.
>>
> ...
>>
>> I don't see a sti here.
>

Search for sysenter_entry. This is where the stack is switched
to the kernel stack. Then the code falls through past the
next label, sysenter_past_esp. The very next instruction
after the kernel stack has been set is 'sti'. Clear as day.

>
>> -- Steve
>
>
> He is RBJ, Richard B. Johnson, the LKML defacto official troll.
>
> --
> Coywolf Qi Hunt
> http://ahbl.org/~coywolf/
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 23:13 +0800, Coywolf Qi Hunt wrote:
>
> He is RBJ, Richard B. Johnson, the LKML defacto official troll.
>

Oh, so this is "root" who almost got DaveJ fired? :)

-- Steve
Re: Need help in understanding x86 syscall
On 8/11/05, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:
> > Every interrupt software, or hardware, results in the branched
> > procedure being executed with the interrupts OFF. That's why
> > one of the first instructions in the kernel entry for a syscall
> > is 'sti' to turn them back on. Look at entry.S, line 182. This
> > occurs any time a trap occurs as well (Page 26-168, i486
> > Programmer's reference manual). FYI, this is helpful when
> > designing/debugging complex interrupt-service routines since
> > you can execute the interrupt with a software 'INT' instruction
> > (with the correct offset from the IRQ you are using). The software
> > doesn't 'know' where the interrupt came from, HW or SW.
>
> I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it.
> Must be a different kernel.
>
> According to the documentation that I was looking at, a trap in x86 does
> _not_ turn off interrupts.
>
...
>
> I don't see a sti here.
>
> -- Steve

He is RBJ, Richard B. Johnson, the LKML de facto official troll.

--
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:

> Every interrupt software, or hardware, results in the branched
> procedure being executed with the interrupts OFF. That's why
> one of the first instructions in the kernel entry for a syscall
> is 'sti' to turn them back on. Look at entry.S, line 182. This
> occurs any time a trap occurs as well (Page 26-168, i486
> Programmer's reference manual). FYI, this is helpful when
> designing/debugging complex interrupt-service routines since
> you can execute the interrupt with a software 'INT' instruction
> (with the correct offset from the IRQ you are using). The software
> doesn't 'know' where the interrupt came from, HW or SW.

I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it.
Must be a different kernel.

According to the documentation that I was looking at, a trap in x86 does
_not_ turn off interrupts.

In arch/i386/kernel/traps.c:

trap_init
        set_system_gate(SYSCALL_VECTOR,&system_call);

(where SYSCALL_VECTOR is of course 0x80). This sets up a trap:

static void __init set_system_gate(unsigned int n, void *addr)
{
        _set_gate(idt_table+n,15,3,addr,__KERNEL_CS);
}

since type 15 makes this a trap (3 gives it user access).

Also looking at the code that it will call:

ENTRY(system_call)
        pushl %eax                      # save orig_eax
        SAVE_ALL
        GET_THREAD_INFO(%ebp)
                                        # system call tracing in operation
        /* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
        testw $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp)
        jnz syscall_trace_entry
        cmpl $(nr_syscalls), %eax
        jae syscall_badsys
syscall_call:
        call *sys_call_table(,%eax,4)
        movl %eax,EAX(%esp)             # store the return value
syscall_exit:
        cli                             # make sure we don't miss an interrupt
                                        # setting need_resched or sigpending
                                        # between sampling and the iret

I don't see a sti here.

-- Steve
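To make the "type 15, DPL 3" arguments above more concrete, here is a rough C sketch of how such values end up in the two 32-bit words of an i386 gate descriptor. It assumes the standard layout (handler offset bits 0..15 plus the segment selector in the first word; offset bits 16..31, the present bit, the DPL and the type in the second). `struct idt_gate` and `pack_gate` are illustrative names and are not the kernel's `_set_gate` macro.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit halves of an i386 IDT gate descriptor. */
struct idt_gate {
    uint32_t low;   /* selector (bits 16..31) | handler offset bits 0..15 */
    uint32_t high;  /* handler offset bits 16..31 | P | DPL | type */
};

/* Pack a gate: type 14 = interrupt gate, 15 = trap gate; dpl 3 = callable from user mode. */
struct idt_gate pack_gate(uint32_t handler, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    struct idt_gate g;

    assert(type <= 0xF && dpl <= 3);
    g.low  = ((uint32_t)selector << 16) | (handler & 0x0000ffffu);
    g.high = (handler & 0xffff0000u)
           | 0x8000u                /* present bit */
           | ((uint32_t)dpl << 13)  /* privilege level needed to use "int n" */
           | ((uint32_t)type << 8); /* gate type */
    return g;
}
```

For example, `pack_gate(0xc0102f00, 0x60, 15, 3)` (the handler address and selector are made-up values) yields a high word ending in `ef00`, while an interrupt gate with DPL 0 would end in `8e00`. The only difference between trap and interrupt gate is that single type bit.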
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Bodo Eggert wrote:
> Ukil a <[EMAIL PROTECTED]> wrote:
>
>> Now I had the doubt that if the the syscall
>> implementation is very large will the scheduling and
>> other interrupts be blocked for the whole time till
>> the process returns from the ISR (because in an ISR by
>> default the interrupts are disabled unless "sti" is
>> called explicitly)?
>
> According to my documentation it isn't. A software interrupt is a far call
> with an extra pushf, and a hardware interrupt is protected against recursion
> by the PIC, not by an interrupt flag.
> --

Every interrupt, software or hardware, results in the branched
procedure being executed with the interrupts OFF. That's why
one of the first instructions in the kernel entry for a syscall
is 'sti' to turn them back on. Look at entry.S, line 182. This
occurs any time a trap occurs as well (Page 26-168, i486
Programmer's reference manual). FYI, this is helpful when
designing/debugging complex interrupt-service routines since
you can execute the interrupt with a software 'INT' instruction
(with the correct offset from the IRQ you are using). The software
doesn't 'know' where the interrupt came from, HW or SW.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 15:41 +0200, Bodo Eggert wrote:

> According to my documentation it isn't. A software interrupt is a far call
> with an extra pushf, and a hardware interrupt is protected against recursion
> by the PIC, not by an interrupt flag.

I disagree with your definition of a system call. The "int 0x80" changes
from user mode to kernel mode, so it is much more powerful than a "far
call". Also, the CPU does protect against recursion and against more
than one interrupt coming in at the same time. The PIC also works with
the CPU in this regard, but as I showed in my previous email, the
interrupt flag _does_ protect against it.

-- Steve
Re: Need help in understanding x86 syscall
On Wed, 2005-08-10 at 22:39 -0700, Ukil a wrote:
> I had this question. As per my understanding, in the
> Linux system call implementation on x86 architecture
> the call flows like this int 0x80 -> syscall ->
> sys_call_vector(taken from the table)-> return from
> interrupt service routine.
>
> Now I had the doubt that if the the syscall
> implementation is very large will the scheduling and
> other interrupts be blocked for the whole time till
> the process returns from the ISR (because in an ISR by
> default the interrupts are disabled unless “sti” is
> called explicitly)? That’s appears to be too long for
> the scheduling or other interrupts to be blocked?
> Am I missing something here?

This is where interrupt is not a good term for syscall. It is really a
trap. An interrupt is an outside source that stops the CPU from doing
what it was doing to go do something else (asynchronous event). A trap
is something that the CPU calls on itself to do something else
(synchronous event). So when a network packet comes in, the NIC sends an
interrupt request (request since the CPU may not immediately handle it),
and when the CPU is ready (interrupts on) it will stop what it is doing
and handle the network packet.

When you do a system call, it is a trap. Quoting the Intel manual:

  The difference between an interrupt gate and a trap gate is as
  follows. If an interrupt or exception handler is called through an
  interrupt gate, the processor clears the interrupt enable (IF) flag in
  the EFLAGS register to prevent subsequent interrupts from interfering
  with the execution of the handler. When a handler is called through a
  trap gate, the state of the IF flag is not changed.

So when you go into the kernel through a trap (system call) interrupts
are still on if they were on to begin with, which is the case when
coming from user mode. But when you come into the kernel through a real
interrupt, then interrupts are off.

Also note that if you have preemption disabled, the system call will
_not_ do any scheduling unless it explicitly calls schedule. If you
turn on voluntary preempt, it will call schedule at various spots in the
kernel that might call schedule anyway. If you turn on full preemption,
then the process can be scheduled out anywhere in the kernel unless it
explicitly says it doesn't want to (preempt_disable or spin_lock).

-- Steve
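The full-preemption rule in the last paragraph above can be pictured with a small counter model in C. This is a userspace toy with invented `toy_` names, not the kernel's implementation: it only assumes that disabling preemption nests, and that a task may be scheduled out involuntarily only when the nesting count is zero and interrupts are enabled.

```c
#include <assert.h>

/* Nesting depth of "do not preempt me" sections (toy stand-in for a per-task count). */
static int toy_preempt_count;

/* Enter a section that must not be preempted; calls may nest. */
void toy_preempt_disable(void)
{
    toy_preempt_count++;
}

/* Leave such a section; must pair with an earlier toy_preempt_disable(). */
void toy_preempt_enable(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

/* Returns 1 when involuntary scheduling would be allowed in this model. */
int toy_preemptible(int irqs_enabled)
{
    return toy_preempt_count == 0 && irqs_enabled;
}
```

In this model a spin_lock would simply call `toy_preempt_disable()` when taken and `toy_preempt_enable()` when released, which is why holding a lock keeps the task on the CPU even with full preemption turned on.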
Re: Need help in understanding x86 syscall
Ukil a <[EMAIL PROTECTED]> wrote:

> Now I had the doubt that if the the syscall
> implementation is very large will the scheduling and
> other interrupts be blocked for the whole time till
> the process returns from the ISR (because in an ISR by
> default the interrupts are disabled unless "sti" is
> called explicitly)?

According to my documentation it isn't. A software interrupt is a far call
with an extra pushf, and a hardware interrupt is protected against recursion
by the PIC, not by an interrupt flag.

--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.
Re: Need help in understanding x86 syscall
Ukil a [EMAIL PROTECTED] wrote: Now I had the doubt that if the the syscall implementation is very large will the scheduling and other interrupts be blocked for the whole time till the process returns from the ISR (because in an ISR by default the interrupts are disabled unless sti is called explicitly)? According to my documentation it isn't. A software interrupt is a far call with an extra pushf, and a hardware interrupt is protected against recursion by the PIC, not by an interrupt flag. -- Ich danke GMX dafür, die Verwendung meiner Adressen mittels per SPF verbreiteten Lügen zu sabotieren. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Need help in understanding x86 syscall
On Wed, 2005-08-10 at 22:39 -0700, Ukil a wrote:

> I had this question. As per my understanding, in the Linux system call implementation on x86 architecture the call flows like this: int 0x80 -> syscall -> sys_call_vector (taken from the table) -> return from interrupt service routine.
>
> Now I had the doubt that if the syscall implementation is very large will the scheduling and other interrupts be blocked for the whole time till the process returns from the ISR (because in an ISR by default the interrupts are disabled unless “sti” is called explicitly)? That appears to be too long for the scheduling or other interrupts to be blocked. Am I missing something here?

This is where interrupt is not a good term for syscall. It is really a trap. An interrupt is an outside source that stops the CPU from doing what it was doing to go do something else (asynchronous event). A trap is something that the CPU calls on itself to do something else (synchronous event).

So when a network packet comes in, the NIC sends an interrupt request (request since the CPU may not immediately handle it), and when the CPU is ready (interrupts on) it will stop what it is doing and handle the network packet. When you do a system call, it is a trap.

Quoting the Intel manual:

  The difference between an interrupt gate and a trap gate is as follows. If an interrupt or exception handler is called through an interrupt gate, the processor clears the interrupt enable (IF) flag in the EFLAGS register to prevent subsequent interrupts from interfering with the execution of the handler. When a handler is called through a trap gate, the state of the IF flag is not changed.

So when you go into the kernel through a trap (system call) interrupts are still on if they were on to begin with, which is the case when coming from user mode. But when you come into the kernel through a real interrupt, then interrupts are off.

Also note that if you have preemption disabled, the system call will _not_ do any scheduling unless it explicitly calls schedule. If you turn on voluntary preempt, it will call schedule at various spots in the kernel that might call schedule anyway. If you turn on full preemption, then the process can be scheduled out anywhere in the kernel unless it explicitly says it doesn't want to (preempt_disable or spin_lock).

-- Steve
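The gate rule quoted from the Intel manual above can be restated as a small C model. This is only a sketch for illustration: eflags_on_entry() and the enum names are hypothetical, not kernel code. The type codes 0xE (interrupt gate) and 0xF (trap gate) are the 32-bit gate types, and IF is bit 9 of EFLAGS.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit IDT gate type codes relevant to this discussion. */
enum gate_type {
    GATE_INTERRUPT = 0xE,   /* 32-bit interrupt gate */
    GATE_TRAP      = 0xF    /* 32-bit trap gate */
};

#define EFLAGS_IF (1u << 9) /* interrupt enable flag, bit 9 of EFLAGS */

/*
 * Model of what happens to EFLAGS when control enters a handler through
 * an interrupt gate or a trap gate. Hypothetical helper for illustration.
 */
static uint32_t eflags_on_entry(enum gate_type type, uint32_t eflags)
{
    assert(type == GATE_INTERRUPT || type == GATE_TRAP);
    if (type == GATE_INTERRUPT)
        return eflags & ~EFLAGS_IF; /* interrupt gate: IF is cleared */
    return eflags;                  /* trap gate: IF is left unchanged */
}
```

With interrupts enabled in user mode (IF set), a trap-gate entry such as int 0x80 keeps IF set, while an interrupt-gate entry clears it.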
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Bodo Eggert wrote:

> Ukil a [EMAIL PROTECTED] wrote:
>
>> Now I had the doubt that if the syscall implementation is very large will the scheduling and other interrupts be blocked for the whole time till the process returns from the ISR (because in an ISR by default the interrupts are disabled unless sti is called explicitly)?
>
> According to my documentation it isn't. A software interrupt is a far call with an extra pushf, and a hardware interrupt is protected against recursion by the PIC, not by an interrupt flag.

Every interrupt, software or hardware, results in the branched procedure being executed with the interrupts OFF. That's why one of the first instructions in the kernel entry for a syscall is 'sti' to turn them back on. Look at entry.S, line 182. This occurs any time a trap occurs as well (Page 26-168, i486 Programmer's reference manual).

FYI, this is helpful when designing/debugging complex interrupt-service routines since you can execute the interrupt with a software 'INT' instruction (with the correct offset from the IRQ you are using). The software doesn't 'know' where the interrupt came from, HW or SW.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :

The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [EMAIL PROTECTED] - and destroy all copies of this information, including any attachments, without reading or disclosing them. Thank you.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 15:41 +0200, Bodo Eggert wrote:

> According to my documentation it isn't. A software interrupt is a far call with an extra pushf, and a hardware interrupt is protected against recursion by the PIC, not by an interrupt flag.

I disagree with your definition of a system call. The int 0x80 changes from user mode to kernel mode so it is much more powerful than a far call. Also the CPU does protect against recursion and more than one interrupt coming in at the same time. The PIC also works with the CPU in this regard, but as I showed in my previous email, the interrupt flag _does_ protect against it.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:

> Every interrupt software, or hardware, results in the branched procedure being executed with the interrupts OFF. That's why one of the first instructions in the kernel entry for a syscall is 'sti' to turn them back on. Look at entry.S, line 182. This occurs any time a trap occurs as well (Page 26-168, i486 Programmer's reference manual).
>
> FYI, this is helpful when designing/debugging complex interrupt-service routines since you can execute the interrupt with a software 'INT' instruction (with the correct offset from the IRQ you are using). The software doesn't 'know' where the interrupt came from, HW or SW.

I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it. Must be a different kernel. According to the documentation that I was looking at, a trap in x86 does _not_ turn off interrupts.

In arch/i386/kernel/traps.c:

trap_init

    set_system_gate(SYSCALL_VECTOR,system_call);

(where SYSCALL_VECTOR is of course 0x80).

This sets up a trap:

    static void __init set_system_gate(unsigned int n, void *addr)
    {
            _set_gate(idt_table+n,15,3,addr,__KERNEL_CS);
    }

since type 15 makes this a trap (3 gives it user access).

Also looking at the code that it will call:

    ENTRY(system_call)
            pushl %eax                      # save orig_eax
            SAVE_ALL
            GET_THREAD_INFO(%ebp)
                                            # system call tracing in operation
            /* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
            testw $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp)
            jnz syscall_trace_entry
            cmpl $(nr_syscalls), %eax
            jae syscall_badsys
    syscall_call:
            call *sys_call_table(,%eax,4)
            movl %eax,EAX(%esp)             # store the return value
    syscall_exit:
            cli                             # make sure we don't miss an interrupt
                                            # setting need_resched or sigpending
                                            # between sampling and the iret

I don't see a sti here.

-- Steve
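To make the "type 15, DPL 3" arguments of _set_gate concrete, here is a rough C sketch of how a 32-bit IDT gate descriptor is laid out. It is not the kernel's implementation (which uses inline assembly); struct idt_gate and pack_gate() are hypothetical names, and the handler address and selector in the usage note are made-up illustrative values.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit words of an i386 IDT gate descriptor. */
struct idt_gate {
    uint32_t lo;    /* selector in bits 31..16, handler offset bits 15..0 */
    uint32_t hi;    /* handler offset bits 31..16, then P, DPL and type */
};

/* Hypothetical helper: build a gate from the values passed to _set_gate. */
static struct idt_gate pack_gate(unsigned type, unsigned dpl,
                                 uint32_t handler, uint16_t selector)
{
    struct idt_gate g;

    assert(type <= 0xF && dpl <= 3);
    g.lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
    g.hi = (handler & 0xFFFF0000u)  /* high half of the handler offset */
         | 0x8000u                  /* P: descriptor present */
         | ((uint32_t)dpl << 13)    /* DPL: lowest privilege allowed to use int n */
         | ((uint32_t)type << 8);   /* gate type: 14 = interrupt, 15 = trap */
    return g;
}
```

For example, pack_gate(15, 3, 0xC0102F40, 0x60) yields a high word of 0xC010EF00: the 0xEF byte is present + DPL 3 + trap gate, whereas an interrupt gate reachable only from ring 0 would carry 0x8E there.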
Re: Need help in understanding x86 syscall
On 8/11/05, Steven Rostedt [EMAIL PROTECTED] wrote:

> On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:
>
>> Every interrupt software, or hardware, results in the branched procedure being executed with the interrupts OFF. That's why one of the first instructions in the kernel entry for a syscall is 'sti' to turn them back on. Look at entry.S, line 182. This occurs any time a trap occurs as well (Page 26-168, i486 Programmer's reference manual).
>>
>> FYI, this is helpful when designing/debugging complex interrupt-service routines since you can execute the interrupt with a software 'INT' instruction (with the correct offset from the IRQ you are using). The software doesn't 'know' where the interrupt came from, HW or SW.
>
> I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it. Must be a different kernel. According to the documentation that I was looking at, a trap in x86 does _not_ turn off interrupts.
>
> ...
>
> I don't see a sti here.
>
> -- Steve

He is RBJ, Richard B. Johnson, the LKML de facto official troll.

-- 
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 23:13 +0800, Coywolf Qi Hunt wrote:

> He is RBJ, Richard B. Johnson, the LKML defacto official troll.

Oh, so this is root who almost got DaveJ fired? :)

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:

> On 8/11/05, Steven Rostedt [EMAIL PROTECTED] wrote:
>
>> On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:
>>
>>> Every interrupt software, or hardware, results in the branched procedure being executed with the interrupts OFF. That's why one of the first instructions in the kernel entry for a syscall is 'sti' to turn them back on. Look at entry.S, line 182. This occurs any time a trap occurs as well (Page 26-168, i486 Programmer's reference manual).
>>>
>>> FYI, this is helpful when designing/debugging complex interrupt-service routines since you can execute the interrupt with a software 'INT' instruction (with the correct offset from the IRQ you are using). The software doesn't 'know' where the interrupt came from, HW or SW.
>>
>> I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it. Must be a different kernel. According to the documentation that I was looking at, a trap in x86 does _not_ turn off interrupts.
>>
>> ...
>>
>> I don't see a sti here.

Search for sysenter_entry. This is where the stack is switched to the kernel stack. Then the code falls through past the next label, sysenter_past_esp. The very next instruction after the kernel stack has been set is 'sti'. Clear as day.

>> -- Steve
>
> He is RBJ, Richard B. Johnson, the LKML defacto official troll.
>
> -- 
> Coywolf Qi Hunt
> http://ahbl.org/~coywolf/

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
Ukil a wrote:

> I had this question. As per my understanding, in the Linux system call implementation on x86 architecture the call flows like this: int 0x80 -> syscall -> sys_call_vector (taken from the table) -> return from interrupt service routine.

Almost. There are two entry points, the one you describe above, and the sysenter entry point.

> Now I had the doubt that if the syscall implementation is very large will the scheduling and other interrupts be blocked for the whole time till the process returns from the ISR (because in an ISR by default the interrupts are disabled unless “sti” is called explicitly)? That appears to be too long for the scheduling or other interrupts to be blocked. Am I missing something here?

There are 3 types of gates you can use to service interrupts / faults on i386.

Task gates are used where complex state changes are required, and an assured state is needed, such as doublefault and NMI handlers.

Interrupt gates are used where interrupts must be disabled during initial processing, such as the page fault gate.

Trap gates are used when interrupts may be allowed, and do not clear the interrupt flag.

On Linux, syscall vector int 0x80 is a trap gate, which means interrupts are not disabled. The sysenter handler is very special; SYSENTER does disable interrupts, so if you look at sysenter_entry, one of the first things it will do is re-enable interrupts as soon as the stack is sane.

Thus, interrupts are enabled by default during system call processing unless explicitly disabled. Your analysis of what would happen otherwise is quite correct.

Zach
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 11:28 -0400, linux-os (Dick Johnson) wrote:

> On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
>
>> On 8/11/05, Steven Rostedt [EMAIL PROTECTED] wrote:
>>
>>> On Thu, 2005-08-11 at 10:04 -0400, linux-os (Dick Johnson) wrote:
>>>
>>>> Every interrupt software, or hardware, results in the branched procedure being executed with the interrupts OFF. That's why one of the first instructions in the kernel entry for a syscall is 'sti' to turn them back on. Look at entry.S, line 182. This occurs any time a trap occurs as well (Page 26-168, i486 Programmer's reference manual).
>>>>
>>>> FYI, this is helpful when designing/debugging complex interrupt-service routines since you can execute the interrupt with a software 'INT' instruction (with the correct offset from the IRQ you are using). The software doesn't 'know' where the interrupt came from, HW or SW.
>>>
>>> I'm looking at 2.6.13-rc6-git1 line 182 of entry.S and I don't see it. Must be a different kernel. According to the documentation that I was looking at, a trap in x86 does _not_ turn off interrupts.
>>>
>>> ...
>>>
>>> I don't see a sti here.
>
> Search for sysenter_entry. This is where the stack is switched to the kernel stack. Then the code falls through past the next label, sysenter_past_esp. The very next instruction after the kernel stack has been set is 'sti'. Clear as day.

I just applied the following to one of my kernels:

    --- arch/i386/kernel/entry.S    (revision 274)
    +++ arch/i386/kernel/entry.S    (working copy)
    @@ -184,6 +184,7 @@
     ENTRY(sysenter_entry)
            movl TSS_sysenter_esp0(%esp),%esp
     sysenter_past_esp:
    +       ud2
            sti
            pushl $(__USER_DS)
            pushl %ebp

And booted it. The system is up and running, so I really don't think that the sysenter_entry is used for system calls. Not so Clear as day!

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:

> And booted it. The system is up and running, so I really don't think that the sysenter_entry is used for system calls. Not so Clear as day!

And so, looking into sysenter_entry, it seems that my configurations don't use it. This jumps straight to system_call without ever having to turn interrupts on.

    # cat /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 8
    model name      : Pentium III (Coppermine)
    stepping        : 3
    cpu MHz         : 367.939
    cache size      : 256 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 2
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
    bogomips        : 722.94

-- Steve
Re: Need help in understanding x86 syscall
On 8/12/05, Steven Rostedt [EMAIL PROTECTED] wrote:

> On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
>
>> And booted it. The system is up and running, so I really don't think that the sysenter_entry is used for system calls. Not so Clear as day!
>
> And so, looking into sysenter_entry, it seems that my configurations don't seem to use it. This jumps straight to system_call without ever having to turn interrupts on.
>
> # cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 8
> model name      : Pentium III (Coppermine)
> stepping        : 3
> cpu MHz         : 367.939
> cache size      : 256 KB
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 2
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
> bogomips        : 722.94
>
> -- Steve

The cpu does have sep. Is it vanilla kernel?

-- 
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
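Spotting sep in the flags line by eye is easy to get wrong, since pse, pse36 and sse all look similar. A small C sketch of a whole-token check follows; has_cpu_flag() is a hypothetical helper, and it assumes the flags are separated by single spaces as in the /proc/cpuinfo output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole space-separated token in `flags`. */
static int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    assert(n > 0);
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || (p[-1] == ' ');  /* token begins here */
        int ends   = (p[n] == '\0') || (p[n] == ' '); /* token ends here */
        if (starts && ends)
            return 1;
        p += 1; /* partial match only (e.g. "pse36" for "pse"), keep scanning */
    }
    return 0;
}
```

Called with the flags string from the Pentium III above, has_cpu_flag(flags, "sep") reports the flag as present, while a lookup for "sse2" does not match the plain sse entry.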
Re: Need help in understanding x86 syscall
On 8/12/05, Coywolf Qi Hunt [EMAIL PROTECTED] wrote:

> On 8/12/05, Steven Rostedt [EMAIL PROTECTED] wrote:
>
>> On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
>>
>>> And booted it. The system is up and running, so I really don't think that the sysenter_entry is used for system calls. Not so Clear as day!
>>
>> And so, looking into sysenter_entry, it seems that my configurations don't seem to use it. This jumps straight to system_call without ever having to turn interrupts on.
>>
>> # cat /proc/cpuinfo
>> processor       : 0
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 8
>> model name      : Pentium III (Coppermine)
>> stepping        : 3
>> cpu MHz         : 367.939
>> cache size      : 256 KB
>> fdiv_bug        : no
>> hlt_bug         : no
>> f00f_bug        : no
>> coma_bug        : no
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 2
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
>> bogomips        : 722.94
>>
>> -- Steve
>
> The cpu does have sep. Is it vanilla kernel?

Also glibc support.

-- 
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:

> On 8/12/05, Coywolf Qi Hunt [EMAIL PROTECTED] wrote:
>
>> On 8/12/05, Steven Rostedt [EMAIL PROTECTED] wrote:
>>
>>> On Thu, 2005-08-11 at 11:51 -0400, Steven Rostedt wrote:
>>>
>>>> And booted it. The system is up and running, so I really don't think that the sysenter_entry is used for system calls. Not so Clear as day!
>>>
>>> And so, looking into sysenter_entry, it seems that my configurations don't seem to use it. This jumps straight to system_call without ever having to turn interrupts on.
>>>
>>> # cat /proc/cpuinfo
>>> processor       : 0
>>> vendor_id       : GenuineIntel
>>> cpu family      : 6
>>> model           : 8
>>> model name      : Pentium III (Coppermine)
>>> stepping        : 3
>>> cpu MHz         : 367.939
>>> cache size      : 256 KB
>>> fdiv_bug        : no
>>> hlt_bug         : no
>>> f00f_bug        : no
>>> coma_bug        : no
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 2
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
>>> bogomips        : 722.94
>>>
>>> -- Steve
>>
>> The cpu does have sep. Is it vanilla kernel?
>
> Also glibc support.
>
> -- 
> Coywolf Qi Hunt
> http://ahbl.org/~coywolf/

Probably doesn't use int 0x80 at all.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Fri, 2005-08-12 at 00:59 +0800, Coywolf Qi Hunt wrote:

> On 8/12/05, Coywolf Qi Hunt [EMAIL PROTECTED] wrote:
>
>> On 8/12/05, Steven Rostedt [EMAIL PROTECTED] wrote:
>>
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
>>
>> The cpu does have sep. Is it vanilla kernel?

It's vanilla 2.6.12-rc3 + Ingo's RT V0.7.46-02-rs-0.4 + some of my own customizations. But I never touched the sysenter stuff and with a few printks I see it is being initialized.

> Also glibc support.

I'm using Debian unstable with a recent (last week) update.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 13:10 -0400, linux-os (Dick Johnson) wrote:

> On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
>
>> Also glibc support.
>>
>> -- 
>> Coywolf Qi Hunt
>> http://ahbl.org/~coywolf/
>
> Probably doesn't use int 0x80 at all.

    $ objdump -Dhalpr /lib/libc.so.6 | egrep 'int *\$0x80' | wc
        448    2240   20160

And a little snapshot:

    000288d0 <__libc_sigsuspend>:
       288d0:  55                    push   %ebp
       288d1:  89 e5                 mov    %esp,%ebp
       288d3:  57                    push   %edi
       288d4:  56                    push   %esi
       288d5:  53                    push   %ebx
       288d6:  e8 00 00 00 00        call   288db <__libc_sigsuspend+0xb>
       288db:  5b                    pop    %ebx
       288dc:  81 c3 19 c7 0e 00     add    $0xec719,%ebx
       288e2:  8b 83 b4 32 00 00     mov    0x32b4(%ebx),%eax
       288e8:  85 c0                 test   %eax,%eax
       288ea:  75 23                 jne    2890f <__libc_sigsuspend+0x3f>
       288ec:  b9 08 00 00 00        mov    $0x8,%ecx
       288f1:  8b 55 08              mov    0x8(%ebp),%edx
       288f4:  87 d3                 xchg   %edx,%ebx
       288f6:  b8 b3 00 00 00        mov    $0xb3,%eax
       288fb:  cd 80                 int    $0x80
       288fd:  87 d3                 xchg   %edx,%ebx
       288ff:  89 c6                 mov    %eax,%esi
       28901:  3d 00 f0 ff ff        cmp    $0xfffff000,%eax
       28906:  77 33                 ja     2893b <__libc_sigsuspend+0x6b>
       28908:  89 f0                 mov    %esi,%eax
       2890a:  5b                    pop    %ebx
       2890b:  5e                    pop    %esi
       2890c:  5f                    pop    %edi
       2890d:  5d                    pop    %ebp
       2890e:  c3                    ret

288fb seems to use int 0x80 and so do all the other system calls that I inspected.

    $ ls -l /lib/libc.so.6
    lrwxrwxrwx 1 root root 13 2005-08-09 22:28 /lib/libc.so.6 -> libc-2.3.5.so

-- Steve
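The compare-and-branch right after the int $0x80 in the listing above (instruction bytes 3d 00 f0 ff ff, followed by ja) is the usual test for a failed system call: a raw result in %eax above 0xfffff000, taken as an unsigned value, is a negated errno. A small C sketch of that check, with hypothetical helper names, assuming the 32-bit i386 return convention:

```c
#include <assert.h>
#include <stdint.h>

/* Raw i386 syscall results above 0xfffff000 encode -errno (1..4095). */
static int raw_result_is_error(uint32_t eax)
{
    return eax > 0xfffff000u;   /* same unsigned test as cmp + ja */
}

/* Recover the positive errno value from a raw error result. */
static int raw_result_errno(uint32_t eax)
{
    assert(raw_result_is_error(eax));
    return (int)(0u - eax);     /* two's-complement negate: 0xfffffffc -> 4 */
}
```

So a sigsuspend that is interrupted by a signal would hand back 0xfffffffc in %eax, which the wrapper turns into errno 4 (EINTR) and a -1 return to the caller.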
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 13:26 -0400, Steven Rostedt wrote:

> 288fb seems to use int 0x80 and so do all the other system calls that I inspected.

I expect that if I had a Gentoo system that I compiled for my machine, this would be different. But I suspect that Debian still wants to run on my old Pentium 75MHz laptop.

How would libc know to use sysenter instead of int 0x80? It could do a test of the system, but would there be an if statement for every system call then? I guess that libc needs to be compiled either to use it or not. Since there are still several machines out there that don't have this feature, it would be safer to not use it.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:

> On Thu, 2005-08-11 at 13:10 -0400, linux-os (Dick Johnson) wrote:
>
>> On Thu, 11 Aug 2005, Coywolf Qi Hunt wrote:
>>
>>> Also glibc support.
>>>
>>> -- 
>>> Coywolf Qi Hunt
>>> http://ahbl.org/~coywolf/
>>
>> Probably doesn't use int 0x80 at all.
>
> $ objdump -Dhalpr /lib/libc.so.6 | egrep 'int *\$0x80' | wc
>     448    2240   20160
>
> And a little snapshot:
>
> 000288d0 <__libc_sigsuspend>:
>    288d0:  55                    push   %ebp
>    288d1:  89 e5                 mov    %esp,%ebp
>    288d3:  57                    push   %edi
>    288d4:  56                    push   %esi
>    288d5:  53                    push   %ebx
>    288d6:  e8 00 00 00 00        call   288db <__libc_sigsuspend+0xb>
>    288db:  5b                    pop    %ebx
>    288dc:  81 c3 19 c7 0e 00     add    $0xec719,%ebx
>    288e2:  8b 83 b4 32 00 00     mov    0x32b4(%ebx),%eax
>    288e8:  85 c0                 test   %eax,%eax
>    288ea:  75 23                 jne    2890f <__libc_sigsuspend+0x3f>
>    288ec:  b9 08 00 00 00        mov    $0x8,%ecx
>    288f1:  8b 55 08              mov    0x8(%ebp),%edx
>    288f4:  87 d3                 xchg   %edx,%ebx
>    288f6:  b8 b3 00 00 00        mov    $0xb3,%eax
>    288fb:  cd 80                 int    $0x80
>    288fd:  87 d3                 xchg   %edx,%ebx
>    288ff:  89 c6                 mov    %eax,%esi
>    28901:  3d 00 f0 ff ff        cmp    $0xfffff000,%eax
>    28906:  77 33                 ja     2893b <__libc_sigsuspend+0x6b>
>    28908:  89 f0                 mov    %esi,%eax
>    2890a:  5b                    pop    %ebx
>    2890b:  5e                    pop    %esi
>    2890c:  5f                    pop    %edi
>    2890d:  5d                    pop    %ebp
>    2890e:  c3                    ret
>
> 288fb seems to use int 0x80 and so do all the other system calls that I inspected.
>
> $ ls -l /lib/libc.so.6
> lrwxrwxrwx 1 root root 13 2005-08-09 22:28 /lib/libc.so.6 -> libc-2.3.5.so
>
> -- Steve

I was talking about the one who had the glibc support to use the newer system-call entry (whose name can confuse). You are looking at code that uses int 0x80. It's an interrupt, therefore, in the kernel, once the stack is set up, interrupts need to be (re)enabled.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:

> On Thu, 2005-08-11 at 13:26 -0400, Steven Rostedt wrote:
>
>> 288fb seems to use int 0x80 and so do all the other system calls that I inspected.
>
> I expect that if I had a Gentoo system that I compiled for my machine, this would be different. But I suspect that Debian still wants to run on my old Pentium 75MHz laptop.
>
> How would libc know to use sysenter instead of int 0x80. It could do a test of the system, but would there be an if statement for every system call then? I guess that libc needs to be compiled either to use it or not. Since there are still several machines out there that don't have this feature, it would be safer to not use it.
>
> -- Steve

Well I have a small-C runtime library that I put together for embedded systems. Once somebody heard that I was using the obsolete int 0x80, they insisted that I re-do everything to use the new interface. Since I wasn't getting paid to think on that project, I did what I was told.

Benchmarks of 'getpid()' showed the 0x80 interrupt faster by a few cycles so the suits claimed that I must have done something wrong. So we had a code-review. Finally it was decided: the CPU must be handling things differently... i.e., go back to the simpler int 0x80 interface.

It was obvious to me that any difference in speed was simply noise. Both ways are essentially the same for performance so I wouldn't lose any sleep over an older 'C' runtime library.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
Steven Rostedt wrote:

> I expect that if I had a Gentoo system that I compiled for my machine, this would be different. But I suspect that Debian still wants to run on my old Pentium 75MHz laptop.
>
> How would libc know to use sysenter instead of int 0x80. It could do a test of the system, but would there be an if statement for every system call then? I guess that libc needs to be compiled either to use it or not. Since there are still several machines out there that don't have this feature, it would be safer to not use it.

    zach-dev2:~ $ ldd /bin/ls
            linux-gate.so.1 => (0xe000)

This is the vsyscall entry point, which gets linked by ld into all processes. It is a kernel page which is visible to user space, and is rewritten to support sysenter if indeed that instruction is available. Glibc has fixed entry points to this page.

Here is a view of the system call entry point on a machine which supports sysenter:

    (gdb) break _init
    Breakpoint 1 at 0x8049522
    (gdb) run
    Starting program: /bin/ls
    (no debugging symbols found)...[Thread debugging using libthread_db enabled]
    [New Thread 1075283616 (LWP 5328)]
    [Switching to Thread 1075283616 (LWP 5328)]

    Breakpoint 1, 0x08049522 in _init ()
    (gdb) x/10i 0xe400
    0xe400: push   %ecx
    0xe401: push   %edx
    0xe402: push   %ebp
    0xe403: mov    %esp,%ebp
    0xe405: sysenter
    0xe407: nop
    0xe408: nop
    0xe409: nop
    0xe40a: nop
    0xe40b: nop

On a machine that does not support sysenter, this will give you:

    int $0x80
    ret

The int $0x80 system calls are still fully supported by a sysenter capable kernel, since it must run older binaries and potentially support syscalls during early boot up before it is known that sysenter is supported.

Zach
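On the "if statement for every system call" question, the vsyscall page described above amounts to choosing the entry stub once and calling it indirectly afterwards. A loose user-space analogy in C, using a function pointer; every name here is hypothetical, and the two stubs only return marker values rather than entering the kernel:

```c
#include <assert.h>

typedef long (*syscall_entry_fn)(long nr);

/* Stand-ins for the two stubs that could live in the shared page. */
static long entry_int80(long nr)    { return nr * 10 + 1; } /* pretend int $0x80 path */
static long entry_sysenter(long nr) { return nr * 10 + 2; } /* pretend sysenter path */

/* The fixed entry point that callers go through; int $0x80 is the safe default. */
static syscall_entry_fn vsyscall_entry = entry_int80;

/* Done once, at the point where it is known whether the CPU has SEP. */
static void select_entry(int cpu_has_sep)
{
    vsyscall_entry = cpu_has_sep ? entry_sysenter : entry_int80;
}

/* What a libc wrapper would do: one indirect call, no per-call feature test. */
static long do_syscall(long nr)
{
    return vsyscall_entry(nr);
}
```

The decision cost is paid once in select_entry(); each do_syscall() afterwards is a single indirect call, which is roughly why a libc built this way needs neither a per-call test nor a separate build per CPU.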
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:

> I was talking about the one who had the glibc support to use the newer system-call entry (who's name can confuse). You are looking at code that uses int 0x80. It's an interrupt, therefore, in the kernel, once the stack is set up, interrupts need to be (re)enabled.

int is a call to either an interrupt or exception procedure. 0x80 is set up in Linux to be a trap and not an interrupt vector. So it does _not_ turn off interrupts.

I'm looking at the sysenter code which is supposed to be the fast entry into the system, and it looks like it is supposed to call sysenter_entry when used. I'm trying to write something to test this out, since I still have the ud2 op in my sysenter code. So if I do get this to work, I can cause a bug.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 10:59 -0700, Zachary Amsden wrote:

> zach-dev2:~ $ ldd /bin/ls
>         linux-gate.so.1 => (0xe000)

OHHH! So THAT is what linux-gate is used for! Thanks, I've been really confused by that.

> This is the vsyscall entry point, which gets linked by ld into all processes. It is a kernel page which is visible to user space, and is rewritten to support sysenter if indeed that instruction is available. Glibc has fixed entry points to this page.
>
> Here is a view of the system call entry point on a machine which supports sysenter:
>
> (gdb) break _init
> Breakpoint 1 at 0x8049522
> (gdb) run
> Starting program: /bin/ls
> (no debugging symbols found)...[Thread debugging using libthread_db enabled]
> [New Thread 1075283616 (LWP 5328)]
> [Switching to Thread 1075283616 (LWP 5328)]
>
> Breakpoint 1, 0x08049522 in _init ()
> (gdb) x/10i 0xe400
> 0xe400: push   %ecx
> 0xe401: push   %edx
> 0xe402: push   %ebp
> 0xe403: mov    %esp,%ebp
> 0xe405: sysenter
> 0xe407: nop
> 0xe408: nop
> 0xe409: nop
> 0xe40a: nop
> 0xe40b: nop

OK, I get the same on my machine.

> On a machine that does not support sysenter, this will give you:
>
> int $0x80
> ret
>
> The int $0x80 system calls are still fully supported by a sysenter capable kernel, since it must run older binaries and potentially support syscalls during early boot up before it is known that sysenter is supported.

Now, is the latest glibc using this? I ask because I put a ud2 op in my sysenter_entry code, which is not triggered, and an objdump of libc.so shows a bunch of int 0x80 calls.

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:
> On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:
> > I was talking about the one who had the glibc support to use the newer system-call entry (whose name can confuse). You are looking at code that uses int 0x80. It's an interrupt, therefore, in the kernel, once the stack is set up, interrupts need to be (re)enabled.
>
> int is a call to either an interrupt or exception procedure. 0x80 is set up in Linux to be a trap and not an interrupt vector. So it does _not_ turn off interrupts.

I'm not sure you can stop the CPU from clearing the interrupt bit in EFLAGS if you execute an interrupt. The interrupt handler may be supported by a trap-gate, but the event has already occurred. The documentation I have isn't clear on this at all.

> I'm looking at the sysenter code which is supposed to be the fast entry into the system, and it looks like it is supposed to call sysenter_entry when used. I'm trying to write something to test this out, since I still have the ud2 op in my sysentry code. So if I do get this to work, I can cause a bug.
>
> -- Steve

Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 14:21 -0400, linux-os (Dick Johnson) wrote:
> I'm not sure you can stop the CPU from clearing the interrupt bit in EFLAGS if you execute an interrupt. The interrupt handler may be supported by a trap-gate, but the event has already occurred. The documentation I have isn't clear on this at all.

From the Intel document 25366513, IA32 Intel Architecture's Software Developer's Manual Volume 1, page 145 (or 6-11), section "Call and Return Operation for Interrupt or Exception Handling Procedures":

    A call to an interrupt or exception handler procedure is similar to a procedure call to another protection level (see Section 6.3.6., CALL and RET Operation Between Privilege Levels). Here, the interrupt vector references one of two kinds of gates: an interrupt gate or a trap gate. Interrupt and trap gates are similar to call gates in that they provide the following information:

    - access rights information
    - the segment selector for the code segment that contains the handler procedure
    - an offset into the code segment to the first instruction of the handler procedure

    The difference between an interrupt gate and a trap gate is as follows. If an interrupt or exception handler is called through an interrupt gate, the processor clears the interrupt enable (IF) flag in the EFLAGS register to prevent subsequent interrupts from interfering with the execution of the handler. When a handler is called through a trap gate, the state of the IF flag is not changed.

And in Linux, the system call vector is handled with a trap gate, which is why system_call in entry.S does not call sti. Although, you are right, if I use sysenter, then it would call sysenter_entry where it would need to enable interrupts again.

To prove my point: all the libc syscalls seem to use int 0x80, and looking at entry.S, it calls system_call directly.
Now to see what sysenter would do I made the following changes:

Index: arch/i386/kernel/entry.S
===
--- arch/i386/kernel/entry.S	(revision 274)
+++ arch/i386/kernel/entry.S	(working copy)
@@ -196,6 +196,8 @@
  * Careful about security.
  */
 	cmpl $__PAGE_OFFSET-3,%ebp
+	call sdr_func
+	jmp syscall_fault
 	jae syscall_fault
 1:	movl (%ebp),%ebp
 .section __ex_table,"a"
Index: arch/i386/kernel/traps.c
===
--- arch/i386/kernel/traps.c	(revision 274)
+++ arch/i386/kernel/traps.c	(working copy)
@@ -1092,6 +1092,10 @@
 } while (0)
 
+void sdr_func(void)
+{
+	printk("hello from sdr_func\n");
+}

So my sysenter_entry in entry.S would call my function sdr_func, which is defined in traps.c as above. Then I ran the following program:

int main()
{
	unsigned long a = 0x14;	/* 0x14 = 20, the getpid syscall number on i386 */

	asm("push %%ecx;\n"
	    "push %%edx;\n"
	    "mov %%esp,%%ebp;\n"
	    "sysenter"
	    : : "a"(a) : "cx", "dx", "sp", "bp");
	return 0;
}

And I did get my print on the console. So it seems that my system does not use sysenter (even though linux-gate.so seems to set this up), but instead uses int 0x80, which in Linux does _not_ disable interrupts.

I hope this clears things up for everyone. :-)

-- Steve
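The interrupt gate versus trap gate rule quoted from the Intel manual above can be sketched as a small user-space model. This is only an illustration and not kernel code: it decodes the 4-bit type field of a 32-bit IDT gate descriptor (0xE for an interrupt gate, 0xF for a trap gate, per the IA-32 manual) and reports what happens to IF on entry. The names gate_type_of and if_after_entry are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Type field values for 32-bit IDT gates (bits 8..11 of the upper dword). */
enum gate_type { GATE_INTERRUPT32 = 0xE, GATE_TRAP32 = 0xF };

/* Extract the 4-bit gate type from the upper 32 bits of a descriptor. */
static unsigned gate_type_of(uint32_t high)
{
	return (high >> 8) & 0xF;
}

/* Model the IF flag after the CPU enters a handler through this gate:
 * an interrupt gate clears IF, a trap gate leaves it unchanged. */
static int if_after_entry(uint32_t high, int if_before)
{
	unsigned type = gate_type_of(high);

	assert(type == GATE_INTERRUPT32 || type == GATE_TRAP32);
	return type == GATE_INTERRUPT32 ? 0 : if_before;
}
```

With the upper descriptor word 0xEF00 (present, DPL 3, trap gate), if_after_entry(0xEF00, 1) returns 1, so interrupts stay enabled; with 0x8E00 (present, DPL 0, interrupt gate) it returns 0.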
Re: Need help in understanding x86 syscall
Steven Rostedt wrote:
> OK, I get the same on my machine.
>
> > On a machine that does not support sysenter, this will give you:
> >
> > int $0x80
> > ret
> >
> > The int $0x80 system calls are still fully supported by a sysenter capable kernel, since it must run older binaries and potentially support syscalls during early boot up before it is known that sysenter is supported.
>
> Now, is the latest glibc using this? I ask because I put a ud2 op in my sysenter_entry code and it is never triggered, and an objdump of libc.so shows a bunch of int 0x80 calls.

The NPTL version of glibc (the TLS library) uses this.

zach-dev2:~ $ ldd /bin/ls
        linux-gate.so.1 => (0xe000)
        librt.so.1 => /lib/tls/librt.so.1 (0x4002e000)
        libacl.so.1 => /lib/libacl.so.1 (0x40038000)
        libselinux.so.1 => /lib/libselinux.so.1 (0x4003e000)
-->     libc.so.6 => /lib/tls/libc.so.6 (0x4004c000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40162000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x4000)
        libattr.so.1 => /lib/libattr.so.1 (0x40174000)

You'll find getpid much faster with TLS libraries (it's cached, no longer a system call):

With TLS:

zach-dev2:Micro-bench $ time ./getpid
real    0m0.080s
user    0m0.080s
sys     0m0.000s

Without TLS:

zach-dev:Micro-bench $ time ./getpid
real    0m5.041s
user    0m2.520s
sys     0m2.520s

If you're feeling really masochistic, I've added a demonstration of how you can call sysenter from userspace without glibc. The code verifies that there is no way to exploit the kernel to achieve reading arbitrary memory through a non-flat data segment. It deliberately segfaults at the end.

Let me point out this is a very wrong way to do things - you should always use the vsyscall page, and in fact, this code actually depends on the vsyscall page even if it is not apparent. I fake the same frame structure that the vsyscall page would have pushed to simulate a vsyscall entry, but the kernel will always return to the vsyscall page, which then returns back to us. Fun stuff.
If you leave the kernel hack for ud2 in your kernel, I would expect it to blow up in amazing fashion when running the code below.

zach-dev2:~ $ gcc sysenter.S sysenter.c -o sys
sysenter.c: In function `main':
sysenter.c:34: warning: passing arg 2 of `signal' from incompatible pointer type
sysenter.c:49: warning: passing arg 3 of `sysenter_call_2' makes pointer from integer without a cast
sysenter.c:22: warning: return type of `main' is not `int'
zach-dev2:~ $ ./sys
interrupted %ebp = 0xbaadf00d
phew
Segmentation fault (core dumped)
zach-dev2:~ $ gdb sys core
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Core was generated by `./sys'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0xe410 in ?? ()
(gdb) print $eax
$1 = -14
(gdb)

#define EFAULT 14 /* Bad address */


int main(int argc, char *argv[])
{
	int j;
	for (j = 0; j < 100; j++) {
		getpid(); getpid(); getpid(); getpid(); getpid();
		getpid(); getpid(); getpid(); getpid(); getpid();
	}
}


#include <sys/syscall.h>

.text
.global sysenter_call
.global sysenter_call_2

/* void sysenter_call(pid_t pid, int signo, short ds, void *addr) */
sysenter_call:
	push %ebx
	push %edi
	push %ebp
	push %ds
	movl %esp, %edi
	movl 20(%esp), %ebx	/* pid */
	movl 24(%esp), %ecx	/* signo */
	movl 28(%esp), %ds	/* exploit DS */
	movl 32(%esp), %ebp
	movl %ebp, %esp
	push $sysenter_return
	push %ecx
	push %edx
	subl $16, %ebp
	push $0xbaadf00d
	movl $SYS_kill, %eax
	sysenter
	/* vsyscall page will ret to us here */
sysenter_return:
	mov %edi, %esp
	pop %ds
	pop %ebp
	pop %edi
	pop %ebx
	ret

sysenter_call_2:
	push %ebx
	push %ebp
	movl 12(%esp), %ebx	/* pid */
	movl 16(%esp), %ecx	/* signo */
	movl 20(%esp), %ebp
	movl $SYS_kill, %eax
	sysenter

.data
test: .long 0


#include <stdio.h>
#include <signal.h>
#include <asm/ldt.h>
#include <asm/segment.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
#define __KERNEL__
#include <asm/page.h>

extern void sysenter_call(pid_t pid, int signo, short ds, void *addr);
extern void sysenter_call_2(pid_t pid, int
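To reproduce the getpid comparison above on a given system, one can time the library wrapper against a forced system call. This is a minimal sketch, assuming Linux and a libc that provides syscall() and SYS_getpid; getpid_raw and call_n_times are illustrative names and are not part of Zach's benchmark. Whether getpid() is cached depends on the libc version (glibc dropped its PID cache in 2.25), so on a current system the two loops may well take about the same time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>  /* SYS_getpid */
#include <sys/types.h>    /* pid_t */
#include <unistd.h>       /* syscall(), getpid() */

/* Ask the kernel directly, bypassing any caching in the C library. */
static pid_t getpid_raw(void)
{
	return (pid_t)syscall(SYS_getpid);
}

/* Call fn() n times and return the last result. Intended to be wrapped
 * in a timer such as time(1) or clock_gettime() by the caller. */
static pid_t call_n_times(pid_t (*fn)(void), long n)
{
	pid_t last = -1;

	assert(n > 0);
	for (long i = 0; i < n; i++)
		last = fn();
	return last;
}
```

Running call_n_times(getpid, N) and call_n_times(getpid_raw, N) in two separate programs under time(1), with the same large N, gives numbers comparable in spirit to the "With TLS" and "Without TLS" runs shown earlier.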
Re: Need help in understanding x86 syscall
On Thu, 2005-08-11 at 12:58 -0700, Zachary Amsden wrote:
> If you're feeling really masochistic, I've added a demonstration of how you can call sysenter from userspace without glibc.

Thanks Zach, this will give me something to play around with when I have a little more spare time 8-}

-- Steve
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:
> On Thu, 2005-08-11 at 13:46 -0400, linux-os (Dick Johnson) wrote:
> > I was talking about the one who had the glibc support to use the newer system-call entry (whose name can confuse). You are looking at code that uses int 0x80. It's an interrupt, therefore, in the kernel, once the stack is set up, interrupts need to be (re)enabled.
>
> int is a call to either an interrupt or exception procedure. 0x80 is set up in Linux to be a trap and not an interrupt vector. So it does _not_ turn off interrupts.

It's actually a vector, that's all you can install in the IDT. Also a trap doesn't advance the instruction pointer, so you resume at the trapping instruction (e.g. vector 14/page fault); 0x80 is an interrupt gate. One of the distinguishing differences is that 0x80 may be entered via int 0x80 from all ring levels. The reason why int 0x80 doesn't disable interrupts is that issuing int 0x80 directly is similar to doing a far call and therefore doesn't have the same effect as a real interrupt being issued.
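The remark that vector 0x80 can be entered with int 0x80 from any ring comes down to the gate's descriptor privilege level (DPL): for a software int n the CPU raises a general protection fault unless the current privilege level is numerically less than or equal to the gate's DPL. Below is a tiny model of that check, with a made-up function name. As far as I recall, the 2.6 i386 code installs the system call vector with DPL 3 while ordinary hardware interrupt vectors get DPL 0, but treat that as an assumption to verify against the source.

```c
#include <assert.h>

/* Return 1 if a software "int n" issued at privilege level cpl may go
 * through a gate whose DPL is gate_dpl, 0 if the CPU would raise #GP.
 * This check applies to software interrupts only; hardware interrupts
 * and processor exceptions ignore the gate DPL. */
static int soft_int_permitted(unsigned cpl, unsigned gate_dpl)
{
	assert(cpl <= 3 && gate_dpl <= 3);
	return cpl <= gate_dpl;
}
```

So user code at ring 3 gets through a DPL 3 gate (soft_int_permitted(3, 3) is 1) but not through a DPL 0 gate (soft_int_permitted(3, 0) is 0), which is why a user program cannot simply issue int with the number of an arbitrary hardware vector.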
Re: Need help in understanding x86 syscall
On Thu, 11 Aug 2005, Steven Rostedt wrote:
> On Thu, 2005-08-11 at 15:41 +0200, Bodo Eggert wrote:
> > According to my documentation it isn't. A software interrupt is a far call with an extra pushf, and a hardware interrupt is protected against recursion by the PIC, not by an interrupt flag.
>
> I disagree with your definition of a system call. The int 0x80 changes from user mode to kernel mode so it is much more powerful than a far call.

Far calls and jumps can change to an inner ring. This is done by a special segment selector containing the segment _and_ the offset to jump to (the offset from the call instruction is ignored).

> Also the CPU does protect against recursion and more than one interrupt coming in at the same time. The PIC also works with the CPU in this regard, but as I showed in my previous email, the interrupt flag _does_ protect against it.

Showing == claiming? However, my documentation was wrong.

http://www.baldwin.cx/386htm/INT.htm

--
Top 100 things you don't want the sysadmin to say: 99. Shit!!
Re: Need help in understanding x86 syscall
On 08/11/2005 10:18 AM, Steven Rostedt wrote:
> It's vanilla 2.6.12-rc3 + Ingo's RT V0.7.46-02-rs-0.4 + some of my own customizations. But I never touched the sysentry stuff and with a few printks I see it is being initialized.
>
> Also glibc support.
>
> I'm using Debian unstable with a recent (last week) update.
>
> -- Steve

But are you using libc6-i686? That enables NPTL. Perhaps the behavior difference is there?

I'm surprised int 80 doesn't really cause an interrupt; it doesn't jump to the appropriate place in the x86 vector table? Interesting.

Jeff

[EMAIL PROTECTED]:~# dpkg -s libc6-i686
...
 This set of libraries is optimized for i686 machines, and will only be used if you are running a 2.6 kernel on an i686 class CPU (check the output of `uname -m'). This includes Pentium Pro, Pentium II/III/IV, Celeron CPU's and similar class CPU's (including clones such as AMD Athlon/Opteron, VIA C3 Nehemiah, but not VIA C3 Ezla).
 .
 This package includes support for NPTL.
 .
Re: Need help in understanding x86 syscall
On 8/12/05, Jeff Carr [EMAIL PROTECTED] wrote:
> On 08/11/2005 10:18 AM, Steven Rostedt wrote:
> > It's vanilla 2.6.12-rc3 + Ingo's RT V0.7.46-02-rs-0.4 + some of my own customizations. But I never touched the sysentry stuff and with a few printks I see it is being initialized.
> >
> > Also glibc support.
> >
> > I'm using Debian unstable with a recent (last week) update.
> >
> > -- Steve
>
> But are you using libc6-i686? That enables NPTL. Perhaps the behavior difference is there?
>
> I'm surprised int 80 doesn't really cause an interrupt; it doesn't jump to the appropriate place in the x86 vector table? Interesting.
>
> Jeff
>
> [EMAIL PROTECTED]:~# dpkg -s libc6-i686
> ...
>  This set of libraries is optimized for i686 machines, and will only be used if you are running a 2.6 kernel on an i686 class CPU (check the output of `uname -m'). This includes Pentium Pro, Pentium II/III/IV, Celeron CPU's and similar class CPU's (including clones such as AMD Athlon/Opteron, VIA C3 Nehemiah, but not VIA C3 Ezla).
>  .
>  This package includes support for NPTL.
>  .

Even with libc6-i686 installed, I can't see sysenter being used. libc6-i686 has /lib/tls/i686/cmov/libc.so.6, not the one /lib/libc-2.3.5.so.

mozilla gets: Illegal instruction

I've added ud2 in both entry.S and vsyscall-sysenter.S. Any ideas?

--
Coywolf Qi Hunt
http://ahbl.org/~coywolf/
Need help in understanding x86 syscall
I had this question. As per my understanding, in the Linux system call implementation on the x86 architecture the call flows like this:

int 0x80 -> syscall -> sys_call_vector (taken from the table) -> return from interrupt service routine.

Now my doubt is this: if the syscall implementation is very large, will scheduling and other interrupts be blocked for the whole time until the process returns from the ISR (because in an ISR interrupts are disabled by default unless sti is called explicitly)? That appears to be too long for scheduling or other interrupts to be blocked. Am I missing something here?

Thanks
Ukil
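The flow described in the question (int 0x80, then a lookup in the system call table, then return) can be pictured with a toy dispatcher. This is a user-space sketch and not the kernel's entry.S: the table, the two handlers and their numbers are invented for illustration, and only the bounds check followed by an indirect call mirrors what the real entry code does with the number passed in %eax.

```c
#include <assert.h>
#include <errno.h>   /* ENOSYS */
#include <stddef.h>  /* NULL */

typedef long (*syscall_fn)(long arg);

/* Two toy handlers standing in for real system calls (hypothetical). */
static long toy_getpid(long unused) { (void)unused; return 1234; }
static long toy_double(long x) { return 2 * x; }

/* The "system call table": handler pointers indexed by call number. */
static const syscall_fn toy_table[] = { toy_getpid, toy_double };
#define TOY_NR_SYSCALLS (sizeof(toy_table) / sizeof(toy_table[0]))

/* Reject out-of-range numbers, otherwise call through the table. */
static long toy_dispatch(unsigned long nr, long arg)
{
	if (nr >= TOY_NR_SYSCALLS)
		return -ENOSYS;
	assert(toy_table[nr] != NULL);
	return toy_table[nr](arg);
}
```

Nothing in this dispatch step requires interrupts to be off. Whether they are off on entry is decided by the gate type used for the vector, which is what the replies in this thread discuss.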