Re: Some FreeBSD performance Issues
On 12/11/2007, Randall Hyde [EMAIL PROTECTED] wrote:
> At this point I'm not sure why FreeBSD's API call is so slow (btw, it's not the system call that's responsible, if I make several additional API calls on each read, e.g., doing lseeks, this has only a marginal impact on performance). But it's pretty clear that if I expect reasonable performance in my own library I'm going to have to do the same thing that glib does and switch over to buffered I/O. Pain in the butt, but there's nothing else to do at this point.

Why give up at this point? Why not actually do some pmcstat profiling to see where all the CPU time is going?

Adrian

-- 
Adrian Chadd - [EMAIL PROTECTED]
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Some FreeBSD performance Issues
On Sun, Nov 11, 2007 at 09:52:21AM -0800, Randall Hyde wrote:
> why C's code was so much faster, I dug into the source code and discovered that open/read/write/etc. use *buffered* I/O (which explains why dd performs so well).

open/read/write/etc. do _not_ do any buffering in userland. This is easily demonstrated using e.g.

$ ktrace dd if=/dev/random of=/dev/null count=50 bs=1

The relevant part of the output is:

 30532 dd       CALL  read(0x3,0x2820410c,0x1)
 30532 dd       GIO   fd 3 read 1 byte
       )
 30532 dd       RET   read 1
 30532 dd       CALL  write(0x4,0x2820410c,0x1)
 30532 dd       GIO   fd 4 wrote 1 byte
       )
 30532 dd       RET   write 1
 30532 dd       CALL  read(0x3,0x2820410c,0x1)
 30532 dd       GIO   fd 3 read 1 byte
       a
 30532 dd       RET   read 1
 30532 dd       CALL  write(0x4,0x2820410c,0x1)
 30532 dd       GIO   fd 4 wrote 1 byte
       a
 30532 dd       RET   write 1

> At this point I'm not sure why FreeBSD's API call is so slow

You have yet to provide any evidence of this. So far, you can only demonstrate it on your application, which strongly suggests it's a problem with your code rather than with FreeBSD. Have you checked the ktrace output from your code or time(1)d it as suggested?

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an RFC2821-compliant MTA.
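The point made in the message above, that read(2) does no userland buffering while stdio does, can be sketched in C. This is only an illustration under stated assumptions: the function names count_bytes_unbuffered and count_bytes_stdio are hypothetical and appear nowhere in the thread, and the sketch counts calls rather than measuring time.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): count the bytes in `path`
 * using one read(2) call per byte. *calls receives the number of read(2)
 * calls issued, including the final call that reports end of file.
 * Returns -1 on error. */
long count_bytes_unbuffered(const char *path, long *calls)
{
    assert(path != NULL && calls != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long bytes = 0;
    char c;
    ssize_t n;

    *calls = 0;
    for (;;) {
        n = read(fd, &c, 1);    /* no userland buffer: one call, one byte */
        (*calls)++;
        if (n != 1)
            break;
        bytes++;
    }
    close(fd);
    return n < 0 ? -1 : bytes;
}

/* Hypothetical helper (not from the thread): the same count through
 * stdio. fgetc() is served from the FILE buffer, so the library issues
 * far fewer read(2) calls than there are bytes. Returns -1 on error. */
long count_bytes_stdio(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    long bytes = 0;
    while (fgetc(fp) != EOF)
        bytes++;

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : bytes;
}
```

For an N-byte file the first function reports N + 1 calls to read(2); running either variant under ktrace, as suggested above, is the way to see what the kernel is actually asked to do.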
Re: Some FreeBSD performance Issues
Randall Hyde wrote:
> Hi All,
>
> I recently ported my HLA (High Level Assembler) compiler to FreeBSD and, along with it, the HLA Standard Library. I have a performance-related question concerning file I/O.
>
> It appears that character-at-a-time file I/O is *exceptionally* slow. Yes, I realize that when processing large files I really ought to be doing block/buffered I/O to get the best performance, but for certain library routines I've written it's been far more convenient to do character-at-a-time I/O rather than deal with all the buffering issues. In the past, while slower, this character-at-a-time paradigm has provided reasonable, though not stellar, performance under Windows and Linux. However, with the port to FreeBSD I'm seeing a three-orders-of-magnitude performance loss. Here's my little test program:
>
> program t;
> #include( "stdlib.hhf" )
> //#include( "bsd.hhf" )
>
> static
>     f       :dword;
>     buffer  :char[64*1024];
>
> begin t;
>
>     fileio.open( "socket.h", fileio.r );
>     mov( eax, f );
>
> #if( false )
>
>     // Windows: 0.25 seconds
>     // BSD: 5.2 seconds
>
>     while( !fileio.eof( f )) do
>         fileio.getc( f );
>         //stdout.put( (type char al ));
>     endwhile;
>
> #elseif( false )
>
>     // Windows: 0.0 seconds (below 1ms threshold)
>     // BSD: 5.2 seconds
>
>     forever
>         fileio.read( f, buffer, 1 );
>         breakif( eax <> 1 );
>         //stdout.putc( buffer[0] );
>     endfor;
>
> #elseif( false )
>
>     // BSD: 5.1 seconds
>
>     forever
>         bsd.read( f, buffer, 1 );
>         breakif( @c );
>         breakif( eax <> 1 );
>         //stdout.putc( buffer[0] );
>     endfor;
>
> #else
>
>     // BSD: 0.016 seconds
>
>     bsd.read( f, buffer, 64*1024 );
>     //stdout.write( buffer, eax );
>
> #endif
>
>     fileio.close( f );
>
> end t;
>
> (I selectively set one of the conditionals to true to run a different test; yeah, this is HLA assembly code, but I suspect that most people who can read C can *mostly* figure out what's going on here).
>
> The fileio.open call is basically a bsd.open( "socket.h", bsd.O_RDONLY ); API call. The socket.h file is about 19K long (it's from the FreeBSD include file set).
>
> In particular, I would draw your attention to the first two tests that do character-at-a-time I/O. The difference in performance between Windows and FreeBSD is dramatic (note: Linux numbers are comparable to Windows). Just to make sure that the library code wasn't doing something incredibly stupid, the third test makes a direct FreeBSD API call to read the data a byte at a time -- the results are comparable to the first two tests. Finally, I read the whole file at once, just to make sure the problem was character-at-a-time I/O (which obviously is the problem). Naturally, at one point I'd uncommented all the output statements to verify that I was reading the entire file -- no problem there.
>
> Is this really the performance I can expect from FreeBSD when doing character I/O this way? Is there some tuning parameter I can set to change internal buffering or something? From these numbers, if I had to guess, I'd suspect that FreeBSD was re-reading the entire 4K (or whatever) block from the file cache every time I read a single character. Can anyone explain what's going on here? I'm loath to change my fileio module to add buffering as that will create some subtle semantic differences that could break existing code (I do have an object-oriented file I/O class that I'm going to use to implement buffered I/O; I would prefer to leave the fileio module unbuffered, if possible).
>
> And a more general question: if this is the way FreeBSD works, should something be done about it?
>
> Thanks,
> Randy Hyde

Hello Randy,

First, let me out myself as a fan of yours. It was your book that got me started on ASM and taught me a lot about computers and logic, plus it provided some entertainment and mental sustenance in pretty boring times, so thanks!

Now, as for your problem: I think I have to agree with the others in this thread when they say that the problem likely isn't in FreeBSD.
The following C program, which uses the read(2) call to read socket.h byte-by-byte, runs quickly (0.05 secs on my 2.1GHz system, measured with time(1)):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

int main(int argc, char** argv)
{
    int f;
    char c;
    ssize_t result;

    f = open("/usr/include/sys/socket.h", O_RDONLY);
    if (f < 0) {
        perror("open");
        exit(1);
    }
    do {
        result = read(f, &c, 1);
        if (result < 0) {
            perror("read");
            exit(1);
        }
        //printf("%c", c);
    } while (result >= 1);
    return 0;
}

This should be quite equivalent to your second and third code fragment; it does one read system call per byte, no buffering involved. This leads me to believe that the slowdown occurs in your fileio.read wrapper, or maybe in the process
Re: Some FreeBSD performance Issues
> Hello Randy,
>
> First, let me out myself as a fan of yours. It was your book that got me started on ASM and taught me a lot about computers and logic, plus it provided some entertainment and mental sustenance in pretty boring times, so thanks!
>
> Now, as for your problem: I think I have to agree with the others in this thread when they say that the problem likely isn't in FreeBSD. The following C program, which uses the read(2) call to read socket.h byte-by-byte, runs quickly (0.05 secs on my 2.1GHz system, measured with time(1)):
>
> <code snipped>
>
> This should be quite equivalent to your second and third code fragment; it does one read system call per byte, no buffering involved. This leads me to believe that the slowdown occurs in your fileio.read wrapper, or maybe in the process setup/teardown process.

Actually, I'd already gone that route. Looking at the wrong copy of read.c (in the libstd directory) is what had me convinced of the buffering issue. However, the code you posted is still going through libc, so I'm not ready to trust that.

However, I just used the syscall system call to make the same INT $80 calls from C that I'm making from the assembly code, and that seems to work okay. I've disassembled the code for both programs (my assembly code and the C code that's making direct system calls) and, for the life of me, I can't (yet) see any reason why the C code would run two orders of magnitude faster. I guess I'm going to have to look at the start-up code used by the C run-time system and see if it is doing something funny.

Note to others: I still haven't done ktrace, though looking at the object code for the two programs it's hard to believe that there would be extra system calls taking place or anything like that. If I had to guess at this point, I'd say that my calls are blocking a lot longer than the C program's. My user and system times are low, but real time is very high.

I'll try ktrace when I get back into town later this week.
Cheers,
Randy Hyde
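The experiment described in the message above, issuing the read system call from C without going through libc's read() wrapper, might look roughly like the following. It is a sketch under stated assumptions, not the code that was actually run: the function name count_bytes_raw_syscall is hypothetical, open() and close() still go through their ordinary libc wrappers, and how syscall(2) enters the kernel (INT $80 or something else) depends on the platform.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose syscall(); harmless elsewhere */

#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): count the bytes in `path`,
 * invoking the read system call directly through syscall(2) instead of
 * the libc read() wrapper. One system call is made per byte.
 * Returns -1 on error. */
long count_bytes_raw_syscall(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long bytes = 0;
    long n;
    char c;

    /* SYS_read is the system call number that read() resolves to. */
    while ((n = syscall(SYS_read, fd, &c, (size_t)1)) == 1)
        bytes++;

    close(fd);
    return n < 0 ? -1 : bytes;
}
```

If this version and the plain read() version run at similar speed, the libc wrapper is unlikely to explain a large timing difference, which points back at the calling program.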
Re: Some FreeBSD performance Issues
Hi All,

Well, I've done some sleuthing and discovered some issues. First, the dd command produced approximately the same results everyone else was getting. So I rewrote a version of my test code in C using the stdlib read call and it had really great performance. Not understanding why C's code was so much faster, I dug into the source code and discovered that open/read/write/etc. use *buffered* I/O (which explains why dd performs so well).

At this point I'm not sure why FreeBSD's API call is so slow (btw, it's not the system call that's responsible, if I make several additional API calls on each read, e.g., doing lseeks, this has only a marginal impact on performance). But it's pretty clear that if I expect reasonable performance in my own library I'm going to have to do the same thing that glib does and switch over to buffered I/O. Pain in the butt, but there's nothing else to do at this point.

Cheers,
Randy Hyde
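For readers wondering what "switching over to buffered I/O" involves, here is a minimal C sketch of a character-at-a-time reader that refills a private buffer with one large read(2) call. It is not the HLA fileio implementation; the names byte_reader, reader_init and reader_getc, and the 64K buffer size, are assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define READER_BUF_SIZE (64 * 1024)   /* assumed size, matching the 64K test */

/* Hypothetical buffered reader (not from the thread). */
struct byte_reader {
    int fd;                 /* descriptor to read from */
    size_t pos;             /* index of the next unread byte in buf */
    size_t len;             /* number of valid bytes in buf */
    long refills;           /* read(2) calls issued so far */
    unsigned char buf[READER_BUF_SIZE];
};

void reader_init(struct byte_reader *r, int fd)
{
    assert(r != NULL);
    r->fd = fd;
    r->pos = 0;
    r->len = 0;
    r->refills = 0;
}

/* Return the next byte (0..255), or -1 at end of file or on error.
 * The kernel is only entered when the buffer has been used up. */
int reader_getc(struct byte_reader *r)
{
    assert(r != NULL);
    if (r->pos == r->len) {
        ssize_t n = read(r->fd, r->buf, sizeof r->buf);
        r->refills++;
        if (n <= 0)
            return -1;
        r->len = (size_t)n;
        r->pos = 0;
    }
    return r->buf[r->pos++];
}
```

The semantic differences mentioned earlier in the thread show up here: after the first reader_getc() call the descriptor's file offset is already past everything in the buffer, so code that mixes this reader with lseek() or shares the descriptor will see different behavior than with unbuffered reads.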
Re: Some FreeBSD performance Issues
On Fri, Nov 09, 2007 at 08:36:32AM -0800, Randall Hyde wrote:
> To answer a different question in the thread, I'm pretty sure I'm making only one FreeBSD call per byte, at least in one of the cases I posted.

How about using ktrace or similar to confirm this?

> I wonder if I'm only getting one character output per time slice, or something like that?

Maybe, though I don't understand why this would occur. If you time(1) your program, what is the real/user/system time breakdown? That will help clarify where the slowdown is located.

Have you tried the dd(1) command suggested?

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement an RFC2821-compliant MTA.
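The real/user/system breakdown asked for above can also be collected from inside a program, which helps when only one section of the code is under suspicion. The following C sketch uses gettimeofday(2) and getrusage(2); the names time_split, measure and read_bytewise are hypothetical, and the workload is just the byte-at-a-time read loop discussed in this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical result type (not from the thread): times in seconds. */
struct time_split {
    double real;   /* elapsed wall-clock time */
    double user;   /* CPU time spent in user mode */
    double sys;    /* CPU time spent in the kernel */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Example workload: read the file named by `arg` one byte per read(2). */
void read_bytewise(void *arg)
{
    const char *path = arg;
    char c;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return;
    while (read(fd, &c, 1) == 1)
        ;
    close(fd);
}

/* Run work(arg) once and report how its time was spent. */
struct time_split measure(void (*work)(void *), void *arg)
{
    struct rusage before, after;
    struct timeval t0, t1;
    struct time_split out;

    assert(work != NULL);

    getrusage(RUSAGE_SELF, &before);
    gettimeofday(&t0, NULL);
    work(arg);
    gettimeofday(&t1, NULL);
    getrusage(RUSAGE_SELF, &after);

    out.real = tv_seconds(t1) - tv_seconds(t0);
    out.user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out.sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return out;
}
```

As a rough reading of the result: when real is much larger than user plus sys, the process spent most of its time waiting rather than executing, while a large sys figure points at time spent inside system calls.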
Re: Some FreeBSD performance Issues
On Thu, 8 Nov 2007 16:52:38 -0600, Dan Nelson wrote:
> In the last episode (Nov 08), Randall Hyde said:
>> It appears that character-at-a-time file I/O is *exceptionally* slow. Yes, I realize that when processing large files I really ought to be doing block/buffered I/O to get the best performance, but for certain library routines I've written it's been far more convenient to do character-at-a-time I/O rather than deal with all the buffering issues. In the past, while slower, this character-at-a-time paradigm has provided reasonable, though not stellar, performance under Windows and Linux. However, with the port to FreeBSD I'm seeing a three-orders-of-magnitude performance loss. Here's my little test program:
>> [...]
>> The fileio.open call is basically a bsd.open( "socket.h", bsd.O_RDONLY ); API call. The socket.h file is about 19K long (it's from the FreeBSD include file set).
>>
>> In particular, I would draw your attention to the first two tests that do character-at-a-time I/O. The difference in performance
>
> What timings does "dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k" report? It takes about .4 sec on my non-idle dual pIII-900 system. Try truss'ing your program as it runs; maybe the program is doing some extra syscalls you aren't aware of?

On FreeBSD 6.2-RELEASE, it returns:

dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k
18426+0 records in
0+1 records out
18426 bytes transferred in 0.070472 secs (261466 bytes/sec)

-- 
Regards, /\_/\ All dogs go to heaven.
[EMAIL PROTECTED](0 0) http://www.openmalaysiablog.com/
+==oOO--(_)--OOo==+
| for a in past present future; do |
| for b in clients employers associates relatives neighbours pets; do |
| echo The opinions here in no way reflect the opinions of my $a $b. |
| done; done |
+=+
Re: Some FreeBSD performance Issues
> You should also carefully do an strace or similar on Windows and Linux as well. You may find that you're doing a system call per byte on FreeBSD but not on those other systems.

Certainly this might be possible under Windows, as I have no idea what happens once I link in one of the various kernel.dll modules. Under Linux, however, I am directly issuing the INT($80) instruction, so one system call per byte is being made.

To answer a different question in the thread, I'm pretty sure I'm making only one FreeBSD call per byte, at least in one of the cases I posted. You'll note that one of the test examples made a call to bsd.read( fd, buffer, 1 );. That's just a function I wrote that rearranges parameters and sets up the stack, executes an INT( $80 ) instruction, cleans up the stack, and returns to the user. In a different test example I *was* making a couple of calls (specifically to lseek, to check whether I'd reached EOF), but the performance difference was minimal (i.e., the time was being spent in the read call).

I have to run off for an appt right now, but I'll try the dd command later today and see what that reports. I wonder if I'm only getting one character output per time slice, or something like that?

Cheers,
Randy Hyde
Some FreeBSD performance Issues
Hi All,

I recently ported my HLA (High Level Assembler) compiler to FreeBSD and, along with it, the HLA Standard Library. I have a performance-related question concerning file I/O.

It appears that character-at-a-time file I/O is *exceptionally* slow. Yes, I realize that when processing large files I really ought to be doing block/buffered I/O to get the best performance, but for certain library routines I've written it's been far more convenient to do character-at-a-time I/O rather than deal with all the buffering issues. In the past, while slower, this character-at-a-time paradigm has provided reasonable, though not stellar, performance under Windows and Linux. However, with the port to FreeBSD I'm seeing a three-orders-of-magnitude performance loss. Here's my little test program:

program t;
#include( "stdlib.hhf" )
//#include( "bsd.hhf" )

static
    f       :dword;
    buffer  :char[64*1024];

begin t;

    fileio.open( "socket.h", fileio.r );
    mov( eax, f );

#if( false )

    // Windows: 0.25 seconds
    // BSD: 5.2 seconds

    while( !fileio.eof( f )) do
        fileio.getc( f );
        //stdout.put( (type char al ));
    endwhile;

#elseif( false )

    // Windows: 0.0 seconds (below 1ms threshold)
    // BSD: 5.2 seconds

    forever
        fileio.read( f, buffer, 1 );
        breakif( eax <> 1 );
        //stdout.putc( buffer[0] );
    endfor;

#elseif( false )

    // BSD: 5.1 seconds

    forever
        bsd.read( f, buffer, 1 );
        breakif( @c );
        breakif( eax <> 1 );
        //stdout.putc( buffer[0] );
    endfor;

#else

    // BSD: 0.016 seconds

    bsd.read( f, buffer, 64*1024 );
    //stdout.write( buffer, eax );

#endif

    fileio.close( f );

end t;

(I selectively set one of the conditionals to true to run a different test; yeah, this is HLA assembly code, but I suspect that most people who can read C can *mostly* figure out what's going on here).

The fileio.open call is basically a bsd.open( "socket.h", bsd.O_RDONLY ); API call. The socket.h file is about 19K long (it's from the FreeBSD include file set).

In particular, I would draw your attention to the first two tests that do character-at-a-time I/O. The difference in performance between Windows and FreeBSD is dramatic (note: Linux numbers are comparable to Windows). Just to make sure that the library code wasn't doing something incredibly stupid, the third test makes a direct FreeBSD API call to read the data a byte at a time -- the results are comparable to the first two tests. Finally, I read the whole file at once, just to make sure the problem was character-at-a-time I/O (which obviously is the problem). Naturally, at one point I'd uncommented all the output statements to verify that I was reading the entire file -- no problem there.

Is this really the performance I can expect from FreeBSD when doing character I/O this way? Is there some tuning parameter I can set to change internal buffering or something? From these numbers, if I had to guess, I'd suspect that FreeBSD was re-reading the entire 4K (or whatever) block from the file cache every time I read a single character. Can anyone explain what's going on here? I'm loath to change my fileio module to add buffering as that will create some subtle semantic differences that could break existing code (I do have an object-oriented file I/O class that I'm going to use to implement buffered I/O; I would prefer to leave the fileio module unbuffered, if possible).

And a more general question: if this is the way FreeBSD works, should something be done about it?

Thanks,
Randy Hyde
Re: Some FreeBSD performance Issues
In the last episode (Nov 08), Randall Hyde said:
> It appears that character-at-a-time file I/O is *exceptionally* slow. Yes, I realize that when processing large files I really ought to be doing block/buffered I/O to get the best performance, but for certain library routines I've written it's been far more convenient to do character-at-a-time I/O rather than deal with all the buffering issues. In the past, while slower, this character-at-a-time paradigm has provided reasonable, though not stellar, performance under Windows and Linux. However, with the port to FreeBSD I'm seeing a three-orders-of-magnitude performance loss. Here's my little test program:
> [...]
> The fileio.open call is basically a bsd.open( "socket.h", bsd.O_RDONLY ); API call. The socket.h file is about 19K long (it's from the FreeBSD include file set).
>
> In particular, I would draw your attention to the first two tests that do character-at-a-time I/O. The difference in performance

What timings does "dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k" report? It takes about .4 sec on my non-idle dual pIII-900 system. Try truss'ing your program as it runs; maybe the program is doing some extra syscalls you aren't aware of?

-- 
Dan Nelson
[EMAIL PROTECTED]
Re: Some FreeBSD performance Issues
Dan Nelson wrote:
> In the last episode (Nov 08), Randall Hyde said:
>> It appears that character-at-a-time file I/O is *exceptionally* slow. ... reasonable, though not stellar, performance under Windows and Linux. However, with the port to FreeBSD I'm seeing a three-orders-of-magnitude performance loss. ...
>
> What timings does "dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k" report? It takes about .4 sec on my non-idle dual pIII-900 system. Try truss'ing your program as it runs; maybe the program is doing some extra syscalls you aren't aware of?

You should also carefully do an strace or similar on Windows and Linux as well. You may find that you're doing a system call per byte on FreeBSD but not on those other systems.

Tim Kientzle