Re: Some FreeBSD performance Issues

2007-11-12 Thread Adrian Chadd
On 12/11/2007, Randall Hyde [EMAIL PROTECTED] wrote:

> At this point I'm not sure why FreeBSD's API call is so slow (btw, it's not
> the system call that's responsible, if I make several additional API calls
> on each read, e.g., doing lseeks, this has only a marginal impact on
> performance). But it's pretty clear that if I expect reasonable performance
> in my own library I'm going to have to do the same thing that glib does and
> switch over to buffered I/O.  Pain in the butt, but there's nothing else to
> do at this point.

Why give up at this point? Why not actually do some pmcstat profiling
to see where all the CPU time is going?


Adrian


-- 
Adrian Chadd - [EMAIL PROTECTED]
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Some FreeBSD performance Issues

2007-11-12 Thread Peter Jeremy
On Sun, Nov 11, 2007 at 09:52:21AM -0800, Randall Hyde wrote:
> why C's code was so much faster, I dug into the source code and discovered
> that open/read/write/etc. use *buffered* I/O (which explains why dd
> performs so well).

open/read/write/etc. do _not_ do any buffering in userland.  This is
easily demonstrated using, e.g.,
$ ktrace dd if=/dev/random of=/dev/null count=50 bs=1
The relevant part of the trace (as printed by kdump) is:
 30532 dd   CALL  read(0x3,0x2820410c,0x1)
 30532 dd   GIO   fd 3 read 1 byte
   )
 30532 dd   RET   read 1
 30532 dd   CALL  write(0x4,0x2820410c,0x1)
 30532 dd   GIO   fd 4 wrote 1 byte
   )
 30532 dd   RET   write 1
 30532 dd   CALL  read(0x3,0x2820410c,0x1)
 30532 dd   GIO   fd 3 read 1 byte
   a
 30532 dd   RET   read 1
 30532 dd   CALL  write(0x4,0x2820410c,0x1)
 30532 dd   GIO   fd 4 wrote 1 byte
   a
 30532 dd   RET   write 1
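
The same point can be checked from C without a tracer. The following is a minimal sketch, not code from the thread; the function names are hypothetical. It reads one byte through read(2) and one byte through stdio's fgetc(3), then asks the kernel where the descriptor's file offset is. With read(2) the offset should advance by exactly one byte; with stdio it normally jumps by a whole buffer, because the buffering lives in the stdio layer and not in read(2).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one byte with read(2) and return the descriptor's file offset
 * afterwards, or -1 on error. No userland buffer is involved, so the
 * offset is expected to be exactly 1. */
off_t offset_after_raw_read(const char *path)
{
    char c;
    off_t pos = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (read(fd, &c, 1) == 1)
        pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}

/* Read one byte with fgetc(3) and return the offset of the underlying
 * descriptor, or -1 on error. stdio fills its own buffer first, so the
 * offset normally lands a full buffer ahead (or at end of file for a
 * file smaller than the buffer). */
off_t offset_after_stdio_read(const char *path)
{
    off_t pos = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgetc(fp) != EOF)
        pos = lseek(fileno(fp), 0, SEEK_CUR);
    fclose(fp);
    return pos;
}
```

Calling both functions on /usr/include/sys/socket.h and printing the results is enough to see the difference; the exact stdio offset depends on the C library's buffer size.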

> At this point I'm not sure why FreeBSD's API call is so slow

You have yet to provide any evidence of this.  So far, you can only
demonstrate it on your application - which strongly suggests it's a
problem with your code, rather than FreeBSD.

Have you checked the ktrace output from your code or time(1)d it as
suggested?

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an RFC2821-compliant MTA.




Re: Some FreeBSD performance Issues

2007-11-12 Thread Benjamin Lutz
Randall Hyde wrote:
> [full quote of the original message snipped]


Hello Randy,

First, let me out myself as a fan of yours. It was your book that got me
started on ASM and taught me a lot about computers and logic, plus it
provided some entertainment and mental sustenance in pretty boring
times, so thanks!

Now, as for your problem: I think I have to agree with the others in
this thread when they say that the problem likely isn't in FreeBSD. The
following C program, which uses the read(2) call to read socket.h
byte-by-byte, runs quickly (0.05 secs on my 2.1GHz system, measured with
time(1)):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

int main(int argc, char** argv) {
    int f;
    char c;
    ssize_t result;

    f = open("/usr/include/sys/socket.h", O_RDONLY);
    if (f < 0) { perror("open"); exit(1); }

    /* One read(2) system call per byte; stop at EOF (read returns 0). */
    do {
        result = read(f, &c, 1);
        if (result < 0) { perror("read"); exit(1); }
        //printf("%c", c);
    } while (result >= 1);

    return 0;
}

This should be quite equivalent to your second and third code fragments;
it does one read system call per byte, no buffering involved. This leads
me to believe that the slowdown occurs in your fileio.read wrapper, or
maybe in the process setup/teardown process.

Re: Some FreeBSD performance Issues

2007-11-12 Thread Randall Hyde



> Hello Randy,
>
> First, let me out myself as a fan of yours. It was your book that got me
> started on ASM and taught me a lot about computers and logic, plus it
> provided some entertainment and mental sustenance in pretty boring
> times, so thanks!
>
> Now, as for your problem: I think I have to agree with the others in
> this thread when they say that the problem likely isn't in FreeBSD. The
> following C program, which uses the read(2) call to read socket.h
> byte-by-byte, runs quickly (0.05 secs on my 2.1GHz system, measured with
> time(1)):
>
> [code snipped]
>
> This should be quite equivalent to your second and third code fragment;
> it does one read system call per byte, no buffering involved. This leads
> me to believe that the slowdown occurs in your fileio.read wrapper, or
> maybe in the process setup/teardown process.

Actually, I'd already gone that route. Looking at the wrong copy of read.c
(in the libstd directory) is what had me convinced of the buffering issue.

However, the code you posted is still going through libc, so I'm not ready
to trust that. That said, I just used the syscall system call to make the
same INT $80 calls from C that I'm making from the assembly code, and that
seems to work okay.
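
A minimal sketch of that kind of test, not the code actually used in the thread: it issues one read system call per byte through syscall(2) instead of the libc read() wrapper. The function name is hypothetical, and open() and close() still go through their ordinary libc wrappers here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* glibc only: expose syscall() under strict -std= modes */
#endif
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the bytes in a file using one raw read system call per byte,
 * issued via syscall(2) rather than read(). Returns the byte count,
 * or -1 if the file cannot be opened or a read fails. */
long count_bytes_raw_syscall(const char *path)
{
    char c;
    long n;
    long total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = syscall(SYS_read, fd, &c, (size_t)1)) == 1)
        total++;
    close(fd);
    return n < 0 ? -1 : total;
}
```

Timing a call to this function against the same loop written with read() shows whether the libc wrapper contributes anything measurable.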

I've disassembled the code for both programs (my assembly code and the C
code that's making direct system calls) and for the life of me, I can't
(yet) see any reason why the C code would run two orders of magnitude
faster. I guess I'm going to have to look at the start-up code used by the C
run-time system and see if it is doing something funny.

Note to others: I still haven't done ktrace, though looking at the object
code for the two programs it's hard to believe that there would be extra
system calls taking place or anything like that. If I had to guess at this
point, I'd say that my calls are blocking a lot longer than the C program's.
My user and system times are low, but real time is very high.  I'll try
ktrace when I get back into town later this week.

Cheers,
Randy Hyde



Re: Some FreeBSD performance Issues

2007-11-11 Thread Randall Hyde
Hi All,

Well, I've done some sleuthing and discovered some issues.

First, the dd command produced approximately the same results everyone
else was getting. So I rewrote a version of my test code in C using the
stdlib read call and it had really great performance. Not understanding
why C's code was so much faster, I dug into the source code and discovered
that open/read/write/etc. use *buffered* I/O (which explains why dd
performs so well).

At this point I'm not sure why FreeBSD's API call is so slow (btw, it's not
the system call that's responsible, if I make several additional API calls
on each read, e.g., doing lseeks, this has only a marginal impact on
performance). But it's pretty clear that if I expect reasonable performance
in my own library I'm going to have to do the same thing that glib does and
switch over to buffered I/O.  Pain in the butt, but there's nothing else to
do at this point.
Cheers,
Randy Hyde
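
For reference, the buffering being discussed is the kind stdio layers on top of read(2), not something read(2) does itself. A minimal sketch of such a layer follows; it is not HLA library code, and the names and the 4096-byte buffer size are assumptions chosen for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define RBUF_SIZE 4096            /* hypothetical buffer size */

struct rbuf {
    int fd;
    size_t pos;                   /* index of the next unread byte in data[] */
    size_t len;                   /* number of valid bytes in data[] */
    unsigned long reads;          /* read(2) calls issued so far */
    unsigned char data[RBUF_SIZE];
};

static void rbuf_init(struct rbuf *b, int fd)
{
    b->fd = fd;
    b->pos = b->len = 0;
    b->reads = 0;
}

/* Return the next byte (0..255), or -1 on end of file or error. */
static int rbuf_getc(struct rbuf *b)
{
    if (b->pos == b->len) {       /* buffer drained: refill with one read(2) */
        ssize_t n = read(b->fd, b->data, sizeof b->data);
        b->reads++;
        if (n <= 0)
            return -1;
        b->pos = 0;
        b->len = (size_t)n;
    }
    return b->data[b->pos++];
}

/* Count the bytes in a file through the buffer and store the number of
 * read(2) calls in *reads. Returns the byte count, or -1 if open fails. */
long count_bytes_buffered(const char *path, unsigned long *reads)
{
    struct rbuf b;
    long total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    rbuf_init(&b, fd);
    while (rbuf_getc(&b) != -1)
        total++;
    close(fd);
    *reads = b.reads;
    return total;
}
```

For an 18426-byte file this takes about six read(2) calls instead of roughly 18,000. The cost is that the descriptor's kernel offset runs ahead of what the caller has consumed, so code that mixes buffered reads with lseek or raw reads on the same descriptor sees different behavior than before.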



Re: Some FreeBSD performance Issues

2007-11-10 Thread Peter Jeremy
On Fri, Nov 09, 2007 at 08:36:32AM -0800, Randall Hyde wrote:
> To answer a different question in the thread, I'm pretty sure I'm making
> only one FreeBSD call per byte, at least in one of the cases I posted.

How about using ktrace or similar to confirm this?

> I wonder if I'm only getting one character output per time slice, or
> something like that?

Maybe, though I don't understand why this would occur.  If you time(1)
your program, what is the real/user/system time breakdown?  That will
help clarify where the slowdown is located.  Have you tried the dd(1)
command suggested?
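
The same breakdown can be taken inside a test program. This is a minimal sketch under the assumption that gettimeofday(2) and getrusage(2) are precise enough for a loop lasting seconds; the struct and function names are hypothetical.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

struct timing {
    double real;    /* wall-clock seconds spent in the loop */
    double user;    /* CPU seconds spent in user mode */
    double sys;     /* CPU seconds spent in the kernel */
    long bytes;     /* bytes read */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Read a file one byte per read(2) call and record the real/user/system
 * times of the loop in *t. Returns 0 on success, -1 on error. */
int time_bytewise_read(const char *path, struct timing *t)
{
    struct timeval w0, w1;
    struct rusage r0, r1;
    char c;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    t->bytes = 0;
    gettimeofday(&w0, NULL);
    getrusage(RUSAGE_SELF, &r0);
    while ((n = read(fd, &c, 1)) == 1)
        t->bytes++;
    getrusage(RUSAGE_SELF, &r1);
    gettimeofday(&w1, NULL);
    close(fd);

    t->real = tv_seconds(w1) - tv_seconds(w0);
    t->user = tv_seconds(r1.ru_utime) - tv_seconds(r0.ru_utime);
    t->sys  = tv_seconds(r1.ru_stime) - tv_seconds(r0.ru_stime);
    return n < 0 ? -1 : 0;
}
```

If real time is far larger than user plus system time, the process is mostly waiting rather than executing, which points away from the cost of the system call itself.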

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an RFC2821-compliant MTA.




Re: Some FreeBSD performance Issues

2007-11-09 Thread Dinesh Nair
On Thu, 8 Nov 2007 16:52:38 -0600, Dan Nelson wrote:

> In the last episode (Nov 08), Randall Hyde said:
> > It appears that character-at-a-time file I/O is *exceptionally* slow.
> > Yes, I realize that when processing large files I really ought to be
> > doing block/buffered I/O to get the best performance, but for certain
> > library routines I've written it's been far more convenient to do
> > character-at-a-time I/O rather than deal with all the buffering
> > issues.  In the past, while slower, this character-at-a-time paradigm
> > has provided reasonable, though not stellar, performance under
> > Windows and Linux. However, with the port to FreeBSD I'm seeing a
> > three-orders-of-magnitude performance loss.  Here's my little test
> > program:
> [...]
> > The fileio.open call is basically a bsd.open( "socket.h",
> > bsd.O_RDONLY ); API call.  The socket.h file is about 19K long (it's
> > from the FreeBSD include file set). In particular, I would draw your
> > attention to the first two tests that do character-at-a-time I/O. The
> > difference in performance
>
> What timings does
> dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k report?
> It takes about .4 sec on my non-idle dual pIII-900 system.  Try
> truss'ing your program as it runs; maybe the program is doing some
> extra syscalls you aren't aware of?

On FreeBSD 6.2-RELEASE, it reports:

dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k
18426+0 records in
0+1 records out
18426 bytes transferred in 0.070472 secs (261466 bytes/sec)
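
As a rough worked comparison, the dd figures above can be turned into a cost per one-byte read and set against the 5.2 seconds reported for the HLA test. This is only a sketch: the two timings come from different machines, and it assumes Randall's socket.h is also 18426 bytes (he describes it as "about 19K"). The function name is hypothetical.

```c
/* Average cost of one call in microseconds, given the total elapsed
 * time in seconds and the number of calls made. */
double usec_per_call(double elapsed_seconds, long calls)
{
    return elapsed_seconds / (double)calls * 1e6;
}

/* With the numbers quoted in this thread:
 *   usec_per_call(0.070472, 18426)  is about 3.8 microseconds per read (dd)
 *   usec_per_call(5.2, 18426)       is about 282 microseconds per read (HLA test)
 */
```

A gap of that size per call is hard to attribute to system call overhead alone.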

-- 
Regards,   /\_/\   All dogs go to heaven.
[EMAIL PROTECTED](0 0)   http://www.openmalaysiablog.com/
+==oOO--(_)--OOo==+
| for a in past present future; do|
|   for b in clients employers associates relatives neighbours pets; do   |
|   echo The opinions here in no way reflect the opinions of my $a $b.  |
| done; done  |
+=+


Re: Some FreeBSD performance Issues

2007-11-09 Thread Randall Hyde


> You should also carefully do an strace or similar on
> Windows and Linux as well.  You may find that you're
> doing a system call per byte on FreeBSD but not on
> those other systems.

Certainly this might be possible under Windows, as I have no idea what
happens once I link in one of the various kernel.dll modules. Under Linux,
however, I am directly issuing the INT($80) instruction, so one system call
per byte is being made.

To answer a different question in the thread, I'm pretty sure I'm making
only one FreeBSD call per byte, at least in one of the cases I posted.
You'll note that one of the test examples made a call to bsd.read( fd,
buffer, 1 );. That's just a function I wrote that rearranges parameters and
sets up the stack, executes an INT( $80 ) instruction, cleans up the stack,
and returns to the user. In a different test example I *was* making a couple
of calls, (specifically to lseek to check to see if I'd reached EOF), but
the performance difference was minimal (i.e., the time was being spent in
the read call). I have to run off for an appt right now, but I'll try the
dd command later today and see what that reports.
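
For readers unfamiliar with the lseek approach to EOF detection, here is a minimal C sketch of what such a check could look like. The thread does not show how fileio.eof is actually implemented, so this is an assumption for illustration only, and the function names are hypothetical. Each check costs up to three extra system calls on top of the read.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the descriptor's offset is at or past end of file,
 * 0 if not, -1 on error. Uses lseek(2) only; the offset is restored. */
int at_eof(int fd)
{
    off_t cur, end;

    cur = lseek(fd, 0, SEEK_CUR);         /* where are we now? */
    if (cur < 0)
        return -1;
    end = lseek(fd, 0, SEEK_END);         /* where does the file end? */
    if (end < 0)
        return -1;
    if (lseek(fd, cur, SEEK_SET) < 0)     /* go back to where we were */
        return -1;
    return cur >= end;
}

/* Count bytes with an EOF check before every one-byte read.
 * Returns the byte count, or -1 on error. */
long count_bytes_with_eof_check(const char *path)
{
    char c;
    long total = 0;
    int eof;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((eof = at_eof(fd)) == 0) {
        if (read(fd, &c, 1) != 1) {
            eof = -1;
            break;
        }
        total++;
    }
    close(fd);
    return eof < 0 ? -1 : total;
}
```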

I wonder if I'm only getting one character output per time slice, or
something like that?
Cheers,
Randy Hyde



Some FreeBSD performance Issues

2007-11-08 Thread Randall Hyde
Hi All,

I recently ported my HLA (High Level Assembler) compiler to FreeBSD and,
along with it, the HLA Standard Library. I have a performance-related
question concerning file I/O.

It appears that character-at-a-time file I/O is *exceptionally* slow. Yes, I
realize that when processing large files I really ought to be doing
block/buffered I/O to get the best performance, but for certain library
routines I've written it's been far more convenient to do
character-at-a-time I/O rather than deal with all the buffering issues.  In
the past, while slower, this character-at-a-time paradigm has provided
reasonable, though not stellar, performance under Windows and Linux.
However, with the port to FreeBSD I'm seeing a three-orders-of-magnitude
performance loss.  Here's my little test program:

program t;
#include( "stdlib.hhf" )
//#include( "bsd.hhf" )

static
    f       :dword;
    buffer  :char[64*1024];

begin t;

    fileio.open( "socket.h", fileio.r );
    mov( eax, f );
#if( false )

    // Windows: 0.25 seconds
    // BSD: 5.2 seconds

    while( !fileio.eof( f )) do

        fileio.getc( f );
        //stdout.put( (type char al ));

    endwhile;

#elseif( false )

    // Windows: 0.0 seconds (below 1ms threshold)
    // BSD: 5.2 seconds

    forever

        fileio.read( f, buffer, 1 );
        breakif( eax <> 1 );
        //stdout.putc( buffer[0] );

    endfor;

#elseif( false )

    // BSD: 5.1 seconds

    forever

        bsd.read( f, buffer, 1 );
        breakif( @c );
        breakif( eax <> 1 );
        //stdout.putc( buffer[0] );

    endfor;

#else

    // BSD: 0.016 seconds

    bsd.read( f, buffer, 64*1024 );
    //stdout.write( buffer, eax );

#endif

    fileio.close( f );

end t;

(I selectively set one of the conditionals to true to run a different test;
yeah, this is HLA assembly code, but I suspect that most people who can read
C can *mostly* figure out what's going on here).

The fileio.open call is basically a bsd.open( "socket.h", bsd.O_RDONLY );
API call.  The socket.h file is about 19K long (it's from the FreeBSD
include file set). In particular, I would draw your attention to the first
two tests that do character-at-a-time I/O. The difference in performance
between Windows and FreeBSD is dramatic (note: Linux numbers are comparable
to Windows). Just to make sure that the library code wasn't doing something
incredibly stupid, the third test makes a direct FreeBSD API call to read
the data a byte at a time -- the results are comparable to the first two
tests. Finally, I read the whole file at once, just to make sure the problem
was character-at-a-time I/O (which obviously is the problem).  Naturally, at
one point I'd uncommented all the output statements to verify that I was
reading the entire file -- no problem there.

Is this really the performance I can expect from FreeBSD when doing
character I/O this way? Is there some tuning parameter I can set to
change internal buffering or something?  From these numbers, if I had to
guess, I'd suspect that FreeBSD was re-reading the entire 4K (or whatever)
block from the file cache every time I read a single character. Can anyone
explain what's going on here?  I'm loath to change my fileio module to add
buffering as that will create some subtle semantic differences that could
break existing code (I do have an object-oriented file I/O class that I'm
going to use to implement buffered I/O; I would prefer to leave the fileio
module unbuffered, if possible).

And a more general question: if this is the way FreeBSD works, should
something be done about it?
Thanks,
Randy Hyde



Re: Some FreeBSD performance Issues

2007-11-08 Thread Dan Nelson
In the last episode (Nov 08), Randall Hyde said:
> It appears that character-at-a-time file I/O is *exceptionally* slow.
> Yes, I realize that when processing large files I really ought to be
> doing block/buffered I/O to get the best performance, but for certain
> library routines I've written it's been far more convenient to do
> character-at-a-time I/O rather than deal with all the buffering
> issues.  In the past, while slower, this character-at-a-time paradigm
> has provided reasonable, though not stellar, performance under
> Windows and Linux. However, with the port to FreeBSD I'm seeing a
> three-orders-of-magnitude performance loss.  Here's my little test
> program:
[...]
> The fileio.open call is basically a bsd.open( "socket.h", bsd.O_RDONLY );
> API call.  The socket.h file is about 19K long (it's from the FreeBSD
> include file set). In particular, I would draw your attention to the first
> two tests that do character-at-a-time I/O. The difference in performance

What timings does 
dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k report? 
It takes about .4 sec on my non-idle dual pIII-900 system.  Try
truss'ing your program as it runs; maybe the program is doing some
extra syscalls you aren't aware of?

-- 
Dan Nelson
[EMAIL PROTECTED]


Re: Some FreeBSD performance Issues

2007-11-08 Thread Tim Kientzle

Dan Nelson wrote:

> In the last episode (Nov 08), Randall Hyde said:
>
> > It appears that character-at-a-time file I/O is *exceptionally* slow.
> > ...  reasonable, though not stellar, performance under
> > Windows and Linux. However, with the port to FreeBSD I'm seeing a
> > three-orders-of-magnitude performance loss. ...
>
> What timings does
> dd if=/usr/include/sys/socket.h of=/dev/null ibs=1 obs=64k report?
> It takes about .4 sec on my non-idle dual pIII-900 system.  Try
> truss'ing your program as it runs; maybe the program is doing some
> extra syscalls you aren't aware of?


You should also carefully do an strace or similar on
Windows and Linux as well.  You may find that you're
doing a system call per byte on FreeBSD but not on
those other systems.

Tim Kientzle