Re: Increase BUFSIZ to 8192

2015-05-14 Thread Slawa Olhovchenkov
On Thu, May 14, 2015 at 08:53:05AM -0600, Ian Lepore wrote:

> At least I'm inclined to ponder it.  Apparently nobody else is.  People
> running servers with more GB of ram than grains of sand on the beach
> won't care about things like 64k buffers used by /bin/sh to read a line
> of text, and all the world is big servers now, right?

I have setups serving tens of gigabits per second from one
server. The default send_lowat (SO_SNDLOWAT) is 2048. Setting it to 128K
increases load. Setting it to 16k slightly reduces it.
Not so simple.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Poul-Henning Kamp

In message <5554b8d6.1010...@mu.org>, Alfred Perlstein writes:

>Shouldn't most of these be using st.st_blksize ?

We had a long discussion about that back when GEOM was young and the
conclusion was that st_blksize doesn't tell you anything useful
and generally does the wrong thing, in particular on non-native
filesystems like msdosfs and cd9660.

But the world is more complex than even that.

For instance on a RAID-5 volume, you want to write stripe-width
chunks, properly aligned, no matter what the st_blksize might be
in your filesystem.  Unless your filesystem is guaranteed to lay
out sequentially, you would have to ask before each write.

Other filesystems may have opinions about read-sizes (e.g. NFS).

The only sane way to do this properly would be to ask each
individual file with fcntl(2) for preferred read or write
sizes.

You could then have embedded systems mount filesystems with
-o iosize=min
and servers instead use
-o iosize=fastest

But for most practical purposes, having a sane constant BUFSIZ is
just fine.
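
As a rough illustration of the trade-off described above, a program could treat st_blksize as a hint only and clamp it to bounds of its own. The name pick_io_size() and the two bounds are assumptions made for this sketch. The per-file fcntl(2) query for preferred sizes is only proposed in the message and does not exist, so it is not shown.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical bounds, chosen only for illustration. */
#define IO_SIZE_MIN	((size_t)4096)
#define IO_SIZE_MAX	((size_t)65536)

/*
 * Pick a read/write size for fd.  st_blksize is treated as a hint only;
 * if fstat() fails, BUFSIZ is used as the starting point.  The result is
 * always clamped to [IO_SIZE_MIN, IO_SIZE_MAX].
 */
static size_t
pick_io_size(int fd)
{
	struct stat st;
	size_t sz = BUFSIZ;

	if (fstat(fd, &st) == 0 && st.st_blksize > 0)
		sz = (size_t)st.st_blksize;
	if (sz < IO_SIZE_MIN)
		sz = IO_SIZE_MIN;
	if (sz > IO_SIZE_MAX)
		sz = IO_SIZE_MAX;
	return (sz);
}
```

The clamp is what keeps an unhelpful st_blksize (as reported for msdosfs or cd9660 above) from producing a pathological buffer size.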

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Poul-Henning Kamp

In message <1431615185.1221.57.ca...@freebsd.org>, Ian Lepore writes:

>I think we've got differing interpretations of what BUFSIZ is for.
>
>IMO, the one correct use of BUFSIZ outside of libc is "if you are going
>to call setbuf() the buffer you pass must be BUFSIZ bytes long."
>
>Over the years, it seems that many people have somehow gotten the
>impression that the intent was "BUFSIZ is the right/ideal/whatever size
>to allocate general purpose IO buffers in any program" 

I don't know when you started, but when I started, on sys-III and
v7 in the mid 1980s, that was exactly what people told you:
"Do disk-I/O in BUFSIZ units".

I did a quick sampling of src and that seems to be exactly how it is
being used in most of the cases I looked at, including libmd where
I put it there for exactly that reason back in 1994 (5?)

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Alfred Perlstein



On 5/14/15 2:23 AM, Garrett Cooper wrote:
> On May 14, 2015, at 1:06, Poul-Henning Kamp  wrote:
>
>> In message <20150514075316.gy37...@funkthat.com>, John-Mark Gurney writes:
>>> Poul-Henning Kamp wrote this message on Thu, May 14, 2015 at 07:42 +:
>>>>
>>>> In message <20150514072155.gt37...@funkthat.com>, John-Mark Gurney writes:
>>>>
>>>>> Since you apparently missed my original reply, I said that we shouldn't
>>>>> abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...
>>>>
>>>> Say what ?
>>>>
>>>> BUFSIZ is used entirely appropriately in MDXFileChunk():  For reading
>>>> a file into an algorithm.
>>
>>> In fact, posix-2008 references LINE_MAX because:
>>
>> MDXFileChunk() does not read lines, it reads an entire file.
>
> Being pedantic, technically it’s a portion of a file, which can be the whole thing,
> and it reads it in “sizeof(buffer)” chunks (of which buffer is “hardcoded” to
> BUFSIZ right now).
> Cheers!

Shouldn't most of these be using st.st_blksize ?

I recall being part of the move to get rid of PAGE_SIZE; perhaps many
places should be rid of BUFSIZ as well and BUFSIZ should be something
we query the system for.


-Alfred


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Ian Lepore
On Thu, 2015-05-14 at 07:42 +, Poul-Henning Kamp wrote:
> 
> In message <20150514072155.gt37...@funkthat.com>, John-Mark Gurney writes:
> 
> >Since you apparently missed my original reply, I said that we shouldn't
> >abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...
> 
> Say what ?
> 
> BUFSIZ is used entirely appropriately in MDXFileChunk():  For reading
> a file into an algorithm.
> 
> If instead of open(2), fopen(3) had been used, the exact same thing
> would happen, but using malloc space rather than stack space.
> 
> 

I think we've got differing interpretations of what BUFSIZ is for.

IMO, the one correct use of BUFSIZ outside of libc is "if you are going
to call setbuf() the buffer you pass must be BUFSIZ bytes long."

Over the years, it seems that many people have somehow gotten the
impression that the intent was "BUFSIZ is the right/ideal/whatever size
to allocate general purpose IO buffers in any program" and I don't
believe that was ever the intent, or was ever correct.  All such usage
is erroneous and must inevitably lead to the situation we've got now:
it's so widely misused that it can't be changed in the context of its
original purpose without pondering what the wider implications of the
change might be.
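
For reference, the setbuf() use described above looks roughly like the following. This is a minimal sketch; write_with_setbuf() is a name invented for the example, and the only point being shown is that the caller-supplied buffer is sized with BUFSIZ.

```c
#include <stdio.h>

/*
 * Write msg to path through a stream whose buffer is supplied by the
 * caller.  setbuf() expects a buffer of at least BUFSIZ bytes, and the
 * buffer must outlive the stream, so it is static here.
 * Returns 0 on success, -1 on failure.
 */
static int
write_with_setbuf(const char *path, const char *msg)
{
	static char buf[BUFSIZ];
	FILE *fp;
	int rc;

	fp = fopen(path, "w");
	if (fp == NULL)
		return (-1);
	setbuf(fp, buf);	/* must precede any other operation on fp */
	rc = (fputs(msg, fp) == EOF) ? -1 : 0;
	if (fclose(fp) != 0)
		rc = -1;
	return (rc);
}
```

Under this reading, changing BUFSIZ only changes how large that one array has to be; it says nothing about what size suits a read(2) loop.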

At least I'm inclined to ponder it.  Apparently nobody else is.  People
running servers with more GB of ram than grains of sand on the beach
won't care about things like 64k buffers used by /bin/sh to read a line
of text, and all the world is big servers now, right?

-- Ian




Re: Increase BUFSIZ to 8192

2015-05-14 Thread Poul-Henning Kamp

In message <72720ea2-c251-40b9-9ec0-702c07d5e...@gmail.com>, Garrett Cooper 
writes:

>Until performance has been characterized on 32-bit vs 
>64-bit architectures, blanket changing a value doesn't make sense.

First time I saw benchmarks which showed improved performance
from a larger BUFSIZ was around 1998...

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Garrett Cooper
On May 14, 2015, at 1:01, Poul-Henning Kamp  wrote:

> 
> In message <1431542835.1221.30.ca...@freebsd.org>, Ian Lepore writes:
>> On Wed, 2015-05-13 at 11:13 -0700, John-Mark Gurney wrote:
> 
>> As I've already pointed out, BUFSIZ appears in the
>> base code over 2000 times.  Where is the analysis of the impact an 8x
>> change is going to have on all those uses?
> 
> Not to pick on Ian in particular, but I'm going to call bike-shed
> on this discussion now.
> 
> Please just make it 4K on 32bit archs and 16K on 64 bit archs, and
> get on with your lives.
> 
> If experience in -current (that's why developers run current, right ?!)
> documents that this was the wrong decision, we can revisit it.
> 
> Until then:  Shut up and code.

Baptiste’s recommendation was related to md5 performance, so it might be that 
(as you pointed out with MDXFileChunk), things might be less performant in the 
application than they could be — but that’s an application bug (only helped by 
scaling issues with FreeBSD, potentially). Until performance has been 
characterized on 32-bit vs 64-bit architectures, blanket changing a value 
doesn’t make sense.

I think that changing buffers sized at BUFSIZ for md5/libmd5 probably makes a 
lot more sense as that change is isolated and the end result could be easily 
micro benchmarked. If/when we have an overall characterization we can look at 
increasing the value across the board.
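
A sketch of what such an isolated change might look like: the reader gets its own buffer-size constant instead of BUFSIZ, so it can be tuned and micro-benchmarked without touching anything else. CHUNK_BUFSZ and sum_file_chunk() are hypothetical names, and a plain byte sum stands in for the real digest update. This is not the libmd code.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative local constant; tuned independently of BUFSIZ. */
#define CHUNK_BUFSZ	((size_t)64 * 1024)

/*
 * Feed len bytes of path, starting at off, to a stand-in "digest" (a plain
 * byte sum, used only to keep the sketch self-contained).  The region is
 * read in pieces of at most CHUNK_BUFSZ bytes.  Returns 0 on success, -1 on
 * error or if the file ends before len bytes were read.
 */
static int
sum_file_chunk(const char *path, off_t off, size_t len, uint64_t *sum)
{
	unsigned char *buf;
	ssize_t n;
	size_t want, i;
	int fd, rc = -1;

	*sum = 0;
	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);
	if ((buf = malloc(CHUNK_BUFSZ)) == NULL)
		goto out;
	if (lseek(fd, off, SEEK_SET) == (off_t)-1)
		goto out;
	while (len > 0) {
		want = len < CHUNK_BUFSZ ? len : CHUNK_BUFSZ;
		n = read(fd, buf, want);
		if (n <= 0)
			goto out;	/* read error or short file */
		for (i = 0; i < (size_t)n; i++)
			*sum += buf[i];
		len -= (size_t)n;
	}
	rc = 0;
out:
	free(buf);
	close(fd);
	return (rc);
}
```

The buffer is heap-allocated here so that a large CHUNK_BUFSZ does not land on the stack, which is one of the concerns raised elsewhere in the thread.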

Thanks!
-NGie




Re: Increase BUFSIZ to 8192

2015-05-14 Thread Garrett Cooper
On May 14, 2015, at 1:06, Poul-Henning Kamp  wrote:

> 
> In message <20150514075316.gy37...@funkthat.com>, John-Mark Gurney writes:
>> Poul-Henning Kamp wrote this message on Thu, May 14, 2015 at 07:42 +:
>>> 
>>> In message <20150514072155.gt37...@funkthat.com>, John-Mark Gurney writes:
>>> 
>>>> Since you apparently missed my original reply, I said that we shouldn't
>>>> abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...
>>> 
>>> Say what ?
>>> 
>>> BUFSIZ is used entirely appropriately in MDXFileChunk():  For reading
>>> a file into an algorithm.
> 
>> In fact, posix-2008 references LINE_MAX because:
> 
> MDXFileChunk() does not read lines, it reads an entire file.

Being pedantic, technically it’s a portion of a file, which can be the whole 
thing, and it reads it in “sizeof(buffer)” chunks (of which buffer is 
“hardcoded” to BUFSIZ right now).
Cheers!




Re: Increase BUFSIZ to 8192

2015-05-14 Thread Poul-Henning Kamp

In message <20150514075316.gy37...@funkthat.com>, John-Mark Gurney writes:
>Poul-Henning Kamp wrote this message on Thu, May 14, 2015 at 07:42 +:
>> 
>> In message <20150514072155.gt37...@funkthat.com>, John-Mark Gurney writes:
>> 
>> >Since you apparently missed my original reply, I said that we shouldn't
>> >abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...
>> 
>> Say what ?
>> 
>> BUFSIZ is used entirely appropriately in MDXFileChunk():  For reading
>> a file into an algorithm.

>In fact, posix-2008 references LINE_MAX because:

MDXFileChunk() does not read lines, it reads an entire file.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Poul-Henning Kamp

In message <1431542835.1221.30.ca...@freebsd.org>, Ian Lepore writes:
>On Wed, 2015-05-13 at 11:13 -0700, John-Mark Gurney wrote:

>As I've already pointed out, BUFSIZ appears in the
>base code over 2000 times.  Where is the analysis of the impact an 8x
>change is going to have on all those uses?

Not to pick on Ian in particular, but I'm going to call bike-shed
on this discussion now.

Please just make it 4K on 32bit archs and 16K on 64 bit archs, and
get on with your lives.
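
One way the 4K/16K split could be expressed at compile time is sketched below. IO_BUFSZ is an illustrative name, not an existing FreeBSD macro, and keying the choice on pointer width via UINTPTR_MAX is an assumption about how "32bit" and "64 bit" would be told apart.

```c
#include <stdint.h>

/*
 * Hypothetical per-architecture I/O buffer size: 4K where pointers are
 * 32 bits wide, 16K where they are wider.
 */
#if UINTPTR_MAX > 0xffffffffu
#define IO_BUFSZ	(16 * 1024)
#else
#define IO_BUFSZ	(4 * 1024)
#endif
```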

If experience in -current (that's why developers run current, right ?!)
documents that this was the wrong decision, we can revisit it.

Until then:  Shut up and code.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-14 Thread John-Mark Gurney
Poul-Henning Kamp wrote this message on Thu, May 14, 2015 at 07:42 +:
> 
> In message <20150514072155.gt37...@funkthat.com>, John-Mark Gurney writes:
> 
> >Since you apparently missed my original reply, I said that we shouldn't
> >abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...
> 
> Say what ?
> 
> BUFSIZ is used entirely appropriately in MDXFileChunk():  For reading
> a file into an algorithm.

Posix-2008:
BUFSIZ: Size of <stdio.h> buffers.  This shall expand to a positive value.

C99:
BUFSIZ
which expands to an integer constant expression that is the size of
the buffer used by the setbuf function;

In fact, posix-2008 references LINE_MAX because:
Frequently, utility writers selected the UNIX system constant BUFSIZ to
allocate these buffers; therefore, some utilities were limited to 512
bytes for I/O lines, while others achieved 4 096 bytes or greater.

BUFSIZ was already recognized as too small to hold a single line, yet
you're saying it's perfectly fine to use as a buffer for binary data?

> If instead of open(2), fopen(3) had been used, the exact same thing
> would happen, but using malloc space rather than stack space.

Plus extra overhead.. :)

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-14 Thread Poul-Henning Kamp

In message <20150514072155.gt37...@funkthat.com>, John-Mark Gurney writes:

>Since you apparently missed my original reply, I said that we shouldn't
>abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...

Say what ?

BUFSIZ is used entirely appropriately in MDXFileChunk():  For reading
a file into an algorithm.

If instead of open(2), fopen(3) had been used, the exact same thing
would happen, but using malloc space rather than stack space.


-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-14 Thread John-Mark Gurney
David Chisnall wrote this message on Wed, May 13, 2015 at 09:27 +0100:
> On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> > 
> > Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> >> 
> >> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> >> 
> >>> Also, you'd probably see even better performance by increasing the
> >>> size to 64k, [...]
> >> 
> >> easy:
> >>8K on 32bit
> >>64k on 64bit
> > 
> > Sounds good to me...  Just for people who care... I did a quick set of
> > benchmarks on sha256.. This is using my preliminary patch to use sse4
> > optimized sha256...  But this should be the same for others...
> > 
> > The numbers in ministat output are the time in seconds it takes my
> > 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> > numbers are better..  I've processed them into easier to read format:
> > > BUFSIZ: 145MB/sec
> > > 8k:     193MB/sec
> > > 16k:    198MB/sec
> > > 64k:    202MB/sec
> > > 128k:   202MB/sec
> > > -t:     211MB/sec
> 
> It looks like most of the benefit is gained at 16KB.  Did you try running the 
> benchmark with something else running at the same time to see if there is any 
> advantage in trashing the caches a bit less (simple case, what happens if you 
> run two instances of the same benchmark at once)?
> 
> I suspect that you’re about right anyway - I recently did some tests while 
> playing with JavaScript FFI generation with a multithreaded process 
> JavaScript environment calling out to OpenSSL to do SHA calculations and 
> having each of 8 threads reading in 128KB chunks gave the fastest performance 
> (Core i7, 4 cores + hyperthreading), with only a negligible gain over 64KB.  
> In all cases, the JavaScript implementation was significantly faster than the 
> openssl tool, which used 8KB buffers.

Just in case anyone else wants to know how to run benchmarks
themselves..  Go into /usr/src/lib/libmd, edit mdXhl.c, and change
the occurrence of BUFSIZ to what you want to test, say 64*1024, then run:
make all && make install

and then you can run programs like sha256 -t, or:
for i in `jot 5 1`; do /usr/bin/time sha256 test.file ; done 2> XXX.times

Where test.file is populated maybe like:
dd if=/dev/urandom of=test.file bs=1m count=512

Then run:
ministat XXX.times YYY.times

to compare multiple results...

Happy benchmarking!

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-14 Thread John-Mark Gurney
Ian Lepore wrote this message on Wed, May 13, 2015 at 12:47 -0600:
> On Wed, 2015-05-13 at 11:13 -0700, John-Mark Gurney wrote:
> > Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> > > The reason I ask about "why is it faster?" is because for embedded-y
> > > things with low RAM we may not want that to happen due to memory
> > > constraints. However, we may actually want to do some form of
> > > autotuning on some platforms.
> > 
> > If you're already running a program, the difference between 1k and
> > 8k isn't significant... I'll give you 64k can be significant for
> > embedded-y platforms...  But this goes back to the, we need a global
> > knob saying I want low memory usage, and I am willing to pay for it
> > in performance...
> 
> It is NOT just a difference of 1K vs 8K.  It's that much times however
> many BUFSIZ-sized things a program has allocated at once.  It's where
> they are allocated.  As I've already pointed out, BUFSIZ appears in the
> base code over 2000 times.  Where is the analysis of the impact an 8x
> change is going to have on all those uses?

Since you apparently missed my original reply, I said that we shouldn't
abuse BUFSIZ for this work, and that it should be changed in mdXhl.c...

I agree that changing this size to affect all the other files is
ill-advised and should not be done...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-13 Thread Ian Lepore
On Wed, 2015-05-13 at 11:13 -0700, John-Mark Gurney wrote:
> Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> > The reason I ask about "why is it faster?" is because for embedded-y
> > things with low RAM we may not want that to happen due to memory
> > constraints. However, we may actually want to do some form of
> > autotuning on some platforms.
> 
> If you're already running a program, the difference between 1k and
> 8k isn't significant... I'll give you 64k can be significant for
> embedded-y platforms...  But this goes back to the, we need a global
> knob saying I want low memory usage, and I am willing to pay for it
> in performance...
> 

It is NOT just a difference of 1K vs 8K.  It's that much times however
many BUFSIZ-sized things a program has allocated at once.  It's where
they are allocated.  As I've already pointed out, BUFSIZ appears in the
base code over 2000 times.  Where is the analysis of the impact an 8x
change is going to have on all those uses?

-- Ian




Re: Increase BUFSIZ to 8192

2015-05-13 Thread John-Mark Gurney
Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> The reason I ask about "why is it faster?" is because for embedded-y
> things with low RAM we may not want that to happen due to memory
> constraints. However, we may actually want to do some form of
> autotuning on some platforms.

If you're already running a program, the difference between 1k and
8k isn't significant... I'll give you 64k can be significant for
embedded-y platforms...  But this goes back to the, we need a global
knob saying I want low memory usage, and I am willing to pay for it
in performance...

> So, if it's underlying block size, maybe BUFSIZ isn't the thing to
> tweak, but based on disk io buffer size.
> If it's filling L1 or L2 cache with useful work, maybe auto-tune it
> based on that.

I'm pretty sure this is simply that syscalls+copies are expensive,
and larger block sizes reduce the number of calls: going from 1k to
64k means 64 times fewer syscalls...

So, in my benchmark, we went from 148271 syscalls/second to 3228
syscalls/second for 64k block size, and we got a 40% perf increase on
top of this...  i.e. we spend ~40% of the cpu time to do 145k syscalls
instead of doing real work...
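
The call-count side of that argument is easy to check directly. The sketch below counts how many read(2) calls return data when a file is read with a given buffer size. count_reads() is a name made up for this example, and it measures only the number of calls, not time.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read path to EOF with a buffer of bufsz bytes and return the number of
 * read(2) calls that returned data, or -1 on error.  The data itself is
 * discarded; only the call count is of interest.
 */
static long
count_reads(const char *path, size_t bufsz)
{
	char *buf;
	ssize_t n;
	long calls = 0;
	int fd;

	if (bufsz == 0 || (buf = malloc(bufsz)) == NULL)
		return (-1);
	if ((fd = open(path, O_RDONLY)) < 0) {
		free(buf);
		return (-1);
	}
	while ((n = read(fd, buf, bufsz)) > 0)
		calls++;
	close(fd);
	free(buf);
	return (n < 0 ? -1 : calls);
}
```

On a regular 64 KiB file, a 1k buffer needs 64 data-returning reads and a 64k buffer needs one, which is the 64x reduction mentioned above.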

> Please don't take this as bikeshedding, I'd really like to see some
> "this is why it's faster" analysis rather than just numbers thrown
> around.

I don't really see a need to analyze this any more... We are batching
work in a more efficient manner...  I could list many other examples
of where we do similar optimizations...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-13 Thread John-Mark Gurney
Hans Petter Selasky wrote this message on Wed, May 13, 2015 at 10:35 +0200:
> On 05/13/15 10:27, David Chisnall wrote:
> > On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> >>
> >> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> >>> 
> >>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> >>>
> >>>> Also, you'd probably see even better performance by increasing the
> >>>> size to 64k, [...]
> >>>
> >>> easy:
> >>>   8K on 32bit
> >>>   64k on 64bit
> >>
> >> Sounds good to me...  Just for people who care... I did a quick set of
> >> benchmarks on sha256.. This is using my preliminary patch to use sse4
> >> optimized sha256...  But this should be the same for others...
> >>
> >> The numbers in ministat output are the time in seconds it takes my
> >> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> >> numbers are better..  I've processed them into easier to read format:
> >> BUFSIZ: 145MB/sec
> >> 8k:     193MB/sec
> >> 16k:    198MB/sec
> >> 64k:    202MB/sec
> >> 128k:   202MB/sec
> >> -t:     211MB/sec
> >
> > It looks like most of the benefit is gained at 16KB.  Did you try running 
> > the benchmark with something else running at the same time to see if there 
> > is any advantage in trashing the caches a bit less (simple case, what 
> > happens if you run two instances of the same benchmark at once)?
> >
> > I suspect that you’re about right anyway - I recently did some tests 
> > while playing with JavaScript FFI generation with a multithreaded process 
> > JavaScript environment calling out to OpenSSL to do SHA calculations and 
> > having each of 8 threads reading in 128KB chunks gave the fastest 
> > performance (Core i7, 4 cores + hyperthreading), with only a negligible 
> > gain over 64KB.  In all cases, the JavaScript implementation was 
> > significantly faster than the openssl tool, which used 8KB buffers.
> 
> You should also try this using a USB disk. The performance numbers 
> depend heavily on the hardware's interrupt moderation values.

This shouldn't matter.. I wasn't flushing the buffer cache between
runs, so this was entirely from the buffer cache...  This is purely
syscall+copy overhead that is being measured here...  No matter what
your source is, NFS, USB disk, you'll always have this overhead...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-13 Thread Adrian Chadd
[snip]

The reason I ask about "why is it faster?" is because for embedded-y
things with low RAM we may not want that to happen due to memory
constraints. However, we may actually want to do some form of
autotuning on some platforms.

So, if it's underlying block size, maybe BUFSIZ isn't the thing to
tweak, but based on disk io buffer size.
If it's filling L1 or L2 cache with useful work, maybe auto-tune it
based on that.
If it's hiding interrupt latency over USB, then that should be addressed.
etc, etc.

Please don't take this as bikeshedding, I'd really like to see some
"this is why it's faster" analysis rather than just numbers thrown
around.



-adrian


Re: Increase BUFSIZ to 8192

2015-05-13 Thread Ian Lepore
On Wed, 2015-05-13 at 10:35 +0200, Hans Petter Selasky wrote:
> On 05/13/15 10:27, David Chisnall wrote:
> > On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> >>
> >> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> >>> 
> >>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> >>>
> >>>> Also, you'd probably see even better performance by increasing the
> >>>> size to 64k, [...]
> >>>
> >>> easy:
> >>>   8K on 32bit
> >>>   64k on 64bit
> >>
> >> Sounds good to me...  Just for people who care... I did a quick set of
> >> benchmarks on sha256.. This is using my preliminary patch to use sse4
> >> optimized sha256...  But this should be the same for others...
> >>
> >> The numbers in ministat output are the time in seconds it takes my
> >> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> >> numbers are better..  I've processed them into easier to read format:
> >> BUFSIZ: 145MB/sec
> >> 8k:     193MB/sec
> >> 16k:    198MB/sec
> >> 64k:    202MB/sec
> >> 128k:   202MB/sec
> >> -t:     211MB/sec
> >
> > It looks like most of the benefit is gained at 16KB.  Did you try running 
> > the benchmark with something else running at the same time to see if there 
> > is any advantage in trashing the caches a bit less (simple case, what 
> > happens if you run two instances of the same benchmark at once)?
> >
> > I suspect that you’re about right anyway - I recently did some tests while 
> > playing with JavaScript FFI generation with a multithreaded process 
> > JavaScript environment calling out to OpenSSL to do SHA calculations and 
> > having each of 8 threads reading in 128KB chunks gave the fastest 
> > performance (Core i7, 4 cores + hyperthreading), with only a negligible 
> > gain over 64KB.  In all cases, the JavaScript implementation was 
> > significantly faster than the openssl tool, which used 8KB buffers.
> >
> 
> Hi,
> 
> You should also try this using a USB disk. The performance numbers 
> depend heavily on the hardware's interrupt moderation values.


All this discussion should be happening in phabricator, not the email
that announces the review on phab.  But, since it's now happening here,
I guess I'll transplant my comments from there to here...

There are 2125 occurrences of BUFSIZ in the base code (probably 95% of
them inappropriately used to size a local temp buffer or string). Do you
really want to perturb that much working tested software because it
makes md5 faster? How many of those occurrences are stack-allocated
variables and is it wise to allocate 8k buffers on the stack for all of
them? How about existing programs (not necessarily in base) that open
many streams concurrently... what will be the impact of a sudden 8x
increase in memory usage for them?

It seems to me that if libmd needs bigger buffers to perform well, it
should use setvbuf().
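
For a reader that goes through stdio, the setvbuf() approach could look like the sketch below. Note that elsewhere in the thread the libmd code is described as using open(2) rather than fopen(3), so this shows the general technique rather than a patch for mdXhl.c. STREAM_BUFSZ and open_big_buffered() are names invented for the example.

```c
#include <stdio.h>

/* Illustrative size; the benchmarks in this thread level off by 64k. */
#define STREAM_BUFSZ	(64 * 1024)

/*
 * Open path for reading with a fully buffered stream whose buffer is
 * STREAM_BUFSZ bytes instead of the default.  Passing NULL lets stdio
 * allocate the buffer itself.  Returns NULL on failure.
 */
static FILE *
open_big_buffered(const char *path)
{
	FILE *fp;

	if ((fp = fopen(path, "r")) == NULL)
		return (NULL);
	/* setvbuf() must be called before any other operation on fp. */
	if (setvbuf(fp, NULL, _IOFBF, STREAM_BUFSZ) != 0) {
		fclose(fp);
		return (NULL);
	}
	return (fp);
}
```

Only the one stream that asks for it gets the larger buffer; every other user of BUFSIZ is left alone.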

-- Ian



Re: Increase BUFSIZ to 8192

2015-05-13 Thread Hans Petter Selasky

On 05/13/15 10:27, David Chisnall wrote:
> On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
>>
>> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
>>>
>>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
>>>
>>>> Also, you'd probably see even better performance by increasing the
>>>> size to 64k, [...]
>>>
>>> easy:
>>>   8K on 32bit
>>>   64k on 64bit
>>
>> Sounds good to me...  Just for people who care... I did a quick set of
>> benchmarks on sha256.. This is using my preliminary patch to use sse4
>> optimized sha256...  But this should be the same for others...
>>
>> The numbers in ministat output are the time in seconds it takes my
>> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
>> numbers are better..  I've processed them into easier to read format:
>> BUFSIZ: 145MB/sec
>> 8k:     193MB/sec
>> 16k:    198MB/sec
>> 64k:    202MB/sec
>> 128k:   202MB/sec
>> -t:     211MB/sec
>
> It looks like most of the benefit is gained at 16KB.  Did you try running the
> benchmark with something else running at the same time to see if there is any
> advantage in trashing the caches a bit less (simple case, what happens if you
> run two instances of the same benchmark at once)?
>
> I suspect that you’re about right anyway - I recently did some tests while
> playing with JavaScript FFI generation with a multithreaded process JavaScript
> environment calling out to OpenSSL to do SHA calculations and having each of 8
> threads reading in 128KB chunks gave the fastest performance (Core i7, 4 cores
> + hyperthreading), with only a negligible gain over 64KB.  In all cases, the
> JavaScript implementation was significantly faster than the openssl tool, which
> used 8KB buffers.



Hi,

You should also try this using a USB disk. The performance numbers 
depend heavily on the hardware's interrupt moderation values.


--HPS


Re: Increase BUFSIZ to 8192

2015-05-13 Thread David Chisnall
On 13 May 2015, at 09:03, John-Mark Gurney  wrote:
> 
> Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
>> 
>> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
>> 
>>> Also, you'd probably see even better performance by increasing the
>>> size to 64k, [...]
>> 
>> easy:
>>  8K on 32bit
>>  64k on 64bit
> 
> Sounds good to me...  Just for people who care... I did a quick set of
> benchmarks on sha256.. This is using my preliminary patch to use sse4
> optimized sha256...  But this should be the same for others...
> 
> The numbers in ministat output are the time in seconds it takes my
> 3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
> numbers are better..  I've processed them into easier to read format:
> BUFSIZ:   145MB/sec
> 8k:   193MB/sec
> 16k:  198MB/sec
> 64k:  202MB/sec
> 128k: 202MB/sec
> -t:   211MB/sec

It looks like most of the benefit is gained at 16KB.  Did you try running the 
benchmark with something else running at the same time to see if there is any 
advantage in trashing the caches a bit less (simple case, what happens if you 
run two instances of the same benchmark at once)?

I suspect that you’re about right anyway - I recently did some tests while 
playing with JavaScript FFI generation with a multithreaded process JavaScript 
environment calling out to OpenSSL to do SHA calculations and having each of 8 
threads reading in 128KB chunks gave the fastest performance (Core i7, 4 cores 
+ hyperthreading), with only a negligible gain over 64KB.  In all cases, the 
JavaScript implementation was significantly faster than the openssl tool, which 
used 8KB buffers.

David


Re: Increase BUFSIZ to 8192

2015-05-13 Thread John-Mark Gurney
Poul-Henning Kamp wrote this message on Tue, May 12, 2015 at 06:31 +:
> 
> In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:
> 
> >Also, you'd probably see even better performance by increasing the
> >size to 64k, [...]
> 
> easy:
>   8K on 32bit
>   64k on 64bit

Sounds good to me...  Just for people who care... I did a quick set of
benchmarks on sha256.. This is using my preliminary patch to use sse4
optimized sha256...  But this should be the same for others...

The numbers in ministat output are the time in seconds it takes my
3.4GHz AMD A10-5700 APU running HEAD to process a 512MB file, so lower
numbers are better..  I've processed them into easier to read format:
BUFSIZ: 145MB/sec
8k:     193MB/sec
16k:    198MB/sec
64k:    202MB/sec
128k:   202MB/sec
-t:     211MB/sec

x def.times
+ 8k.times
* 16k.times
% 64k.times
# 128k.times
+-+
|#%  *+ x |
|#%  *+ x |
|#%  *+ x |
|##  *+ xx|
|A|  AA|A||
+-+
    N           Min           Max        Median           Avg        Stddev
x   5          3.53          3.55          3.53         3.536  0.0089442719
+   5          2.65          2.66          2.65         2.654  0.0054772256
Difference at 95.0% confidence
        -0.882 +/- 0.0108161
        -24.9434% +/- 0.305885%
        (Student's t, pooled s = 0.0074162)
*   5          2.58          2.59          2.58         2.584  0.0054772256
Difference at 95.0% confidence
        -0.952 +/- 0.0108161
        -26.9231% +/- 0.305885%
        (Student's t, pooled s = 0.0074162)
%   5          2.53          2.54          2.54         2.538   0.004472136
Difference at 95.0% confidence
        -0.998 +/- 0.0103127
        -28.224% +/- 0.29165%
        (Student's t, pooled s = 0.00707107)
#   5          2.53          2.54          2.53         2.532   0.004472136
Difference at 95.0% confidence
        -1.004 +/- 0.0103127
        -28.3937% +/- 0.29165%
        (Student's t, pooled s = 0.00707107)
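
For anyone who wants to check the conversion: the MB/sec figures appear to be
the 512MB file size divided by the ministat Avg column. A minimal sketch of
that arithmetic follows. The function names are made up for illustration, and
the -t row is left out because no time for it is listed here.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput in MB/sec for `megabytes` of data processed in `seconds`. */
double
throughput_mb_per_sec(double megabytes, double seconds)
{
	assert(seconds > 0.0);
	return (megabytes / seconds);
}

/* Print the ministat averages from this message as MB/sec. */
void
print_throughput_table(void)
{
	static const struct {
		const char *label;
		double avg_seconds;	/* "Avg" column of the ministat output */
	} runs[] = {
		{ "BUFSIZ:", 3.536 },
		{ "8k:", 2.654 },
		{ "16k:", 2.584 },
		{ "64k:", 2.538 },
		{ "128k:", 2.532 },
	};
	size_t i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-8s%.0fMB/sec\n", runs[i].label,
		    throughput_mb_per_sec(512.0, runs[i].avg_seconds));
}
```

Calling print_throughput_table() from main() prints one line per buffer size,
rounded to the nearest MB/sec.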

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-12 Thread Slawa Olhovchenkov
On Tue, May 12, 2015 at 06:31:33AM +, Poul-Henning Kamp wrote:

> >Also, you'd probably see even better performance by increasing the
> >size to 64k, [...]
> 
> easy:
>   8K on 32bit
>   64k on 64bit

This needs benchmarking.
Maybe 16K is a local maximum (L1 cache-efficient).


Re: Increase BUFSIZ to 8192

2015-05-12 Thread Poul-Henning Kamp

In message <20150512032307.gp37...@funkthat.com>, John-Mark Gurney writes:

>Also, you'd probably see even better performance by increasing the
>size to 64k, [...]

easy:
8K on 32bit
64k on 64bit
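
One way the split above could be expressed at compile time is sketched below.
This is only an illustration, not the patch under review: IO_CHUNK_SIZE and
io_chunk_size() are hypothetical names, and pointer width (via UINTPTR_MAX) is
assumed to be an acceptable stand-in for "32bit" versus "64bit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-architecture default: 64k where pointers are wider
 * than 32 bits, 8k otherwise.
 */
#if UINTPTR_MAX > 0xffffffffu
#define IO_CHUNK_SIZE	((size_t)64 * 1024)
#else
#define IO_CHUNK_SIZE	((size_t)8 * 1024)
#endif

/* Both candidate sizes are powers of two; catch accidental edits. */
static_assert((IO_CHUNK_SIZE & (IO_CHUNK_SIZE - 1)) == 0,
    "IO_CHUNK_SIZE must be a power of two");

/* Return the buffer size selected for this build. */
size_t
io_chunk_size(void)
{
	return (IO_CHUNK_SIZE);
}
```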

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: Increase BUFSIZ to 8192

2015-05-12 Thread Steven Hartland

4k block size on the underlying device?

On 12/05/2015 00:14, Adrian Chadd wrote:

> So I'm curious - why's it faster?
> 
> 
> -adrian


Re: Increase BUFSIZ to 8192

2015-05-11 Thread John-Mark Gurney
Baptiste Daroussin wrote this message on Tue, May 12, 2015 at 01:06 +0200:
> I would like to change the default value of BUFSIZ to 8192.
> 
> After testing various applications that use BUFSIZ, for example by changing
> it in libmd, I can often see performance improvements:
> 
> For example with md5(1):
> Current BUFSIZ (1024)
> 0.13 real 0.04 user 0.09 sys
> New BUFSIZ (8192)
> 0.08 real 0.04 user 0.03 sys
> 
> sha256(1):
> Before:
> 0.44 real 0.39 user 0.04 sys
> After:
> 0.37 real 0.35 user 0.01 sys
> 
> This is done on a small amd64 Lenovo S20 laptop
> 
> Review available here:
> https://reviews.freebsd.org/D2515

Personally, I think the applications that are abusing BUFSIZ should be
fixed to use a properly sized buffer for their applications...  BUFSIZ
is defined for the default stdio buffer sizes...
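
A minimal sketch of what "use a properly sized buffer" can look like for a
program that goes through stdio: it asks for its own buffer size with
setvbuf(3) instead of relying on the BUFSIZ default. APP_IO_BUFSIZE and
use_io_buffer() are hypothetical names, and 64k is just an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical application-chosen size; not taken from any patch here. */
#define APP_IO_BUFSIZE	((size_t)64 * 1024)

/*
 * Ask stdio to use a fully buffered, `size`-byte buffer for `fp`.
 * This must be called after the stream is opened and before any other
 * operation on it.  Returns 0 on success, nonzero on failure.
 */
int
use_io_buffer(FILE *fp, size_t size)
{
	assert(fp != NULL);
	/* Passing NULL lets the library allocate the buffer itself. */
	return (setvbuf(fp, NULL, _IOFBF, size));
}
```

A caller would do something like use_io_buffer(fp, APP_IO_BUFSIZE) right after
fopen(3) succeeds. Programs that call read(2) directly are not affected by
this and need to size their own buffers.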

I got a significant perf improvement many years ago by fixing lpd to use
a sane BUFSIZ...  And did the same recently w/ nc (bumping from 2k
to 16k, though I'd have liked to go larger)...

Also, you'd probably see even better performance by increasing the
size to 64k, though as with all things, this means more memory use
on smaller systems (though on smaller/slower systems, this may be
an even bigger win)...  This mostly just reduces the number of context
switches...
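
To make the effect visible, the sketch below counts how many read(2) calls it
takes to consume a file with a given buffer size. Each call is a trip into the
kernel, so a larger buffer means fewer of them. count_reads() is a
hypothetical helper, it assumes a regular file, and it does not retry on
EINTR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read `path` to EOF in chunks of `bufsize` bytes and return the number
 * of read(2) calls that returned data, or -1 on error.
 */
long
count_reads(const char *path, size_t bufsize)
{
	char *buf;
	ssize_t n;
	long calls = 0;
	int fd;

	assert(bufsize > 0);
	if ((buf = malloc(bufsize)) == NULL)
		return (-1);
	if ((fd = open(path, O_RDONLY)) < 0) {
		free(buf);
		return (-1);
	}
	/* The final read that returns 0 (EOF) is not counted. */
	while ((n = read(fd, buf, bufsize)) > 0)
		calls++;
	close(fd);
	free(buf);
	return (n < 0 ? -1 : calls);
}
```

For a 64k regular file this should come to 64 calls with a 1k buffer and 8
calls with an 8k buffer, plus the final read that reports EOF in both cases.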

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Increase BUFSIZ to 8192

2015-05-11 Thread Adrian Chadd
So I'm curious - why's it faster?


-adrian