subject:"Performance improvements for src\/port\/snprintf.c"

Re: Performance improvements for src/port/snprintf.c

2018-10-11 Thread Andres Freund

Hi,

On 2018-10-07 12:59:18 +0100, Andrew Gierth wrote:
> > "Tom" == Tom Lane  writes:
> 
>  Tom> Now, "shortest value that converts back exactly" is technically
>  Tom> cool, but I am not sure that it solves any real-world problem that
>  Tom> we have.
> 
> Well, it seems to me that it is perfect for pg_dump.
> 
> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row. Yes, testing equality of floats is bad, but
> there's no reason to put in extra landmines.

+1


>  Tom> I'm also worried that introducing it would result in complaints like
>  Tom> 
> https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com
> 
> Frankly for a >20x performance improvement in float8out I don't think
> that's an especially big deal.

+1. There's plenty complaints where we just say "sorry that it bothers
you, but these larger concerns made us that way".


> I do not see any obvious way to use this code to generate the same
> output in the final digits that we currently do (in the sense of
> overly-exact values like outputting 1.89991 for 1.9 when
> extra_float_digits=3).

But, why would that be required? Just to placate people wanting exactly
the same output as before?  I don't quite get how that'd be a useful
requirement.

Obviously we *do* need to support outputting non-exponent style output
where appropriate, but that should mostly be different massaging of
d2d()'s output, instead of calling to_chars() as the ryu upstream code
does.  ISTM we also need to support *reducing* the precision (for the
case where people intentionally reduce extra_float_digits), but that
similarly should be a SMOP, right?-

- Andres

Re: Performance improvements for src/port/snprintf.c

2018-10-07 Thread Andrew Gierth

> "Tom" == Tom Lane  writes:

 Tom> However, it seems like it should still be on the table to look at
 Tom> other code that just does sprintf's job faster (such as the stb
 Tom> code Alexander mentioned).

The stb sprintf is indeed a lot faster for floats than other sprintfs,
but (a) it's still quite a lot slower than Ryu (COPY of my test table is
4.2 seconds with stb, vs 2.7 seconds with Ryu), and (b) it also produces
changes in the insignificant digits, so while (it claims) the values are
still round-trip convertible, they are neither the shortest
representation nor the exact representation.

For example, consider 1.9, which is 0x3FFE:

exact value: 1.899911182158029987476766109466552734375
accepted input range:
  min: 1.89980015985556747182272374629974365234375
  max: 1.90002220446049250313080847263336181640625

exact value rounded to 18 SF: 1.89991

Ryu output: 1.9E0
STB (%*.18g) output: 1.89992
sprintf (%*.18g) output: 1.89991

So while STB's output is in the acceptable range, it's not the result of
rounding the exact value to 18 digits (as sprintf does on my system at
least) and nor is it the shortest. Testing a bunch of random values it
usually seems to be off from the rounded exact result by +/- 1 in the
last digit.

-- 
Andrew (irc:RhodiumToad)

Re: Performance improvements for src/port/snprintf.c

2018-10-07 Thread Tom Lane

Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Now, "shortest value that converts back exactly" is technically
>  Tom> cool, but I am not sure that it solves any real-world problem that
>  Tom> we have.

> Well, it seems to me that it is perfect for pg_dump.

Perhaps.  I was hoping for something we could slot into snprintf.c;
not being able to select the number of digits to output is clearly
a deal-breaker for that usage.  But perhaps it's reasonable to allow
"extra_float_digits = 3" to be redefined as meaning "use the shortest
value that converts back exactly" in float[48]out.

However, it seems like it should still be on the table to look at
other code that just does sprintf's job faster (such as the stb
code Alexander mentioned).  If anything like that is acceptable
for the general case, then we have to ask whether ryu is enough
faster than *that* code, not faster than what we have now, to
justify carrying another umpteen KB of independent code path
for the dump-and-restore case.

> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row.

Unfortunately, I do not think it's going to be acceptable for default
float output (as opposed to the dump/restore case) to become round-trip
safe.  The number of people complaining today would be dwarfed by the
number of people complaining about extra garbage digits in their results.
There isn't any compromise that will make things "just work" for people
who are unaware of the subtleties of float arithmetic.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-07 Thread Andrew Gierth

> "Tom" == Tom Lane  writes:

 Tom> Now, "shortest value that converts back exactly" is technically
 Tom> cool, but I am not sure that it solves any real-world problem that
 Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.

 Tom> I'm also worried that introducing it would result in complaints like
 Tom> 
https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

Frankly for a >20x performance improvement in float8out I don't think
that's an especially big deal.

 Tom> As for #2, my *very* short once-over of the code led me to think
 Tom> that the speed win comes mostly from use of wide integer
 Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.

 Tom> and maybe from throwing big lookup tables at the problem. If so,
 Tom> it's very likely possible that we could adopt those techniques
 Tom> without necessarily buying into the shortest-exact rule for how
 Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89991 for 1.9 when
extra_float_digits=3).

 >> One option would be to stick with snprintf if extra_float_digits is
 >> less than 0 (or less than or equal to 0 and make the default 1) and
 >> use ryu otherwise, so that the option to get rounded floats is still
 >> there. (Apparently some people do use negative values of
 >> extra_float_digits.) Unlike other format-changing GUCs, this one
 >> already exists and is already used by people who want more or less
 >> precision, including by pg_dump where rount-trip conversion is the
 >> requirement.

 Tom> I wouldn't necessarily object to having some value of
 Tom> extra_float_digits that selects the shortest-exact rule, but I'm
 Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

-- 
Andrew (irc:RhodiumToad)

Re: Performance improvements for src/port/snprintf.c

2018-10-06 Thread Tom Lane

Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Yeah, one would hope. But I wonder whether it always produces the
>  Tom> same low-order digits, and if not, whether people will complain.

> It won't produce the same low-order digits in general, since it has a
> different objective: rather than outputting a decimal value which is the
> true float value rounded to a fixed size by decimal rounding rules, it
> produces the shortest decimal value which falls within the binary float
> rounding interval of the true float value. i.e. the objective is to be
> able to round-trip back to float and get the identical result.

So I'm thinking that there are two, hopefully separable, issues here:

1. The algorithm for deciding how many digits to print.

2. The speed.

Now, "shortest value that converts back exactly" is technically cool,
but I am not sure that it solves any real-world problem that we have.
I'm also worried that introducing it would result in complaints like
https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

As for #2, my *very* short once-over of the code led me to think that
the speed win comes mostly from use of wide integer arithmetic, and
maybe from throwing big lookup tables at the problem.  If so, it's very
likely possible that we could adopt those techniques without necessarily
buying into the shortest-exact rule for how many digits to print.

> One option would be to stick with snprintf if extra_float_digits is less
> than 0 (or less than or equal to 0 and make the default 1) and use ryu
> otherwise, so that the option to get rounded floats is still there.
> (Apparently some people do use negative values of extra_float_digits.)
> Unlike other format-changing GUCs, this one already exists and is
> already used by people who want more or less precision, including by
> pg_dump where rount-trip conversion is the requirement.

I wouldn't necessarily object to having some value of extra_float_digits
that selects the shortest-exact rule, but I'm thinking maybe it should
be a value we don't currently accept.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Andrew Gierth

> "Tom" == Tom Lane  writes:

 Tom> Thanks. Just scanning through the code quickly, I note that it
 Tom> assumes IEEE float format, which is probably okay but I suppose we
 Tom> might want a configure switch to disable it (and revert to
 Tom> platform sprintf).

Yeah, but even s390 these days supports IEEE floats in hardware so I'm
not sure there are any platforms left that don't (that we care about).

 Tom> I couldn't immediately figure out if it's got endianness
 Tom> assumptions; but even if it does, that'd likely only affect the
 Tom> initial disassembly of the IEEE format, so probably not a huge
 Tom> deal.

Upstream docs say it's fine with big-endian as long as the endianness of
ints and floats is the same.

 Tom> I wonder which variant of the code you were testing (e.g.
 Tom> HAS_UINT128 or not).

I was using clang 3.9.1 on FreeBSD amd64, and HAS_UINT128 ends up
enabled by this test:

#if defined(__SIZEOF_INT128__) && !defined(_MSC_VER) && 
!defined(RYU_ONLY_64_BIT_OPS)
#define HAS_UINT128
...

 >> The regression tests for float8 fail of course since Ryu's output
 >> format differs (it always includes an exponent, but the code for
 >> that part can be tweaked without touching the main algorithm).

 Tom> Yeah, one would hope. But I wonder whether it always produces the
 Tom> same low-order digits, and if not, whether people will complain.

It won't produce the same low-order digits in general, since it has a
different objective: rather than outputting a decimal value which is the
true float value rounded to a fixed size by decimal rounding rules, it
produces the shortest decimal value which falls within the binary float
rounding interval of the true float value. i.e. the objective is to be
able to round-trip back to float and get the identical result.

One option would be to stick with snprintf if extra_float_digits is less
than 0 (or less than or equal to 0 and make the default 1) and use ryu
otherwise, so that the option to get rounded floats is still there.
(Apparently some people do use negative values of extra_float_digits.)
Unlike other format-changing GUCs, this one already exists and is
already used by people who want more or less precision, including by
pg_dump where rount-trip conversion is the requirement.

Here are some examples of differences in digits, comparing ryu output
with extra_float_digits=3:

  Pi: ryu 3.141592653589793E0
  sprintf 3.14159265358979312
   e: ryu 2.7182818284590455E0
  sprintf 2.71828182845904553
1/10: ryu 1E-1
  sprintf 0.16
 1/3: ryu 3.333E-1
  sprintf 0.15
 2/3: ryu 6.666E-1
  sprintf 0.3

-- 
Andrew.

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Tom Lane

Andrew Gierth  writes:
>  Tom> Oh yeah?  Where's the code for this?

> Upstream code is at https://github.com/ulfjack/ryu
> ...
> I attach the patch I've used for testing, which has these changes from
> upstream Ryu:

Thanks.  Just scanning through the code quickly, I note that it assumes
IEEE float format, which is probably okay but I suppose we might want
a configure switch to disable it (and revert to platform sprintf).
I couldn't immediately figure out if it's got endianness assumptions;
but even if it does, that'd likely only affect the initial disassembly
of the IEEE format, so probably not a huge deal.

I wonder which variant of the code you were testing (e.g. HAS_UINT128
or not).

There's a pretty large gap between this code and PG coding conventions,
both as to layout and portability rules.  I wonder if we'd be better off
to implement the algorithm afresh instead of whacking this particular
code past the point of unrecognizability.

> The regression tests for
> float8 fail of course since Ryu's output format differs (it always
> includes an exponent, but the code for that part can be tweaked without
> touching the main algorithm).

Yeah, one would hope.  But I wonder whether it always produces the
same low-order digits, and if not, whether people will complain.
We just had somebody griping about a change in insignificant zeroes
in timestamp output :-(.  Still, seems worth further investigation.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Andrew Gierth

> "Tom" == Tom Lane  writes:

 >> Ryu is so blazing fast that with it, COPY of a table with 2million
 >> rows of 12 random float8 columns (plus id) becomes FASTER in text
 >> mode than in binary mode (rather than ~5x slower):

 Tom> Oh yeah?  Where's the code for this?

Upstream code is at https://github.com/ulfjack/ryu

Most of that is benchmarking, java, and other stuff not interesting to
us; the guts are under ryu/ and are dual-licensed under Boost 1.0 (which
I think we can use, since the only difference from BSD seems to be a
permissive one) as well as Apache 2.0 (which AFAIK we can't use).

I attach the patch I've used for testing, which has these changes from
upstream Ryu:

  - added ryu_ prefix to entry point functions
  - changed some #include file locations
  - added #define NDEBUG since there are a bunch of plain C assert()s

but I didn't touch the formatting or style of the Ryu code so it's all
C99 and // comments and OTB etc.

For testing purposes what I did was to change float[48]out to use the
Ryu code iff extra_float_digits > 0. This isn't likely what a final
version should do, just a convenience flag. The regression tests for
float8 fail of course since Ryu's output format differs (it always
includes an exponent, but the code for that part can be tweaked without
touching the main algorithm).

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557b73..b7e44b2eba 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "utils/array.h"
 #include "utils/float.h"
@@ -245,6 +246,13 @@ float4out(PG_FUNCTION_ARGS)
 	float4		num = PG_GETARG_FLOAT4(0);
 	char	   *ascii;
 
+	if (extra_float_digits > 0)
+	{
+		ascii = (char *) palloc(24);
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	if (isnan(num))
 		PG_RETURN_CSTRING(pstrdup("NaN"));
 
@@ -481,6 +489,13 @@ float8out_internal(double num)
 {
 	char	   *ascii;
 
+	if (extra_float_digits > 0)
+	{
+		ascii = (char *) palloc(32);
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	if (isnan(num))
 		return pstrdup("NaN");
 
diff --git a/src/common/Makefile b/src/common/Makefile
index ec8139f014..ae89dd8c5e 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,8 +44,8 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \
 	pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 00..20fac52704
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,612 @@
+// Copyright 2018 Ulf Adams
+//
+// The contents of this file may be used under the terms of the Apache License,
+// Version 2.0.
+//
+//(See accompanying file LICENSE-Apache or copy at
+// http://www.apache.org/licenses/LICENSE-2.0)
+//
+// Alternatively, the contents of this file may be used under the terms of
+// the Boost Software License, Version 1.0.
+//(See accompanying file LICENSE-Boost or copy at
+// https://www.boost.org/LICENSE_1_0.txt)
+//
+// Unless required by applicable law or agreed to in writing, this software
+// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.
+
+// Runtime compiler options:
+// -DRYU_DEBUG Generate verbose debugging output to stdout.
+//
+// -DRYU_ONLY_64_BIT_OPS Avoid using uint128_t or 64-bit intrinsics. Slower,
+// depending on your compiler.
+//
+// -DRYU_OPTIMIZE_SIZE Use smaller lookup tables. Instead of storing every
+// required power of 5, only store every 26th entry, and compute
+// intermediate values with a multiplication. This reduces the lookup table
+// size by about 10x (only one case, and only double) at the cost of some
+// performance. Currently requires MSVC intrinsics.
+
+#define NDEBUG
+
+#include "common/ryu.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef RYU_DEBUG
+#include 
+#include 
+#endif
+
+// ABSL avoids uint128_t on Win32 even if __SIZEOF_INT128__ is defined.
+// Let's do the same for now.
+#if defined(__SIZEOF_INT128__) && !defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS)
+#define HAS_UINT128
+#elif defined(_MSC_VER) && !defined(RYU_ONLY_64_BIT_OPS) && defined(_M_X64) \
+  && !defined(__clang__) // https://bugs.llvm.org/show_bug.cgi?id=37755
+#define HAS_64_BIT_INTRINSICS
+#endif
+
+#include "ryu_common.h"
+#include "digit_table.h"
+#include "d2s.h"
+#include

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Tom Lane

Andrew Gierth  writes:
> So here's a thing: I finally got to doing my performance tests for using
> the Ryu float output code in float[48]out.
> Ryu is so blazing fast that with it, COPY of a table with 2million rows
> of 12 random float8 columns (plus id) becomes FASTER in text mode than
> in binary mode (rather than ~5x slower):

Oh yeah?  Where's the code for this?

> (And yes, I've double-checked the results and they look correct, other
> than the formatting differences. COPY BINARY seems to have a bit more
> overhead than text mode, even for just doing integers, I don't know
> why.)

The per-column overhead is more (length word vs delimiter) and I think
the APIs for send/recv functions are potentially a bit less efficient
too.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Andrew Gierth

> "Andres" == Andres Freund  writes:

 Andres> I'm not convinced. Because of some hypothetical platform that
 Andres> may introduce strfromd() in a broken/slower manner, but where
 Andres> sprintf() is correct, we should not do the minimal work to
 Andres> alleviate an actual performance bottleneck in a trivial manner
 Andres> on linux? Our most widely used platform? If we find a platform
 Andres> where it's borked, we could just add a small hack into their
 Andres> platform template file.

So here's a thing: I finally got to doing my performance tests for using
the Ryu float output code in float[48]out.

Ryu is so blazing fast that with it, COPY of a table with 2million rows
of 12 random float8 columns (plus id) becomes FASTER in text mode than
in binary mode (rather than ~5x slower):

copy binary flttst to '/dev/null';  -- binary
Time: 3222.444 ms (00:03.222)

copy flttst to '/dev/null';  -- non-Ryu
Time: 16416.161 ms (00:16.416)

copy flttst to '/dev/null';  -- Ryu
Time: 2691.642 ms (00:02.692)

(And yes, I've double-checked the results and they look correct, other
than the formatting differences. COPY BINARY seems to have a bit more
overhead than text mode, even for just doing integers, I don't know
why.)

-- 
Andrew (irc:RhodiumToad)

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Tom Lane

I stepped back a bit from the raw performance question and thought about
what we actually want functionally in snprintf's float handling.  There
are a couple of points worth making:

* The fact that float[48]out explicitly handle NaN and Inf cases is a
leftover from when they had to cope with varying behavior of
platform-specific snprintf implementations.  Now that we've standardized
on snprintf.c, it makes a lot more sense to enforce standardized printing
of these values inside snprintf.c.  That not only avoids repeated tests
for these cases at different code levels, but ensures that the uniform
behavior exists for all our invocations of *printf, not just float[48]out.

* snprintf.c doesn't really work right for IEEE minus zero, as I recently
noted in another thread (<23662.1538067...@sss.pgh.pa.us>).  While this
is not of significance for float[48]out, it might be a problem for other
callers.  Now that we've enforced usage of snprintf.c across-the-board,
I think it's more important to worry about these corner cases.  It's not
that expensive to fix either; we can test for minus zero with something
like this:
static const double dzero = 0.0;
if (value == 0.0 &&
memcmp(, , sizeof(double)) != 0)
(ie, "it's equal to zero but not bitwise equal to zero").  While that
looks like it might be expensive, I find that recent versions of both
gcc and clang can optimize the memcmp call down to something like
cmpq$0, 8(%rsp)
so I think it's well worth the cost to get this right.

The attached proposed patch addresses both of those points.

Also, in service of the performance angle, I went ahead and made a
roughly strfromd-like entry point in snprintf.c, but using an API
that doesn't force textual conversion of the precision spec.

As best I can tell, this patch puts the performance of float8out
on par with what it was in v11, measuring using a tight loop like

while (count-- > 0)
{
char *str = float8out_internal(val);
pfree(str);
CHECK_FOR_INTERRUPTS();
}

For me, this is within a percent or two either way on a couple of
different machines; that's within the noise level.

regards, tom lane

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index df35557..260377c 100644
*** a/src/backend/utils/adt/float.c
--- b/src/backend/utils/adt/float.c
*** Datum
*** 243,272 
  float4out(PG_FUNCTION_ARGS)
  {
  	float4		num = PG_GETARG_FLOAT4(0);
! 	char	   *ascii;
! 
! 	if (isnan(num))
! 		PG_RETURN_CSTRING(pstrdup("NaN"));
! 
! 	switch (is_infinite(num))
! 	{
! 		case 1:
! 			ascii = pstrdup("Infinity");
! 			break;
! 		case -1:
! 			ascii = pstrdup("-Infinity");
! 			break;
! 		default:
! 			{
! int			ndig = FLT_DIG + extra_float_digits;
! 
! if (ndig < 1)
! 	ndig = 1;
! 
! ascii = psprintf("%.*g", ndig, num);
! 			}
! 	}
  
  	PG_RETURN_CSTRING(ascii);
  }
  
--- 243,252 
  float4out(PG_FUNCTION_ARGS)
  {
  	float4		num = PG_GETARG_FLOAT4(0);
! 	char	   *ascii = (char *) palloc(32);
! 	int			ndig = FLT_DIG + extra_float_digits;
  
+ 	(void) pg_strfromd(ascii, 32, ndig, num);
  	PG_RETURN_CSTRING(ascii);
  }
  
*** float8out(PG_FUNCTION_ARGS)
*** 479,508 
  char *
  float8out_internal(double num)
  {
! 	char	   *ascii;
! 
! 	if (isnan(num))
! 		return pstrdup("NaN");
! 
! 	switch (is_infinite(num))
! 	{
! 		case 1:
! 			ascii = pstrdup("Infinity");
! 			break;
! 		case -1:
! 			ascii = pstrdup("-Infinity");
! 			break;
! 		default:
! 			{
! int			ndig = DBL_DIG + extra_float_digits;
! 
! if (ndig < 1)
! 	ndig = 1;
! 
! ascii = psprintf("%.*g", ndig, num);
! 			}
! 	}
  
  	return ascii;
  }
  
--- 459,468 
  char *
  float8out_internal(double num)
  {
! 	char	   *ascii = (char *) palloc(32);
! 	int			ndig = DBL_DIG + extra_float_digits;
  
+ 	(void) pg_strfromd(ascii, 32, ndig, num);
  	return ascii;
  }
  
diff --git a/src/include/port.h b/src/include/port.h
index e654d5c..0729c3f 100644
*** a/src/include/port.h
--- b/src/include/port.h
*** extern int	pg_printf(const char *fmt,...
*** 187,192 
--- 187,195 
  #define fprintf			pg_fprintf
  #define printf(...)		pg_printf(__VA_ARGS__)
  
+ /* This is also provided by snprintf.c */
+ extern int	pg_strfromd(char *str, size_t count, int precision, double value);
+ 
  /* Replace strerror() with our own, somewhat more robust wrapper */
  extern char *pg_strerror(int errnum);
  #define strerror pg_strerror
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index ef496fa..897c683 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** fmtfloat(double value, char type, int fo
*** ,1120 
  	int			zeropadlen = 0; /* amount to pad with zeroes */
  	int			padlen;			/* amount to pad with spaces */
  
- 	/* Handle sign (NaNs have no sign) */
- 	if (!isnan(value) && adjust_sign((value <

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-05 11:54:59 -0400, Tom Lane wrote:
>> I really think that what we ought to do is apply the float[48]out hack
>> I showed in <30551.1538517...@sss.pgh.pa.us> and call it good, at least
>> till such time as somebody wants to propose a full-on reimplementation of
>> float output.  I don't want to buy back into having platform dependencies
>> in this area after having just expended a lot of sweat to get rid of them.

> I'm not convinced. Because of some hypothetical platform that may
> introduce strfromd() in a broken/slower manner, but where sprintf() is
> correct, we should not do the minimal work to alleviate an actual
> performance bottleneck in a trivial manner on linux? Our most widely
> used platform?  If we find a platform where it's borked, we could just
> add a small hack into their platform template file.

If it were a significant performance improvement, I'd be okay with that
conclusion, but my measurements say that it's not.  The extra complication
is not free, and in my judgement it's not worth it.

We certainly do need to buy back the performance we lost in float[48]out,
but the hack I suggested does so --- on all platforms, not only Linux.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Andres Freund

Hi,

On 2018-10-05 11:54:59 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > [ let's use strfromd ]
> 
> So I'm having second thoughts about this, based on the fact that
> strfromd() in't strictly a glibc-ism but is defined in an ISO/IEC
> standard.  That means that we can expect to see it start showing up
> on other platforms (though a quick search did not find any evidence
> that it has yet).  And that means that we'd better consider
> quality-of-implementation issues.  We know that glibc's version is
> fractionally faster than using sprintf with "%.*g", but what are
> the odds that that will be true universally?  I don't have a warm
> feeling about it, given that strfromd's API isn't a very good impedance
> match to what we really need.
> 
> I really think that what we ought to do is apply the float[48]out hack
> I showed in <30551.1538517...@sss.pgh.pa.us> and call it good, at least
> till such time as somebody wants to propose a full-on reimplementation of
> float output.  I don't want to buy back into having platform dependencies
> in this area after having just expended a lot of sweat to get rid of them.

I'm not convinced. Because of some hypothetical platform that may
introduce strfromd() in a broken/slower manner, but where sprintf() is
correct, we should not do the minimal work to alleviate an actual
performance bottleneck in a trivial manner on linux? Our most widely
used platform?  If we find a platform where it's borked, we could just
add a small hack into their platform template file.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-05 Thread Tom Lane

Andres Freund  writes:
> [ let's use strfromd ]

So I'm having second thoughts about this, based on the fact that
strfromd() in't strictly a glibc-ism but is defined in an ISO/IEC
standard.  That means that we can expect to see it start showing up
on other platforms (though a quick search did not find any evidence
that it has yet).  And that means that we'd better consider
quality-of-implementation issues.  We know that glibc's version is
fractionally faster than using sprintf with "%.*g", but what are
the odds that that will be true universally?  I don't have a warm
feeling about it, given that strfromd's API isn't a very good impedance
match to what we really need.

I really think that what we ought to do is apply the float[48]out hack
I showed in <30551.1538517...@sss.pgh.pa.us> and call it good, at least
till such time as somebody wants to propose a full-on reimplementation of
float output.  I don't want to buy back into having platform dependencies
in this area after having just expended a lot of sweat to get rid of them.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-03 14:01:35 -0400, Tom Lane wrote:
>> BTW, so far as I can tell on F28, strfromd isn't exposed without
>> "-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
>> what else does that affect?

> So I don't think anything's needed to enable that in pg, given that we
> define _GNU_SOURCE

Ah, OK.  I thought my test case had _GNU_SOURCE defined already,
but it didn't.  You might want to do something like what I stuck
in for strchrnul, though:

/*
 * glibc's  declares strchrnul only if _GNU_SOURCE is defined.
 * While we typically use that on glibc platforms, configure will set
 * HAVE_STRCHRNUL whether it's used or not.  Fill in the missing declaration
 * so that this file will compile cleanly with or without _GNU_SOURCE.
 */
#ifndef _GNU_SOURCE
extern char *strchrnul(const char *s, int c);
#endif

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

I wrote:
>> Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
>> about we simply add a static assert that long long isn't bigger than
>> int64?

> WFM, I'll make it happen.

Actually, while writing a comment to go with that assertion, I decided
this was dumb.  If we're expecting the compiler to have "long long",
and if we're convinced that no platforms define "long long" as wider
than 64 bits, we may as well go with the s/int64/long long/g solution.
That should result in no code change on any platform today.  And it
will still work correctly, if maybe a bit inefficiently, on some
hypothetical future platform where long long is wider.  We (or our
successors) can worry about optimizing that when the time comes.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 14:01:35 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > So when using pg's snprintf() to print a single floating point number
> > with precision, we get nearly a 10% boost.
> 
> I just tested that using my little standalone testbed, and I failed
> to replicate the result.  I do see that strfromd is slightly faster,
> but it's just a few percent measuring snprintf.c in isolation --- in
> the overall context of COPY, I don't see how you get to 10% net savings.

I just tested your patch, and I see (best of three):

master:
16224.727 ms
hack-use-of-strfromd.patch:
14944.927 ms

So not quite 10%, but pretty close.


CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() 
FROM generate_series(1, 1000);
VACUUM FREEZE somefloats;

COPY somefloats TO '/dev/null';

What difference do you see?


> So I continue to think there's something fishy about your test case.
> 
> BTW, so far as I can tell on F28, strfromd isn't exposed without
> "-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
> what else does that affect?

My copy says:

#undef __GLIBC_USE_IEC_60559_BFP_EXT
#if defined __USE_GNU || defined __STDC_WANT_IEC_60559_BFP_EXT__
# define __GLIBC_USE_IEC_60559_BFP_EXT 1
#else
# define __GLIBC_USE_IEC_60559_BFP_EXT 0
#endif

And __USE_GNU is enabled by
#ifdef  _GNU_SOURCE
# define __USE_GNU  1
#endif

So I don't think anything's needed to enable that in pg, given that we
define _GNU_SOURCE


Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Alvaro Herrera

On 2018-Oct-03, Tom Lane wrote:

> Andres Freund  writes:

> BTW, so far as I can tell on F28, strfromd isn't exposed without
> "-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
> what else does that affect?

https://en.cppreference.com/w/c/experimental/fpext1

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> So when using pg's snprintf() to print a single floating point number
> with precision, we get nearly a 10% boost.

I just tested that using my little standalone testbed, and I failed
to replicate the result.  I do see that strfromd is slightly faster,
but it's just a few percent measuring snprintf.c in isolation --- in
the overall context of COPY, I don't see how you get to 10% net savings.

So I continue to think there's something fishy about your test case.

BTW, so far as I can tell on F28, strfromd isn't exposed without
"-D__STDC_WANT_IEC_60559_BFP_EXT__", which seems fairly scary;
what else does that affect?

regards, tom lane

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index b9b6add..f75369c 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -1137,17 +1137,19 @@ fmtfloat(double value, char type, int forcesign, int leftjust,
 		zeropadlen = precision - prec;
 		fmt[0] = '%';
 		fmt[1] = '.';
-		fmt[2] = '*';
-		fmt[3] = type;
-		fmt[4] = '\0';
-		vallen = sprintf(convert, fmt, prec, value);
+		fmt[2] = (prec / 100) + '0';
+		fmt[3] = ((prec % 100) / 10) + '0';
+		fmt[4] = (prec % 10) + '0';
+		fmt[5] = type;
+		fmt[6] = '\0';
+		vallen = strfromd(convert, sizeof(convert), fmt, value);
 	}
 	else
 	{
 		fmt[0] = '%';
 		fmt[1] = type;
 		fmt[2] = '\0';
-		vallen = sprintf(convert, fmt, value);
+		vallen = strfromd(convert, sizeof(convert), fmt, value);
 	}
 	if (vallen < 0)
 		goto fail;

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-03 13:18:35 -0400, Tom Lane wrote:
>> Having said that, maybe there's a case for changing the type spec
>> in only the va_arg() call, and leaving snprintf's related local
>> variables as int64.  (Is that what you actually meant?)  Then,
>> if long long really is different from int64, at least we have
>> predictable truncation behavior after fetching the value, rather
>> than undefined behavior while fetching it.

> Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
> about we simply add a static assert that long long isn't bigger than
> int64?

WFM, I'll make it happen.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

On 2018-10-03 13:40:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-10-03 13:31:09 -0400, Tom Lane wrote:
> >> I do not see the point of messing with snprintf.c here.  I doubt that
> >> strfromd is faster than the existing sprintf call (because the latter
> >> can use ".*" instead of serializing and deserializing the precision).
>
> > I'm confused, the numbers I posted clearly show that it's faster?
>
> Those were in the context of whether float8out went through snprintf.c
> or directly to strfromd, no?

I measured both, changing float8out directly, and just adapting
snprintf.c:

> snprintf using sprintf via pg_double_to_string:
> 16195.787
>
> snprintf using strfromd via pg_double_to_string:
> 14856.974 ms
>
> float8out using sprintf via pg_double_to_string:
> 16176.169
>
> float8out using strfromd via pg_double_to_string:
> 13532.698

So when using pg's snprintf() to print a single floating point number
with precision, we get nearly a 10% boost. The win unsurprisingly is
bigger when not going through snprintf.c.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 13:18:35 -0400, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> - I know it's not new, but is it actually correct to use va_arg(args, 
> >> int64)
> >> for ATYPE_LONGLONG?
> 
> > Well, the problem with just doing s/int64/long long/g is that the
> > code would then fail on compilers without a "long long" type.
> > We could ifdef our way around that, but I don't think the code would
> > end up prettier.
> 
> I spent a bit more time thinking about that point.  My complaint about
> lack of long long should be moot given that we're now requiring C99.

True, I didn't think of that.


> So the two cases we need to worry about are (1) long long exists and
> is 64 bits, and (2) long long exists and is wider than 64 bits.  In
> case (1) there's nothing actively wrong with the code as it stands.
> In case (2), if we were to fix the problem by s/int64/long long/g,
> the result would be that we'd be doing the arithmetic for all
> integer-to-text conversions in 128 bits, which seems likely to be
> pretty expensive.

Yea, that seems quite undesirable.


> So a "real" fix would probably require having separate versions of
> fmtint for long and long long.  I'm not terribly excited about
> going there.  I can see it happening some day when/if we need to
> use 128-bit math more extensively than today, but I do not think
> that day is close.

Right, that seems a bit off.


> (Are there *any* platforms where "long long" is 128 bits today?)

Not that I'm aware off.


> Having said that, maybe there's a case for changing the type spec
> in only the va_arg() call, and leaving snprintf's related local
> variables as int64.  (Is that what you actually meant?)  Then,
> if long long really is different from int64, at least we have
> predictable truncation behavior after fetching the value, rather
> than undefined behavior while fetching it.

Hm. I guess that'd be a bit better, but I'm not sure it's worth it. How
about we simply add a static assert that long long isn't bigger than
int64?

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-03 13:31:09 -0400, Tom Lane wrote:
>> I do not see the point of messing with snprintf.c here.  I doubt that
>> strfromd is faster than the existing sprintf call (because the latter
>> can use ".*" instead of serializing and deserializing the precision).

> I'm confused, the numbers I posted clearly show that it's faster?

Those were in the context of whether float8out went through snprintf.c
or directly to strfromd, no?

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 13:31:09 -0400, Tom Lane wrote:
> I do not see the point of messing with snprintf.c here.  I doubt that
> strfromd is faster than the existing sprintf call (because the latter
> can use ".*" instead of serializing and deserializing the precision).

I'm confused, the numbers I posted clearly show that it's faster?

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-03 12:54:52 -0400, Tom Lane wrote:
>> (1) The need to use sprintf for portability means that we need very
>> tight constraints on the precision spec *and* the buffer size *and*
>> the format type (%f pretty much destroys certainty about how long the
>> output string is).  So this isn't going to be general purpose code.
>> I think just writing it into float[48]out is sufficient.

> Well, the numbers suggest it's also useful to do so from snprintf - it's
> not that rare that we output floating point numbers from semi
> performance critical code, even leaving aside float[48]out.  So I'm not
> convinced that we shouldn't do this from within snprintf.c too. Now we
> could open-code it twice, but i'm not sure I see the point.

I do not see the point of messing with snprintf.c here.  I doubt that
strfromd is faster than the existing sprintf call (because the latter
can use ".*" instead of serializing and deserializing the precision).
Even if it is, I do not want to expose an attractive-nuisance API
in a header, and I think this would be exactly that.

> If we just define the API as having to guarantee there's enough space
> for the output format, I think it'll work well enough for now?

No, because that's a recipe for buffer-overflow bugs.  It's *hard*
to be sure the buffer is big enough, and easy to make breakable
assumptions.

> snprintf.c already assumes everything floating point can be output in
> 1024 chars, no?

Indeed, and it's got hacks like a forced limit to precision 350 in order
to make that safe.  I don't want to be repeating the reasoning in
fmtfloat() in a bunch of other places.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

I wrote:
> Andres Freund  writes:
>> - I know it's not new, but is it actually correct to use va_arg(args, int64)
>> for ATYPE_LONGLONG?

> Well, the problem with just doing s/int64/long long/g is that the
> code would then fail on compilers without a "long long" type.
> We could ifdef our way around that, but I don't think the code would
> end up prettier.

I spent a bit more time thinking about that point.  My complaint about
lack of long long should be moot given that we're now requiring C99.
So the two cases we need to worry about are (1) long long exists and
is 64 bits, and (2) long long exists and is wider than 64 bits.  In
case (1) there's nothing actively wrong with the code as it stands.
In case (2), if we were to fix the problem by s/int64/long long/g,
the result would be that we'd be doing the arithmetic for all
integer-to-text conversions in 128 bits, which seems likely to be
pretty expensive.

So a "real" fix would probably require having separate versions of
fmtint for long and long long.  I'm not terribly excited about
going there.  I can see it happening some day when/if we need to
use 128-bit math more extensively than today, but I do not think
that day is close.  (Are there *any* platforms where "long long"
is 128 bits today?)

Having said that, maybe there's a case for changing the type spec
in only the va_arg() call, and leaving snprintf's related local
variables as int64.  (Is that what you actually meant?)  Then,
if long long really is different from int64, at least we have
predictable truncation behavior after fetching the value, rather
than undefined behavior while fetching it.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 12:54:52 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > It seems the general "use strfromd if available" approach is generally
> > useful, even if we need to serialize the precision.
> 
> Agreed.
> 
> > Putting it into an
> > inline appears to be helpful, avoids some of the otherwise precision
> > related branches.  Do you have any feelings about which header to put
> > the code in?  I used common/string.h so far.
> 
> I do not think it should be in a header, for two reasons:
> 
> (1) The need to use sprintf for portability means that we need very
> tight constraints on the precision spec *and* the buffer size *and*
> the format type (%f pretty much destroys certainty about how long the
> output string is).  So this isn't going to be general purpose code.
> I think just writing it into float[48]out is sufficient.

Well, the numbers suggest it's also useful to do so from snprintf - it's
not that rare that we output floating point numbers from semi
performance critical code, even leaving aside float[48]out.  So I'm not
convinced that we shouldn't do this from within snprintf.c too. Now we
could open-code it twice, but i'm not sure I see the point.

If we just define the API as having to guarantee there's enough space
for the output format, I think it'll work well enough for now?
snprintf.c already assumes everything floating point can be output in
1024 chars, no?

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> It seems the general "use strfromd if available" approach is generally
> useful, even if we need to serialize the precision.

Agreed.

> Putting it into an
> inline appears to be helpful, avoids some of the otherwise precision
> related branches.  Do you have any feelings about which header to put
> the code in?  I used common/string.h so far.

I do not think it should be in a header, for two reasons:

(1) The need to use sprintf for portability means that we need very
tight constraints on the precision spec *and* the buffer size *and*
the format type (%f pretty much destroys certainty about how long the
output string is).  So this isn't going to be general purpose code.
I think just writing it into float[48]out is sufficient.

(2) It's already the case that most code trying to emit floats ought
to go through float[48]out, in order to have standardized treatment
of Inf and NaN.  Providing some other API in a common header would
just create a temptation to break that policy.

Now, if we did write our own float output code then we would standardize
Inf/NaN outputs inside that, and both of these issues would go away ...
but I think what we'd do is provide something strfromd-like as an
alternate API for that code, so we still won't need a wrapper.
And anyway it doesn't sound like either of us care to jump that hurdle
right now.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-03 11:59:27 -0400, Tom Lane wrote:
>> I experimented with adding an initial check for "format is exactly %s"
>> at the top of dopr(), and couldn't get excited about that.  Instrumenting
>> things showed that the optimization fired in only 1.8% of the calls
>> during a run of our core regression tests.  Now, that might not count
>> as a really representative workload, but it doesn't make me think that
>> the case is worth optimizing for us.

> Seems right.  I also have a hard time to believe that any of those "%s"
> printfs are performance critical - we'd hopefully just have avoided the
> sprintf in that case.

Yup, that's probably a good chunk of the reason why there aren't very
many.  But we *do* use %s a lot to assemble multiple strings or combine
them with fixed text, which is why the other form of the optimization
is useful.

>> But then it occurred to me that there's more than one way to skin this
>> cat.  We could, for an even cheaper extra test, detect that any one
>> format specifier is just "%s", and use the same kind of fast-path
>> within the loop.  With the same sort of instrumentation, I found that
>> a full 45% of the format specs executed in the core regression tests
>> are just %s.  That makes me think that a patch along the lines of the
>> attached is a good win for our use-cases.  Comparing to Fedora 28's
>> glibc, this gets us to

> Hm, especially if we special case the float->string conversions directly
> at the hot callsites, that seems reasonable.  I kinda wish we could just
> easily move the format string processing to compile-time, but given
> translatability that won't be widely possible even if it were otherwise
> feasible.

Yeah, there's a mighty big pile of infrastructure that depends on the
way *printf works.  I agree that one way or another we're going to be
special-casing float8out and float4out.

BTW, I poked around in the related glibc sources the other day, and
it seemed like they are doing some sort of quasi-compilation of format
strings.  I couldn't figure out how they made it pay, though --- without
some way to avoid re-compiling the same format string over and over,
seems like it couldn't net out as a win.  But if they are avoiding
that, I didn't find where.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 12:22:13 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-10-03 12:07:32 -0400, Tom Lane wrote:
> >> [ scratches head ... ]  How would that work?  Seems like it necessarily
> >> adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
> >> going to be any faster just from asking it for slightly fewer bytes.
> >> I think there might be something wrong with your test scenario ...
> >> or there's more noise in the numbers than you thought.
> 
> > I guess the difference is that we're more likely to find reusable chunks
> > in aset.c and/or need fewer OS allocations.  As the memory is going to
> > be touched again very shortly afterwards, the cache effects probably are
> > neglegible.
> 
> > The strlen definitely shows up in profiles, it just seems to save at
> > least as much as it costs.
> 
> > Doesn't strike me as THAT odd?
> 
> What it strikes me as is excessively dependent on one particular test
> scenario.  I don't mind optimizations that are tradeoffs between
> well-understood costs, but this smells like handwaving that's going to
> lose as much or more often than winning, once it hits the real world.

I'm not particularly wedded to doing the allocation differently - I was
just mildly wondering if the increased size of the allocations could be
problematic. And that lead me to testing that. And reporting it.   I
don't think the real-world test differences are that large in this
specific case, but whatever.

It seems the general "use strfromd if available" approach is generally
useful, even if we need to serialize the precision.  Putting it into an
inline appears to be helpful, avoids some of the otherwise precision
related branches.  Do you have any feelings about which header to put
the code in?  I used common/string.h so far.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 11:59:27 -0400, Tom Lane wrote:
> I wrote:
> > ... However, I did add recent glibc (Fedora 28)
> > to the mix, and I was interested to discover that they seem to have
> > added a fast-path for format strings that are exactly "%s", just as
> > NetBSD did.  I wonder if we should reconsider our position on doing
> > that.  It'd be a simple enough addition...
> 
> I experimented with adding an initial check for "format is exactly %s"
> at the top of dopr(), and couldn't get excited about that.  Instrumenting
> things showed that the optimization fired in only 1.8% of the calls
> during a run of our core regression tests.  Now, that might not count
> as a really representative workload, but it doesn't make me think that
> the case is worth optimizing for us.

Seems right.  I also have a hard time to believe that any of those "%s"
printfs are performance critical - we'd hopefully just have avoided the
sprintf in that case.


> But then it occurred to me that there's more than one way to skin this
> cat.  We could, for an even cheaper extra test, detect that any one
> format specifier is just "%s", and use the same kind of fast-path
> within the loop.  With the same sort of instrumentation, I found that
> a full 45% of the format specs executed in the core regression tests
> are just %s.  That makes me think that a patch along the lines of the
> attached is a good win for our use-cases.  Comparing to Fedora 28's
> glibc, this gets us to

Hm, especially if we special case the float->string conversions directly
at the hot callsites, that seems reasonable.  I kinda wish we could just
easily move the format string processing to compile-time, but given
translatability that won't be widely possible even if it were otherwise
feasible.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-03 12:07:32 -0400, Tom Lane wrote:
>> [ scratches head ... ]  How would that work?  Seems like it necessarily
>> adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
>> going to be any faster just from asking it for slightly fewer bytes.
>> I think there might be something wrong with your test scenario ...
>> or there's more noise in the numbers than you thought.

> I guess the difference is that we're more likely to find reusable chunks
> in aset.c and/or need fewer OS allocations.  As the memory is going to
> be touched again very shortly afterwards, the cache effects probably are
> neglegible.

> The strlen definitely shows up in profiles, it just seems to save at
> least as much as it costs.

> Doesn't strike me as THAT odd?

What it strikes me as is excessively dependent on one particular test
scenario.  I don't mind optimizations that are tradeoffs between
well-understood costs, but this smells like handwaving that's going to
lose as much or more often than winning, once it hits the real world.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> FWIW, it seems that using a local buffer and than pstrdup'ing that in
> float8out_internal is a bit faster, and would probably save a bit of
> memory on average:
> float8out using sprintf via pg_double_to_string, pstrdup:
> 15370.774
> float8out using strfromd via pg_double_to_string, pstrdup:
> 13498.331

[ scratches head ... ]  How would that work?  Seems like it necessarily
adds a strlen() call to whatever we'd be doing otherwise.  palloc isn't
going to be any faster just from asking it for slightly fewer bytes.
I think there might be something wrong with your test scenario ...
or there's more noise in the numbers than you thought.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

I wrote:
> ... However, I did add recent glibc (Fedora 28)
> to the mix, and I was interested to discover that they seem to have
> added a fast-path for format strings that are exactly "%s", just as
> NetBSD did.  I wonder if we should reconsider our position on doing
> that.  It'd be a simple enough addition...

I experimented with adding an initial check for "format is exactly %s"
at the top of dopr(), and couldn't get excited about that.  Instrumenting
things showed that the optimization fired in only 1.8% of the calls
during a run of our core regression tests.  Now, that might not count
as a really representative workload, but it doesn't make me think that
the case is worth optimizing for us.

But then it occurred to me that there's more than one way to skin this
cat.  We could, for an even cheaper extra test, detect that any one
format specifier is just "%s", and use the same kind of fast-path
within the loop.  With the same sort of instrumentation, I found that
a full 45% of the format specs executed in the core regression tests
are just %s.  That makes me think that a patch along the lines of the
attached is a good win for our use-cases.  Comparing to Fedora 28's
glibc, this gets us to

Test case: %s
snprintf time = 8.83615 ms total, 8.83615e-06 ms per iteration
pg_snprintf time = 23.9372 ms total, 2.39372e-05 ms per iteration
ratio = 2.709

Test case: %sx
snprintf time = 59.4481 ms total, 5.94481e-05 ms per iteration
pg_snprintf time = 29.8983 ms total, 2.98983e-05 ms per iteration
ratio = 0.503

versus what we have as of this morning's commit:

Test case: %s
snprintf time = 7.7427 ms total, 7.7427e-06 ms per iteration
pg_snprintf time = 26.2439 ms total, 2.62439e-05 ms per iteration
ratio = 3.390

Test case: %sx
snprintf time = 61.4452 ms total, 6.14452e-05 ms per iteration
pg_snprintf time = 32.7516 ms total, 3.27516e-05 ms per iteration
ratio = 0.533

The penalty for non-%s cases seems to be a percent or so, although
it's barely above the noise floor in my tests.

regards, tom lane

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index cad7345..b9b6add 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** dopr(PrintfTarget *target, const char *f
*** 431,436 
--- 431,449 

  		/* Process conversion spec starting at *format */
  		format++;
+ 
+ 		/* Fast path for conversion spec that is exactly %s */
+ 		if (*format == 's')
+ 		{
+ 			format++;
+ 			strvalue = va_arg(args, char *);
+ 			Assert(strvalue != NULL);
+ 			dostr(strvalue, strlen(strvalue), target);
+ 			if (target->failed)
+ break;
+ 			continue;
+ 		}
+ 
  		fieldwidth = precision = zpad = leftjust = forcesign = 0;
  		longflag = longlongflag = pointflag = 0;
  		fmtpos = accum = 0;

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

Hi,

On 2018-10-03 08:20:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> >> While there might be value in implementing our own float printing code,
> >> I have a pretty hard time getting excited about the cost/benefit ratio
> >> of that.  I think that what we probably really ought to do here is hack
> >> float4out/float8out to bypass the extra overhead, as in the 0002 patch
> >> below.
> 
> > I'm thinking we should do a bit more than just that hack. I'm thinking
> > of something (barely tested) like
> 
> Meh.  The trouble with that is that it relies on the platform's snprintf,
> not sprintf, and that brings us right back into a world of portability
> hurt.  I don't feel that the move to C99 gets us out of worrying about
> noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
> libc.  See buildfarm member gharial for a counterexample.

Oh, we could just use sprintf() and tell strfromd the buffer is large
enough. I only used snprintf because it seemed more symmetric, and
because I was at most 1/3 awake.


> I'm happy to look into whether using strfromd when available buys us
> anything over using sprintf.  I'm not entirely convinced that it will,
> because of the need to ASCII-ize and de-ASCII-ize the precision, but
> it's worth checking.

It's definitely faster.  It's not a full-blown format parser, so I guess
the cost of the conversion isn't too bad:
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strfrom-skeleton.c;hb=HEAD#l68

CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() 
FROM generate_series(1, 1000);
VACUUM FREEZE somefloats;

I'm comparing the times of:
COPY somefloats TO '/dev/null';

master (including your commit):
16177.202 ms

snprintf using sprintf via pg_double_to_string:
16195.787

snprintf using strfromd via pg_double_to_string:
14856.974 ms

float8out using sprintf via pg_double_to_string:
16176.169

float8out using strfromd via pg_double_to_string:
13532.698



FWIW, it seems that using a local buffer and than pstrdup'ing that in
float8out_internal is a bit faster, and would probably save a bit of
memory on average:

float8out using sprintf via pg_double_to_string, pstrdup:
15370.774

float8out using strfromd via pg_double_to_string, pstrdup:
13498.331


Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Tom Lane

Andres Freund  writes:
> On 2018-10-02 17:54:31 -0400, Tom Lane wrote:
>> Here's a version of this patch rebased over commit 625b38ea0.

> Cool.  Let's get that in...

Cool, I'll push it shortly.

>> While there might be value in implementing our own float printing code,
>> I have a pretty hard time getting excited about the cost/benefit ratio
>> of that.  I think that what we probably really ought to do here is hack
>> float4out/float8out to bypass the extra overhead, as in the 0002 patch
>> below.

> I'm thinking we should do a bit more than just that hack. I'm thinking
> of something (barely tested) like

Meh.  The trouble with that is that it relies on the platform's snprintf,
not sprintf, and that brings us right back into a world of portability
hurt.  I don't feel that the move to C99 gets us out of worrying about
noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
libc.  See buildfarm member gharial for a counterexample.

I'm happy to look into whether using strfromd when available buys us
anything over using sprintf.  I'm not entirely convinced that it will,
because of the need to ASCII-ize and de-ASCII-ize the precision, but
it's worth checking.

> FWIW, I think there's still a significant argument to be made that we
> should work on our floating point IO performance. Both on the input and
> output side. It's a significant practical problem. But both a fix like
> you describe, and my proposal, should bring us to at least the previous
> level of performance for the hot paths. So that'd then just be an
> independent consideration.

Well, an independent project anyway.  I concur that it would have value;
but whether it's worth the effort, and the possible behavioral changes,
is not very clear to me.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-03 Thread Andres Freund

On 2018-10-02 17:54:31 -0400, Tom Lane wrote:
> Here's a version of this patch rebased over commit 625b38ea0.
> 
> That commit's fix for the possibly-expensive memset means that we need
> to reconsider performance numbers for this patch.  I re-ran my previous
> tests, and it's still looking like this is a substantial win, as it makes
> snprintf.c faster than the native snprintf for most non-float cases.
> We're still stuck at something like 10% penalty for float cases.

Cool.  Let's get that in...

> While there might be value in implementing our own float printing code,
> I have a pretty hard time getting excited about the cost/benefit ratio
> of that.  I think that what we probably really ought to do here is hack
> float4out/float8out to bypass the extra overhead, as in the 0002 patch
> below.

I'm thinking we should do a bit more than just that hack. I'm thinking
of something (barely tested) like

int
pg_double_to_string(char *buf, size_t bufsize, char tp, int precision, double 
val)
{
charfmt[8];

#ifdef HAVE_STRFROMD

if (precision != -1)
{
fmt[0] = '%';
fmt[1] = '.';
fmt[2] = '0' + precision / 10;
fmt[3] = '0' + precision % 10;
fmt[4] = tp;
fmt[5] = '\0';
}
else
{
fmt[0] = '%';
fmt[1] = tp;
fmt[2] = '\0';
}

return strfromd(buf, bufsize, fmt, val);
#else

if (precision != -1)
{
fmt[0] = '%';
fmt[1] = '.';
fmt[2] = '*';
fmt[3] = tp;
fmt[4] = '\0';
}
else
{
fmt[0] = '%';
fmt[1] = tp;
fmt[2] = '\0';
}

#undef snprintf
return snprintf(buf, bufsize, fmt, precision, val);
#define sprintf pg_snprintf
#endif
}

and putting that in string.h or such.

Then we'd likely be faster both when going through pg_sprintf etc when
strfromd is available, and by using it directly in float8out etc, we'd
be at least as fast as before.

I can clean that up, just not tonight.

FWIW, I think there's still a significant argument to be made that we
should work on our floating point IO performance. Both on the input and
output side. It's a significant practical problem. But both a fix like
you describe, and my proposal, should bring us to at least the previous
level of performance for the hot paths. So that'd then just be an
independent consideration.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-10-02 Thread Tom Lane

Andres Freund  writes:
> I've looked through the patch. Looks good to me. Some minor notes:

[ didn't see this till after sending my previous ]

> - How about adding our own strchrnul for the case where we don't
>   HAVE_STRCHRNUL?  It's possible that other platforms have something
>   similar, and the code wouldlook more readable that way.

Sure, we could just make a "static inline strchrnul()" for use when
!HAVE_STRCHRNUL.  No objection.

> - I know it's not new, but is it actually correct to use va_arg(args, int64)
>   for ATYPE_LONGLONG?

Well, the problem with just doing s/int64/long long/g is that the
code would then fail on compilers without a "long long" type.
We could ifdef our way around that, but I don't think the code would
end up prettier.

Given that we only ever use "ll" modifiers via INT64_FORMAT, and that
that'll only be set to "ll" if int64 is indeed "long long", those code
paths should be dead code in any situation where the type pun is wrong.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-10-02 Thread Tom Lane

Here's a version of this patch rebased over commit 625b38ea0.

That commit's fix for the possibly-expensive memset means that we need
to reconsider performance numbers for this patch.  I re-ran my previous
tests, and it's still looking like this is a substantial win, as it makes
snprintf.c faster than the native snprintf for most non-float cases.
We're still stuck at something like 10% penalty for float cases.

While there might be value in implementing our own float printing code,
I have a pretty hard time getting excited about the cost/benefit ratio
of that.  I think that what we probably really ought to do here is hack
float4out/float8out to bypass the extra overhead, as in the 0002 patch
below.

For reference, I attach the testbed I'm using now plus some results.
I wasn't able to get my cranky NetBSD system up today, so I don't
have results for that.  However, I did add recent glibc (Fedora 28)
to the mix, and I was interested to discover that they seem to have
added a fast-path for format strings that are exactly "%s", just as
NetBSD did.  I wonder if we should reconsider our position on doing
that.  It'd be a simple enough addition...

regards, tom lane

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*** fi
*** 15100,15106 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 523,528 
--- 523,531 
  /* Define to 1 if you have the  header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 394,399 
--- 394,402 
  /* Define to 1 if you have the  header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */
  
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 1be5f70..3094ad8 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** flushbuffer(PrintfTarget *target)
*** 314,320 
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 314,322 
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag,

Re: Performance improvements for src/port/snprintf.c

2018-10-02 Thread Andres Freund

Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:
> Here's a rebased version of <15785.1536776...@sss.pgh.pa.us>.
> 
> I think we should try to get this reviewed and committed before
> we worry more about the float business.  It would be silly to
> not be benchmarking any bigger changes against the low-hanging
> fruit here.

I've looked through the patch. Looks good to me. Some minor notes:

- How about adding our own strchrnul for the case where we don't
  HAVE_STRCHRNUL?  It's possible that other platforms have something
  similar, and the code wouldlook more readable that way.
- I know it's not new, but is it actually correct to use va_arg(args, int64)
  for ATYPE_LONGLONG?

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund




On September 26, 2018 9:04:08 PM PDT, Thomas Munro 
 wrote:
>On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
> wrote:
>> > "Andres" == Andres Freund  writes:
>>  Andres> Hm, stb's results just for floating point isn't bad. The
>above
>>  Andres> numbers were for %f %f. But as the minimal usage would be
>about
>>  Andres> the internal usage of dopr(), here's comparing %.*f:
>>
>>  Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per
>iteration
>>  Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
>>  Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>>
>> Hmm. We had a case recently on IRC where the performance of float8out
>> turned out to be the major bottleneck: a table of about 2.7 million
>rows
>> and ~70 float columns showed an overhead of ~66 seconds for doing
>COPY
>> as opposed to COPY BINARY (the actual problem report was that doing
>> "select * from table" from R was taking a minute+ longer than
>expected,
>> we got comparative timings for COPY just to narrow down causes).
>>
>> That translates to approx. 0.00035 ms overhead (i.e. time(float8out)
>-
>> time(float8send)) per conversion (Linux server, hardware unknown).
>>
>> That 66 seconds was the difference between 18s and 1m24s, so it
>wasn't a
>> small factor but totally dominated the query time.
>
>For perfect and cheap round trip to ASCII, not for human consumption,
>I wonder about the hexadecimal binary float literal format from C99
>(and showing up in other places too).

I'm not quite sure how we realistically would migrate to that though. Clients 
and their users won't understand it, and the more knowledgeable ones will 
already use the binary protocol.

Answers
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund




On September 26, 2018 8:53:27 PM PDT, Andrew Gierth 
 wrote:
>> "Andres" == Andres Freund  writes:
>
> Andres> Hm, stb's results just for floating point isn't bad. The above
>Andres> numbers were for %f %f. But as the minimal usage would be about
> Andres> the internal usage of dopr(), here's comparing %.*f:
>
> Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
> Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
>Hmm. We had a case recently on IRC where the performance of float8out
>turned out to be the major bottleneck: a table of about 2.7 million
>rows
>and ~70 float columns showed an overhead of ~66 seconds for doing COPY
>as opposed to COPY BINARY (the actual problem report was that doing
>"select * from table" from R was taking a minute+ longer than expected,
>we got comparative timings for COPY just to narrow down causes).
>
>That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
>time(float8send)) per conversion (Linux server, hardware unknown).

Sounds like it could be pretty precisely be the cost measured above. My 
laptop's a bit faster than most server CPUs and the test has perfect branch 
prediction...


>That 66 seconds was the difference between 18s and 1m24s, so it wasn't
>a
>small factor but totally dominated the query time.


Ugh.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Thomas Munro

On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
 wrote:
> > "Andres" == Andres Freund  writes:
>  Andres> Hm, stb's results just for floating point isn't bad. The above
>  Andres> numbers were for %f %f. But as the minimal usage would be about
>  Andres> the internal usage of dopr(), here's comparing %.*f:
>
>  Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
>  Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
>  Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
> Hmm. We had a case recently on IRC where the performance of float8out
> turned out to be the major bottleneck: a table of about 2.7 million rows
> and ~70 float columns showed an overhead of ~66 seconds for doing COPY
> as opposed to COPY BINARY (the actual problem report was that doing
> "select * from table" from R was taking a minute+ longer than expected,
> we got comparative timings for COPY just to narrow down causes).
>
> That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
> time(float8send)) per conversion (Linux server, hardware unknown).
>
> That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
> small factor but totally dominated the query time.

For perfect and cheap round trip to ASCII, not for human consumption,
I wonder about the hexadecimal binary float literal format from C99
(and showing up in other places too).

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andrew Gierth

> "Andres" == Andres Freund  writes:

 Andres> Hm, stb's results just for floating point isn't bad. The above
 Andres> numbers were for %f %f. But as the minimal usage would be about
 Andres> the internal usage of dopr(), here's comparing %.*f:

 Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
 Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
 Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.

-- 
Andrew (irc:RhodiumToad)

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane

Andres Freund  writes:
> On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
>> BTW, were you thinking of plugging in strfromd() inside snprintf.c,
>> or just invoking it directly from float[48]out?  The latter would
>> presumably be cheaper, and it'd solve the most pressing performance
>> problem, if not every problem.

> I wasn't actually seriously suggesting we should use strfromd, but I
> guess one way to deal with this would be to add a wrapper routine that
> could directly be called from float[48]out *and* from fmtfloat().

Yeah, something along that line occurred to me a bit later.

> Wonder
> if it'd be worthwhile to *not* pass that wrapper a format string, but
> instead pass the sprecision as an explicit argument.

Right, getting rid of the round trip to text for the precision seems
like a win.  I'm surprised that strfromd is defined the way it is and
not with something like (double val, char fmtcode, int precision, ...)

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Reading around the interwebz lead me to look at ryu
> 
> > https://dl.acm.org/citation.cfm?id=3192369
> > https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
> 
> > That's an algorithm that always generates the minimally sized
> > roundtrip-safe string output for a floating point number. That makes it
> > insuitable for the innards of printf, but it very well could be
> > interesting for e.g. float8out, especially when we currently specify a
> > "too high" precision to guarantee round-trip safeity.
> 
> Yeah, the whole business of round-trip safety is a bit worrisome.

Seems like using a better algorithm also has the potential to make the
output a bit smaller / more readable than what we currently produce.

> If we change printf, and it produces different low-order digits
> than before, will floats still round-trip correctly?  I think we
> have to ensure that they do.

Yea, I think that's an absolutely hard requirement.  It'd possibly be a
good idea to add an  assert that enforce that, although I'm not sure
it's worth the portability issues around crappy system libcs that do
randomly different things.

> BTW, were you thinking of plugging in strfromd() inside snprintf.c,
> or just invoking it directly from float[48]out?  The latter would
> presumably be cheaper, and it'd solve the most pressing performance
> problem, if not every problem.

I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat(). Wonder
if it'd be worthwhile to *not* pass that wrapper a format string, but
instead pass the sprecision as an explicit argument.  Would make the use
in snprintf.c a bit more annoying (due to fFeEgG support), but probably
considerably simpler and faster if we ever reimplement that ourself.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Thomas Munro

On Thu, Sep 27, 2018 at 1:18 PM Andres Freund  wrote:
> On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
> > snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> > pg time = 1434.57 ms total, 0.000286915 ms per iteration
> > stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
> Reading around the interwebz lead me to look at ryu
>
> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
>
> That's an algorithm that always generates the minimally sized
> roundtrip-safe string output for a floating point number. That makes it
> insuitable for the innards of printf, but it very well could be
> interesting for e.g. float8out, especially when we currently specify a
> "too high" precision to guarantee round-trip safeity.

Wow.  While all the algorithms have that round trip goal, they keep
doing it faster.  I was once interested in their speed for a work
problem, and looked into the 30 year old dragon4 and 8 year old grisu3
algorithms.  It's amazing to me that we have a new algorithm in 2018
for this ancient problem, and it claims to be 3 times faster than the
competition.  (Hah, I see that "ryū" is Japanese for dragon.  "Grisù"
is a dragon from an Italian TV series.)

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:
> Here's a rebased version of <15785.1536776...@sss.pgh.pa.us>.
> 
> I think we should try to get this reviewed and committed before
> we worry more about the float business.  It would be silly to
> not be benchmarking any bigger changes against the low-hanging
> fruit here.

Yea, no arguments there.

I'll try to have a look tomorrow.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane

Andres Freund  writes:
> Reading around the interwebz lead me to look at ryu

> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

> That's an algorithm that always generates the minimally sized
> roundtrip-safe string output for a floating point number. That makes it
> insuitable for the innards of printf, but it very well could be
> interesting for e.g. float8out, especially when we currently specify a
> "too high" precision to guarantee round-trip safeity.

Yeah, the whole business of round-trip safety is a bit worrisome.
If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly?  I think we
have to ensure that they do.  If we just use strfromd(), then it's
libc's problem if the results change ... but if we stick in some
code we got from elsewhere, it's our problem.

BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out?  The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane

Here's a rebased version of <15785.1536776...@sss.pgh.pa.us>.

I think we should try to get this reviewed and committed before
we worry more about the float business.  It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.

regards, tom lane

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*** fi
*** 15100,15106 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 523,528 
--- 523,531 
  /* Define to 1 if you have the  header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 394,399 
--- 394,402 
  /* Define to 1 if you have the  header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */
  
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 2c77eec..1469878 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** flushbuffer(PrintfTarget *target)
*** 310,316 
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 310,318 
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*** static void fmtfloat(double value, char 
*** 322,332 
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 324,335 
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen,

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

Hi,

On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> pg time = 1434.57 ms total, 0.000286915 ms per iteration
> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Reading around the interwebz lead me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
insuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safeity.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-09-26 17:40:22 -0700, Andres Freund wrote:
> On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> > >> No, there are no additional layers that weren't there before ---
> > >> snprintf.c's snprintf() slots in directly where the platform's did 
> > >> before.
> > 
> > > Hm? What I mean is that we can't realistically be faster with the
> > > current architecture, because for floating point we end up doing glibc
> > > sprintf() in either case.
> > 
> > Oh, you mean specifically for the float conversion case.  I still say
> > that I will *not* accept judging this code solely on the float case.
> 
> Oh, it should definitely not be judged solely based on floating point,
> we agree.
> 
> 
> > The string and integer cases are at least as important if not more so.
> 
> I think the integer stuff has become a *little* bit less important,
> because we converted the hot cases over to pg_lto etc.
> 
> 
> > >> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> > >> that (my RHEL6 box doesn't seem to have it).
> > 
> > > It's C99 afaict.
> > 
> > It's not in POSIX 2008, and I don't see it in my admittedly-draft
> > copy of C99 either.  But that's not real relevant -- I don't see
> > much reason not to use it if we want a quick and dirty answer for
> > the platforms that have it.
> 
> Right, I really just wanted some more baseline numbers.
> 
> 
> > If we had more ambition, we might consider stealing the float
> > conversion logic out of the "stb" library that Alexander pointed
> > to upthread.  It says it's public domain, so there's no license
> > impediment to borrowing some code ...
> 
> Yea, I started to play around with doing so with musl, but based on
> early my benchmarks it's not fast enough to bother.  I've not integrated
> it into our code, but instead printed two floating point numbers with
> your test:
> 
> musl 500 iterations:
> snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
> pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
> ratio = 1.340
> 
> glibc 500 iterations:
> snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
> pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
> ratio = 1.564
> 
> So there's pretty clearly no point in even considering starting from
> musl.

Hm, stb's results just for floating point isn't bad. The above numbers
were for %f %f. But as the minimal usage would be about the internal
usage of dopr(), here's comparing %.*f:

snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> >> No, there are no additional layers that weren't there before ---
> >> snprintf.c's snprintf() slots in directly where the platform's did before.
> 
> > Hm? What I mean is that we can't realistically be faster with the
> > current architecture, because for floating point we end up doing glibc
> > sprintf() in either case.
> 
> Oh, you mean specifically for the float conversion case.  I still say
> that I will *not* accept judging this code solely on the float case.

Oh, it should definitely not be judged solely based on floating point,
we agree.


> The string and integer cases are at least as important if not more so.

I think the integer stuff has become a *little* bit less important,
because we converted the hot cases over to pg_lto etc.


> >> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> >> that (my RHEL6 box doesn't seem to have it).
> 
> > It's C99 afaict.
> 
> It's not in POSIX 2008, and I don't see it in my admittedly-draft
> copy of C99 either.  But that's not real relevant -- I don't see
> much reason not to use it if we want a quick and dirty answer for
> the platforms that have it.

Right, I really just wanted some more baseline numbers.


> If we had more ambition, we might consider stealing the float
> conversion logic out of the "stb" library that Alexander pointed
> to upthread.  It says it's public domain, so there's no license
> impediment to borrowing some code ...

Yea, I started to play around with doing so with musl, but based on
early my benchmarks it's not fast enough to bother.  I've not integrated
it into our code, but instead printed two floating point numbers with
your test:

musl 500 iterations:
snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
ratio = 1.340

glibc 500 iterations:
snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
ratio = 1.564

So there's pretty clearly no point in even considering starting from
musl.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane

Andres Freund  writes:
> On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
>> No, there are no additional layers that weren't there before ---
>> snprintf.c's snprintf() slots in directly where the platform's did before.

> Hm? What I mean is that we can't realistically be faster with the
> current architecture, because for floating point we end up doing glibc
> sprintf() in either case.

Oh, you mean specifically for the float conversion case.  I still say
that I will *not* accept judging this code solely on the float case.
The string and integer cases are at least as important if not more so.

>> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
>> that (my RHEL6 box doesn't seem to have it).

> It's C99 afaict.

It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either.  But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.

If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread.  It says it's public domain, so there's no license
impediment to borrowing some code ...

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
> >> I assume this partially is just the additional layers of function calls
> >> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
> >> addition to pretty much the same work as before (i.e. sprintf("%.*f")).
> 
> No, there are no additional layers that weren't there before ---
> snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.  And after the unconditional replacement,
we're doing a bunch of *additional* work (at the very least we're
parsing the format string twice).

> Well, ok, dopr() wasn't there before, but I trust you're not claiming
> that glibc's implementation of snprintf() is totally flat either.

I don't even think it's all that fast...


> > I'm *NOT* proposing that as the actual solution, but as a datapoint, it
> > might be interesting that hardcoding the precision and thus allowing use
> > ofusing strfromd() instead of sprintf yields a *better* runtime than
> > master.
> 
> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> that (my RHEL6 box doesn't seem to have it).  But we could use it where
> available.  And it doesn't seem unreasonable to have a fast path for
> the specific precision value(s) that float4/8out will actually use.

It's C99 afaict.  What I did for my quick hack is to just hack the
precision as characters into the format that dopr() uses...

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane

Andres Freund  writes:
> I kinda wonder if we shouldn't replace the non pg_* functions in
> snprintf.c with a more modern copy of a compatibly licensed libc. Looks
> to me like our implementation has forked off some BSD a fair while ago.

Maybe, but the benchmarking I was doing last month didn't convince me
that the *BSD versions were remarkably fast.  There are a lot of cases
where our version is faster.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane

Andres Freund  writes:
> On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
>> I assume this partially is just the additional layers of function calls
>> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
>> addition to pretty much the same work as before (i.e. sprintf("%.*f")).

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.

I think it's just that snprintf.c is a bit slower in this case.  If you
look at glibc's implementation, they've expended a heck of a lot of code
and sweat on it.  The only reason we could hope to beat it is that we're
prepared to throw out some functionality, like LC_NUMERIC handling.

> I'm *NOT* proposing that as the actual solution, but as a datapoint, it
> might be interesting that hardcoding the precision and thus allowing use
> ofusing strfromd() instead of sprintf yields a *better* runtime than
> master.

Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).  But we could use it where
available.  And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.

regards, tom lane

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-08-17 14:32:59 -0400, Tom Lane wrote:
> I've been looking into the possible performance consequences of that,
> in particular comparing snprintf.c to the library versions on macOS,
> FreeBSD, OpenBSD, and NetBSD.  While it held up well in simpler cases,
> I noted that it was significantly slower on long format strings, which
> I traced to two separate problems:

> Perhaps there's a way to improve that
> without writing our own floating-point conversion code, but I'm not
> seeing an easy way offhand.  I don't think that's a showstopper though.
> This code is now faster than the native code for very many other cases,
> so on average it should cause no real performance problem.

I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.

There seems to be a few choices. Among others:
- freebsd libc:
  https://github.com/freebsd/freebsd/blob/master/lib/libc/stdio/vfprintf.c
  (floating point stuff is elsewhere)
- musl libc:
  https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c
- stb (as Alexander referenced earlier)
  https://github.com/nothings/stb/blob/master/stb_sprintf.h

I've not benchmarked any of these. Just by looking at the code, the musl
one looks by far the most compact - looks like all the relevant code is
in the one file referenced.

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
> On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
> > Alexander Kuzmenkov  writes:
> > > I benchmarked this, using your testbed and comparing to libc sprintf
> > > (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
> > > compiled with gcc 5.
> >
> > Thanks for reviewing!
> >
> > The cfbot noticed that the recent dlopen patch conflicted with this in
> > configure.in, so here's a rebased version.  The code itself didn't change.
> 
> Conflicts again, but not too hard to resolve.
> 
> The mini benchmark from 
> http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
> is significantly improved by this patch.
> 
> > 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
> >
> > COPY somefloats TO '/dev/null';
> > COPY 1000
> > Time: 24575.770 ms (00:24.576)
> >
> > 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
> >
> > COPY somefloats TO '/dev/null';
> > COPY 1000
> > Time: 12877.037 ms (00:12.877)
> 
> This patch:
> 
> postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO 
> '/dev/null';
> Time: 0.269 ms
> ┌┐
> │ pg_prewarm │
> ├┤
> │  73530 │
> └┘
> (1 row)
> 
> Time: 34.983 ms
> COPY 1000
> Time: 15511.478 ms (00:15.511)
> 
> 
> The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
> +   38.15%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
> +   13.98%  postgres  libc-2.27.so  [.] hack_digit
> +7.54%  postgres  libc-2.27.so  [.] __mpn_mul_1
> +7.32%  postgres  postgres  [.] CopyOneRowTo
> +6.12%  postgres  libc-2.27.so  [.] vfprintf
> +3.14%  postgres  libc-2.27.so  [.] __strlen_avx2
> +1.97%  postgres  postgres  [.] heap_deform_tuple
> +1.77%  postgres  postgres  [.] AllocSetAlloc
> +1.43%  postgres  postgres  [.] psprintf
> +1.25%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
> +1.09%  postgres  libc-2.27.so  [.] _IO_vsnprintf
> +1.09%  postgres  postgres  [.] appendBinaryStringInfo
> 
> The profile of master with this patch is:
> 
> +   32.38%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
> +   11.08%  postgres  libc-2.27.so  [.] hack_digit
> +9.55%  postgres  postgres  [.] CopyOneRowTo
> +6.24%  postgres  libc-2.27.so  [.] __mpn_mul_1
> +5.01%  postgres  libc-2.27.so  [.] vfprintf
> +4.91%  postgres  postgres  [.] dopr.constprop.4
> +3.53%  postgres  libc-2.27.so  [.] __strlen_avx2
> +1.55%  postgres  libc-2.27.so  [.] __strchrnul_avx2
> +1.49%  postgres  libc-2.27.so  [.] __memmove_avx_unaligned_erms
> +1.35%  postgres  postgres  [.] AllocSetAlloc
> +1.32%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
> +1.30%  postgres  postgres  [.] FunctionCall1Coll
> +1.27%  postgres  postgres  [.] psprintf
> +1.16%  postgres  postgres  [.] appendBinaryStringInfo
> +1.16%  postgres  libc-2.27.so  [.] _IO_old_init
> +1.06%  postgres  postgres  [.] heap_deform_tuple
> +1.02%  postgres  libc-2.27.so  [.] sprintf
> +1.02%  postgres  libc-2.27.so  [.] _IO_vsprintf
> 
> (all functions above 1%)
> 
> 
> I assume this partially is just the additional layers of function calls
> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
> addition to pretty much the same work as before (i.e. sprintf("%.*f")).

I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
ofusing strfromd() instead of sprintf yields a *better* runtime than
master.

Time: 10255.134 ms (00:10.255)

Greetings,

Andres Freund

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund

On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
> Alexander Kuzmenkov  writes:
> > I benchmarked this, using your testbed and comparing to libc sprintf
> > (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
> > compiled with gcc 5.
>
> Thanks for reviewing!
>
> The cfbot noticed that the recent dlopen patch conflicted with this in
> configure.in, so here's a rebased version.  The code itself didn't change.

Conflicts again, but not too hard to resolve.

The mini benchmark from 
http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
is significantly improved by this patch.

> 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
>
> COPY somefloats TO '/dev/null';
> COPY 1000
> Time: 24575.770 ms (00:24.576)
>
> 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
>
> COPY somefloats TO '/dev/null';
> COPY 1000
> Time: 12877.037 ms (00:12.877)

This patch:

postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO 
'/dev/null';
Time: 0.269 ms
┌┐
│ pg_prewarm │
├┤
│  73530 │
└┘
(1 row)

Time: 34.983 ms
COPY 1000
Time: 15511.478 ms (00:15.511)


The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
+   38.15%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
+   13.98%  postgres  libc-2.27.so  [.] hack_digit
+7.54%  postgres  libc-2.27.so  [.] __mpn_mul_1
+7.32%  postgres  postgres  [.] CopyOneRowTo
+6.12%  postgres  libc-2.27.so  [.] vfprintf
+3.14%  postgres  libc-2.27.so  [.] __strlen_avx2
+1.97%  postgres  postgres  [.] heap_deform_tuple
+1.77%  postgres  postgres  [.] AllocSetAlloc
+1.43%  postgres  postgres  [.] psprintf
+1.25%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
+1.09%  postgres  libc-2.27.so  [.] _IO_vsnprintf
+1.09%  postgres  postgres  [.] appendBinaryStringInfo

The profile of master with this patch is:

+   32.38%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
+   11.08%  postgres  libc-2.27.so  [.] hack_digit
+9.55%  postgres  postgres  [.] CopyOneRowTo
+6.24%  postgres  libc-2.27.so  [.] __mpn_mul_1
+5.01%  postgres  libc-2.27.so  [.] vfprintf
+4.91%  postgres  postgres  [.] dopr.constprop.4
+3.53%  postgres  libc-2.27.so  [.] __strlen_avx2
+1.55%  postgres  libc-2.27.so  [.] __strchrnul_avx2
+1.49%  postgres  libc-2.27.so  [.] __memmove_avx_unaligned_erms
+1.35%  postgres  postgres  [.] AllocSetAlloc
+1.32%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
+1.30%  postgres  postgres  [.] FunctionCall1Coll
+1.27%  postgres  postgres  [.] psprintf
+1.16%  postgres  postgres  [.] appendBinaryStringInfo
+1.16%  postgres  libc-2.27.so  [.] _IO_old_init
+1.06%  postgres  postgres  [.] heap_deform_tuple
+1.02%  postgres  libc-2.27.so  [.] sprintf
+1.02%  postgres  libc-2.27.so  [.] _IO_vsprintf

(all functions above 1%)


I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).

- Andres

Re: Performance improvements for src/port/snprintf.c

2018-09-12 Thread Tom Lane

Alexander Kuzmenkov  writes:
> I benchmarked this, using your testbed and comparing to libc sprintf 
> (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all 
> compiled with gcc 5.

Thanks for reviewing!

The cfbot noticed that the recent dlopen patch conflicted with this in
configure.in, so here's a rebased version.  The code itself didn't change.

regards, tom lane

diff --git a/configure b/configure
index dd77742..5fa9396 100755
*** a/configure
--- b/configure
*** fi
*** 15060,15066 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15060,15066 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 3ada48b..93e8556 100644
*** a/configure.in
--- b/configure.in
*** PGAC_FUNC_WCSTOMBS_L
*** 1544,1550 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1544,1550 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`

! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])

  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 4094e22..752a547 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 531,536 
--- 531,539 
  /* Define to 1 if you have the  header file. */
  #undef HAVE_STDLIB_H

+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror' function. */
  #undef HAVE_STRERROR

diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 6618b43..ea72c44 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 402,407 
--- 402,410 
  /* Define to 1 if you have the  header file. */
  #define HAVE_STDLIB_H 1

+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror' function. */
  #ifndef HAVE_STRERROR
  #define HAVE_STRERROR 1
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..66151c2 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** flushbuffer(PrintfTarget *target)
*** 295,301 
  }

! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 
  }

! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*** static void fmtfloat(double value, char 
*** 307,317 
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);

  /*
--- 309,320 
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static

Re: Performance improvements for src/port/snprintf.c

2018-09-07 Thread Alexander Kuzmenkov

I benchmarked this, using your testbed and comparing to libc sprintf 
(Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all 
compiled with gcc 5.4.0 with -O2. I used bigger decimals in one of the 
formats, but otherwise they are the same as yours. Here is the table of 
conversion time relative to libc:


format                                 pg  stb
("%2$.*3$f %1$d\n", 42, 123.456, 2)    1.03    -
("%.*g", 15, 123.456)  1.08    0.31
("%10d", 15)   0.63    0.52
("%s", "012345678900123456789001234    2.06    6.20
("%d 012345678900123456789001234567    2.03    1.81
("%1$d 0123456789001234567890012345    1.34    -
("%d %d", 845879348, 994502893)    1.97    0.59

Surprisingly, our implementation is twice faster than libc on "%10d". 
Stb is faster than we are with floats, but it uses its own algorithm for 
that. It is also faster with decimals, probably because it uses a 
two-digit lookup table, not one-digit like we do. Unfortunately it 
doesn't support dollars.


1. https://github.com/nothings/stb/blob/master/stb_sprintf.h

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Performance improvements for src/port/snprintf.c

2018-08-19 Thread Tom Lane

I wrote:
> [ snprintf-speedups-1.patch ]

Here's a slightly improved version of that, with two changes:

* Given the current state of the what-about-%m thread, it's no longer
academic how well this performs relative to glibc's version.  I poked
at that and found that a lot of the discrepancy came from glibc using
strchrnul() to find the next format specifier --- apparently, that
function is a *lot* faster than the equivalent manual loop.  So this
version uses that if available.

* I thought of a couple of easy wins for fmtfloat.  We can pass the
precision spec down to the platform's sprintf using "*" notation instead
of converting it to text and back, and that also simplifies matters enough
that we can avoid using an sprintf call to build the simplified format
string.  This seems to get us down to the vicinity of a 10% speed penalty
on microbenchmarks of just float conversion, which is enough to satisfy
me given the other advantages of switching to our own snprintf.

regards, tom lane

diff --git a/configure b/configure
index 836d68d..dff9f0c 100755
*** a/configure
--- b/configure
*** fi
*** 15032,15038 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15032,15038 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 6e14106..c00bb8f 100644
*** a/configure.in
--- b/configure.in
*** PGAC_FUNC_WCSTOMBS_L
*** 1535,1541 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1535,1541 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 827574e..da9cfa7 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 519,524 
--- 519,527 
  /* Define to 1 if you have the  header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror' function. */
  #undef HAVE_STRERROR
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 46ce49d..73d7424 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 390,395 
--- 390,398 
  /* Define to 1 if you have the  header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror' function. */
  #ifndef HAVE_STRERROR
  #define HAVE_STRERROR 1
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..66151c2 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** flushbuffer(PrintfTarget *target)
*** 295,301 
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 295,303 
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*** static void

Performance improvements for src/port/snprintf.c

2018-08-17 Thread Tom Lane

Over in the what-about-%m thread, we speculated about replacing the
platform's *printf functions if they didn't support %m, which would
basically mean using src/port/snprintf.c on all non-glibc platforms,
rather than only on Windows as happens right now (ignoring some
obsolete platforms with busted snprintf's).

I've been looking into the possible performance consequences of that,
in particular comparing snprintf.c to the library versions on macOS,
FreeBSD, OpenBSD, and NetBSD.  While it held up well in simpler cases,
I noted that it was significantly slower on long format strings, which
I traced to two separate problems:

1. Our implementation always scans the format string twice, so that it
can sort out argument-ordering options (%n$).  Everybody else is bright
enough to do that only for formats that actually use %n$, and it turns
out that it doesn't really cost anything extra to do so: you can just
perform the extra scan when and if you first find a dollar specifier.
(Perhaps there's an arguable downside for this, with invalid format
strings that have non-dollar conversion specs followed by dollar ones:
with this approach we might fetch some arguments before realizing that
the format is broken.  But a wrong format can cause indefinitely bad
results already, so that seems like a pretty thin objection to me,
especially if all other implementations share the same hazard.)

2. Our implementation is shoving simple data characters in the format
out to the result buffer one at a time.  More common is to skip to the
next % as fast as possible, and then dump anything skipped over using
the string-output code path, reducing the overhead of buffer overrun
checking.

The attached patch fixes both of those things, and also does some
micro-optimization hacking to avoid loops around dopr_outch() as well
as unnecessary use of pass-by-ref arguments.  This version stacks up
pretty well against all the libraries I compared it to.  The remaining
weak spot is that floating-point conversions are consistently 30%-50%
slower than the native libraries, which is not terribly surprising
considering that our implementation involves calling the native sprintf
and then massaging the result.  Perhaps there's a way to improve that
without writing our own floating-point conversion code, but I'm not
seeing an easy way offhand.  I don't think that's a showstopper though.
This code is now faster than the native code for very many other cases,
so on average it should cause no real performance problem.

I've attached both the patch and a simple performance testbed in case
anybody wants to do their own measurements.  For reference's sake,
these are the specific test cases I looked at:

snprintf(buffer, sizeof(buffer),
 "%2$.*3$f %1$d\n",
 42, 123.456, 2);

snprintf(buffer, sizeof(buffer),
 "%.*g", 15, 123.456);

snprintf(buffer, sizeof(buffer),
 "%d %d", 15, 16);

snprintf(buffer, sizeof(buffer),
 "%10d", 15);

snprintf(buffer, sizeof(buffer),
 "%s",
 
"0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890");

snprintf(buffer, sizeof(buffer),
 "%d 
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",

snprintf(buffer, sizeof(buffer),
 "%1$d 
0123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890012345678900123456789001234567890",
 42);

A couple of other notes of interest:

* The skip-to-next-% searches could alternatively be implemented with
strchr(), although then you need a strlen() call if there isn't another %.
glibc's version of strchr() is fast enough to make that a win, but since
we're not contemplating using this atop glibc, that's not a case we care
about.  On other platforms the manual loop mostly seems to be faster.

* NetBSD seems to have a special fast path for the case that the format
string is exactly "%s".  I did not adopt that idea here, reasoning that
checking for it would add overhead to all other cases, making it probably
a net loss overall.  I'm prepared to listen to arguments otherwise,
though.  It is a common case, I just doubt it's common enough (and
other library authors seem to agree).

I'll add this to the upcoming CF.

regards, tom lane

diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 851e2ae..211ff1b 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** flushbuffer(PrintfTarget

64 matches

Mail list logo