Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-03-27 Thread Robert Haas
On Wed, Feb 22, 2012 at 6:53 AM, Greg Smith g...@2ndquadrant.com wrote:
 A look back on this now that I'm done with it does raise one large question
 though.  I added some examples of how to measure timing overhead using psql.
  While I like the broken down timing data that this utility provides, I'm
 not sure whether it's worth adding a contrib module just to get it now
 though.  Extension that's packaged on something like PGXN and easy to
 obtain?  Absolutely--but maybe that's a developer only level thing.  Maybe
 the only code worth distributing is the little SQL example of how to measure
 the overhead, along with some reference good/bad numbers.  That plus the
 intro to timer trivia could turn this into a documentation section only, no
 code change.  I've dreamed of running something like this on every system in
 the build farm.  Even if that's a valuable exercise, even then it may only
 be worth doing once, then reverting.

I had similar concerns, but decided to go ahead and commit this.  It
doesn't relate particularly closely to the buffer I/O timings patch,
but I think it's worth having.  We clearly need something that is
integrated with the PostgreSQL sources, because there is more than one
way to access timers, and this code will measure the overhead of doing
what PostgreSQL does, rather than the overhead of reading times in
some other way.  Now, that doesn't preclude shipping this on PGXN, but
we've certainly got other things in contrib that are clearly a lot
less useful than this, and we've discussed before the folly of
shipping a database product without a full set of diagnostic tools.
Since this tool was developed to answer questions about whether
certain PostgreSQL options are safe to enable or whether they'll have
too much overhead, I think that's a sufficient reason to include it in
contrib, especially because you took the time to write some very good
documentation for it.  I wonder whether we should perhaps move some of
this discussion of clock sources into the main documentation
somewhere, since it's not specifically related to this utility, but I
didn't try to do that for the moment and just left it as you had it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-02-22 Thread Greg Smith
Attached is a feature extracted from the Ants Aasma add timing of 
buffer I/O requests submission.  That included a tool to measure timing 
overhead, from gettimeofday or whatever else INSTR_TIME_SET_CURRENT 
happens to call.  That's what I've broken out here; it's a broader topic 
than just buffer timing.


I fixed some trivial bugs and cleaned up the output of the program, then 
wrote a full documentation section for it.  After that review, I think 
this could be ready to commit...with a big picture caveat below.  Code 
wise, I am mainly concerned about its portability, and that's not often 
a place we get good review help on.  The tool is based on pg_test_fsync. 
 Perhaps Bruce might want to take a look at this, to see if any of the 
changes he's recently made to pg_test_fsync impact what this utility 
should do?  He might be able to flesh out the FreeBSD examples too.


As for why this whole topic is important, I've found the results of this 
new pg_test_timing track quite well with systems where EXPLAIN ANALYZE 
timing overhead is large.  As such, it fills in a gap in the existing 
docs, where that possibility is raised but no way was given to measure 
it--nor determine how to improve it.  I expect we'll be worried about 
how large timing overhead is more for future features, with the first 
example being the rest of Ants's own submission.


A look back on this now that I'm done with it does raise one large 
question though.  I added some examples of how to measure timing 
overhead using psql.  While I like the broken down timing data that this 
utility provides, I'm not sure whether it's worth adding a contrib 
module just to get it now though.  Extension that's packaged on 
something like PGXN and easy to obtain?  Absolutely--but maybe that's a 
developer only level thing.  Maybe the only code worth distributing is 
the little SQL example of how to measure the overhead, along with some 
reference good/bad numbers.  That plus the intro to timer trivia could 
turn this into a documentation section only, no code change.  I've 
dreamed of running something like this on every system in the build 
farm.  Even if that's a valuable exercise, even then it may only be 
worth doing once, then reverting.


Anyway, the patch does now includes several examples and a short primer 
on PC clock hardware, to help guide what good results look like and why 
they've been impossible to obtain in the past.  That's a bit 
Linux-centric, but the hardware described covers almost all systems 
using Intel or AMD processors.  Only difference with most other 
operating systems is how aggressively they have adopted newer timer 
hardware.  At least this gives a way to measure all of them.


Some references used to put together the clock source tutorial:

Microsoft's intro to HPET: 
http://msdn.microsoft.com/en-us/windows/hardware/gg463347
Notes on effective clock resolution: 
http://elinux.org/System_Tap_Timestamp_Notes
VMware clock history and impact on VMs: 
http://www.vmware.com/files/pdf/Timekeeping-In-VirtualMachines.pdf
VMware timer suggestions for various Linux versions: 
http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1006427


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
diff --git a/contrib/Makefile b/contrib/Makefile
index ac0a80a..d230451 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -35,6 +35,7 @@ SUBDIRS = \
 		pg_standby	\
 		pg_stat_statements \
 		pg_test_fsync	\
+		pg_test_timing	\
 		pg_trgm		\
 		pg_upgrade	\
 		pg_upgrade_support \
diff --git a/contrib/pg_test_timing/Makefile b/contrib/pg_test_timing/Makefile
new file mode 100644
index 000..b8b266a
--- /dev/null
+++ b/contrib/pg_test_timing/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_test_timing/Makefile
+
+PGFILEDESC = pg_test_timing - test timing overhead
+PGAPPICON = win32
+
+PROGRAM  = pg_test_timing
+OBJS = pg_test_timing.o
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_test_timing
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_test_timing/pg_test_timing.c b/contrib/pg_test_timing/pg_test_timing.c
new file mode 100644
index 000..4ff0455
--- /dev/null
+++ b/contrib/pg_test_timing/pg_test_timing.c
@@ -0,0 +1,162 @@
+/*
+ *	pg_test_timing.c
+ *		tests overhead of timing calls and their monoticity:  that
+ * 		they always move forward
+ */
+
+#include postgres_fe.h
+
+#include getopt_long.h
+#include portability/instr_time.h
+
+static const char *progname;
+
+static int32 test_duration = 3;
+
+static void handle_args(int argc, char *argv[]);
+static void test_timing(int32);
+
+int
+main(int argc, char *argv[])
+{
+	progname = get_progname(argv[0]);
+
+	handle_args(argc, argv);
+
+	test_timing(test_duration);
+
+	return 0;
+}
+
+static 

Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-02-22 Thread Jay Levitt

Greg Smith wrote:


Anyway, the patch does now includes several examples and a short primer on
PC clock hardware, to help guide what good results look like and why they've
been impossible to obtain in the past.  That's a bit Linux-centric, but the
hardware described covers almost all systems using Intel or AMD processors.
Only difference with most other operating systems is how aggressively they
have adopted newer timer hardware.  At least this gives a way to measure all
of them.


N.B.: Windows has at least two clock APIs, timeGetTime and 
QueryPerformanceCounters (and probably more, these days). They rely on 
different hardware clocks, and can get out of sync with each other; 
meanwhile, QueryPerformanceCounters can get out of sync with itself on 
(older?) multi-CPU boards.


So if you're doing high-res timing, it's good to make sure you aren't 
relying on two different clocks in different places... I ran into this with 
MIDI drivers years ago; and wrote a doc:


http://www.ultimatemetal.com/forum/equipment/315910-midi-latency-problem-nuendo.html#post6315034

and a clock-testing utility:

https://github.com/jaylevitt/miditime


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-02-22 Thread Greg Smith

On 02/22/2012 11:10 AM, Jay Levitt wrote:

N.B.: Windows has at least two clock APIs, timeGetTime and
QueryPerformanceCounters (and probably more, these days). They rely on
different hardware clocks, and can get out of sync with each other;
meanwhile, QueryPerformanceCounters can get out of sync with itself on
(older?) multi-CPU boards.


The PostgreSQL wrapper in src/include/portability/instr_time.h uses 
QueryPerformanceCounter and QueryPerformanceFrequency in a way that the 
result can be used similarly to how deltas in UNIX dates are returned.


As far as I've been able to tell, there aren't any issues unique to 
Windows there.  Multiple cores can have their TSC results get out of 
sync on Windows for the same reason they do on Linux systems, and 
there's also the same frequency/temperature issues.


Newer versions of Windows can use TSC, older versions only use HPET or 
ACPI on older versions, and there's some ability to force bad TSC units 
to use ACPI instead:


http://blogs.msdn.com/b/psssql/archive/2010/08/18/how-it-works-timer-outputs-in-sql-server-2008-r2-invariant-tsc.aspx
http://blogs.msdn.com/b/psssql/archive/2010/08/18/how-it-works-timer-outputs-in-sql-server-2008-r2-invariant-tsc.aspx

There is a lot of questionable behavior if you try to use the better 
timers in Windows XP; check out the obnoxious foot note about XP SP3 at 
http://en.wikipedia.org/wiki/High_Precision_Event_Timer for example.


Since there's little you can do about--it's not even possible to figure 
out which timer is being used easily--I didn't see any easy ways to 
document what Windows does here, in a way that helped anyone.  For the 
most part, you get the best timer Windows has a driver for with 
QueryPerformanceCounter, and that's what the PostgreSQL code uses.  The 
information I gave about how newer systems can have more accurate timing 
is still accurate.  Maybe it would be useful to add something pointing 
out that newer Windows versions tend to support faster timers, too. 
That's something I assume people would guess from what I already wrote, 
it may be worth stating explicitly though.



--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-02-22 Thread Marti Raudsepp
On Wed, Feb 22, 2012 at 18:44, Greg Smith g...@2ndquadrant.com wrote:
 As far as I've been able to tell, there aren't any issues unique to Windows
 there.  Multiple cores can have their TSC results get out of sync on Windows
 for the same reason they do on Linux systems, and there's also the same
 frequency/temperature issues.

Not on recent Linux kernel versions. Linux automatically detects when
the TSC is unstable (due to power management or out-of-sync
cores/sockets) and automatically falls back to the more expensive HPET
or ACPI methods.

e.g:
% dmesg |grep -i tsc
[0.00] Fast TSC calibration using PIT
[0.164075] checking TSC synchronization [CPU#0 - CPU#1]: passed.
[0.197062] Switching to clocksource tsc
[0.260960] Marking TSC unstable due to TSC halts in idle

Whether these tests cover 100% of the possible conditions, and whether
the detection has race conditions or not, I don't know.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-02-22 Thread Greg Smith

On 02/22/2012 12:25 PM, Marti Raudsepp wrote:

On Wed, Feb 22, 2012 at 18:44, Greg Smithg...@2ndquadrant.com  wrote:

As far as I've been able to tell, there aren't any issues unique to Windows
there.  Multiple cores can have their TSC results get out of sync on Windows
for the same reason they do on Linux systems, and there's also the same
frequency/temperature issues.


Not on recent Linux kernel versions. Linux automatically detects when
the TSC is unstable (due to power management or out-of-sync
cores/sockets) and automatically falls back to the more expensive HPET
or ACPI methods.


From the patch:

Newer operating systems may check for the known TSC problems and
switch to a slower, more stable clock source when they are seen.
If your system supports TSC time but doesn't default to that, it
may be disabled for a good reason.

I ran into a case like you're showing here in my longer exploration of 
this at 
http://archives.postgresql.org/message-id/4edf1871.2010...@2ndquadrant.com 
 I stopped just short of showing what the TSC error message looked 
like.  I hoped that with the above and some examples showing dmesg | 
grep, that would be enough to lead enough people toward finding this on 
their own.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_timing tool for EXPLAIN ANALYZE overhead

2012-02-22 Thread Marti Raudsepp
On Wed, Feb 22, 2012 at 19:36, Greg Smith g...@2ndquadrant.com wrote:
 From the patch:

 Newer operating systems may check for the known TSC problems and
 switch to a slower, more stable clock source when they are seen.
 If your system supports TSC time but doesn't default to that, it
 may be disabled for a good reason.

Sorry, I was under the impression that the stability of Windows's
QueryPerformanceCounter() API is hardware-dependent, but I haven't
coded under Windows for a long time -- maybe it's improved in recent
versions.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers