Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-23 Thread Magnus Hagander
On Tue, Mar 23, 2010 at 05:10, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:

 Greg Smith g...@2ndquadrant.com wrote:

 By the way: the pgbench.sgml that you committed looks like it passed
 through a system that added a CR to every line in it.  Probably not the
 way you intended to commit that.

 Oops, fixed. Thanks.

My guess is that this happened because you committed from Windows?

I learned the hard way that this is a bad idea. (Luckily I learned it
on other CVS projects, before I started committing to PostgreSQL).
There are settings to make it not do that, but they are not reliable.
I'd strongly suggest that you always just do a cvs diff on windows and
then use a staging machine running linux or bsd or something to apply
it through. And then you *always* run those patches through something
like fromdos. It's a bunch of extra steps, but it's really the only
way to do it reliably.

If that's not at all what happened, then well, it's still good advice
I think, even if it doesn't apply :-)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-22 Thread Takahiro Itagaki

Greg Smith g...@2ndquadrant.com wrote:

 Attached is an updated version that I think is ready to commit.  Only 
 changes are docs--I rewrote those to improve the wording some.

Thanks for the correction. Applied.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-22 Thread Greg Smith

Takahiro Itagaki wrote:

Greg Smith g...@2ndquadrant.com wrote:

  
Attached is an updated version that I think is ready to commit.  Only 
changes are docs--I rewrote those to improve the wording some.



Thanks for the correction. Applied.
  


By the way: the pgbench.sgml that you committed looks like it passed 
through a system that added a CR to every line in it.  Probably not the 
way you intended to commit that.


So far I've done over 40 hours of pgbench runtime worth of testing (500 
runs) using the patched version without any issues, the code itself 
continues to act fine.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-22 Thread Takahiro Itagaki

Greg Smith g...@2ndquadrant.com wrote:

 By the way: the pgbench.sgml that you committed looks like it passed 
 through a system that added a CR to every line in it.  Probably not the 
 way you intended to commit that.

Oops, fixed. Thanks.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-18 Thread Greg Smith

Takahiro Itagaki wrote:

The log filenames are pgbench_log.main-process-id.thread-serial-number
for each thread, but the first thread (including single-threaded) still uses
pgbench_log.main-process-id for the name because of compatibility.
  


Attached is an updated version that I think is ready to commit.  Only 
changes are docs--I rewrote those to improve the wording some.  The code 
looked and tested fine to me.  I just added support for the new format 
to pgbench-tools and am back to happily running large batches of tests 
using it again.


I confirmed a few things:

-On my CentOS system, the original problem is masked if you have 
--enable-thread-safety on; the multi-threaded output shows up without 
any broken lines into the single file.  As I suspected it's only the 
multi-process implementation that shows the issue here.  Since Tom 
points out that's luck rather than something that should be relied upon, 
I don't think that actually changes what to do here, it just explains 
why this wasn't obvious in earlier testing--normally I have thread 
safety on nowadays.


-Patch corrects the problem.  I took a build without thread safety on, 
demonstrated the issue with its pgbench.  Apply the patch, rebuild just 
pgbench, run again; new multiple log files have no issue.


-It's easy to convert existing scripts to utilize the new multiple log 
format.  Right now the current idiom you're forced into using when 
running pgbench scripts is to track the PID it's run as, then use 
something like:


mv pgbench_log.${PID} pgbench.log

To convert to a stable filename for later processing.  Now you just use 
something like this instead:


cat pgbench_log.${PID}*  pgbench.log
rm -f pgbench_log.${PID}*

And that works fine. 


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 0019db4..28a8c84 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -131,11 +131,9 @@ int			fillfactor = 100;
 #define ntellers	10
 #define naccounts	10
 
-FILE	   *LOGFILE = NULL;
-
 bool		use_log;			/* log transaction latencies to a file */
-
-int			is_connect;			/* establish connection for each transaction */
+bool		is_connect;			/* establish connection for each transaction */
+int			main_pid;			/* main process id used in log filename */
 
 char	   *pghost = ;
 char	   *pgport = ;
@@ -183,6 +181,7 @@ typedef struct
  */
 typedef struct
 {
+	int			tid;			/* thread id */
 	pthread_t	thread;			/* thread handle */
 	CState	   *state;			/* array of CState */
 	int			nstate;			/* length of state[] */
@@ -741,7 +740,7 @@ clientDone(CState *st, bool ok)
 
 /* return false iff client should be disconnected */
 static bool
-doCustom(CState *st, instr_time *conn_time)
+doCustom(CState *st, instr_time *conn_time, FILE *log_file)
 {
 	PGresult   *res;
 	Command   **commands;
@@ -778,7 +777,7 @@ top:
 		/*
 		 * transaction finished: record the time it took in the log
 		 */
-		if (use_log  commands[st-state + 1] == NULL)
+		if (log_file  commands[st-state + 1] == NULL)
 		{
 			instr_time	now;
 			instr_time	diff;
@@ -791,12 +790,12 @@ top:
 
 #ifndef WIN32
 			/* This is more than we really ought to know about instr_time */
-			fprintf(LOGFILE, %d %d %.0f %d %ld %ld\n,
+			fprintf(log_file, %d %d %.0f %d %ld %ld\n,
 	st-id, st-cnt, usec, st-use_file,
 	(long) now.tv_sec, (long) now.tv_usec);
 #else
 			/* On Windows, instr_time doesn't provide a timestamp anyway */
-			fprintf(LOGFILE, %d %d %.0f %d 0 0\n,
+			fprintf(log_file, %d %d %.0f %d 0 0\n,
 	st-id, st-cnt, usec, st-use_file);
 #endif
 		}
@@ -857,7 +856,7 @@ top:
 		INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
 	}
 
-	if (use_log  st-state == 0)
+	if (log_file  st-state == 0)
 		INSTR_TIME_SET_CURRENT(st-txn_begin);
 
 	if (commands[st-state]-type == SQL_COMMAND)
@@ -1833,7 +1832,7 @@ main(int argc, char **argv)
 }
 break;
 			case 'C':
-is_connect = 1;
+is_connect = true;
 break;
 			case 's':
 scale_given = true;
@@ -1955,6 +1954,12 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	/*
+	 * save main process id in the global variable because process id will be
+	 * changed after fork.
+	 */
+	main_pid = (int) getpid();
+
 	if (nclients  1)
 	{
 		state = (CState *) realloc(state, sizeof(CState) * nclients);
@@ -1980,20 +1985,6 @@ main(int argc, char **argv)
 		}
 	}
 
-	if (use_log)
-	{
-		char		logpath[64];
-
-		snprintf(logpath, 64, pgbench_log.%d, (int) getpid());
-		LOGFILE = fopen(logpath, w);
-
-		if (LOGFILE == NULL)
-		{
-			fprintf(stderr, Couldn't open logfile \%s\: %s, logpath, strerror(errno));
-			exit(1);
-		}
-	}
-
 	if (debug)
 	{
 		if (duration = 0)
@@ -2111,6 +2102,7 @@ main(int argc, char **argv)
 	threads = (TState *) malloc(sizeof(TState) * nthreads);
 	for (i = 0; i  nthreads; i++)
 	{
+		threads[i].tid = i;
 		threads[i].state = state[nclients / nthreads 

[HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Greg Smith
Just noticed a problem popping up sometimes with the new multi-threaded 
pgbench.  This is on a Linux RPM build (the alpha4 set) compiled with 
'--disable-thread-safety'.  Still trying to nail down whether that's a 
requirement for this problem to appear or not.  I did most of my review 
of this feature with it turned on, and haven't been seeing this problem 
on other systems that are thread safe.  Not sure yet if that's cause and 
effect or coincidence yet.


Here's a sample invocation that produces ragged output for me on my one 
system:


pgbench -S -T 5 -c 4 -j 4 -l pgbench

The log file produced by this (pgbench_log.pid) is supposed to consist 
of a series of lines in the following format:


client,trans,latency,filenum,sec,usec

It looks like the switch between clients running on separate workers can 
lead to a mix of their respective lines showing up though.  Here's a 
couple of typical samples, with the bad line in the middle of each set:


1 138 178 0 1268665788 607559
1 139 182 0 1268665788 607751
1 1402 0 2491 0 1268665788 586135
2 1 264 0 1268665788 586463
2 2 192 0 1268665788 586665

1 274 160 0 1268665788 632966
1 275 178 0 1268665788 633154
1 276 184 0 126866578 178 0 1268665788 614015
2 141 190 0 1268665788 614252
2 142 169 0 1268665788 614430

2 274 178 0 1268665788 639218
2 275 175 0 1268665788 639402
2 276 169 0 126866578 171 0 1268665788 626933
0 141 185 0 1268665788 627165
0 142 202 0 1268665788 627377

Looks like sometimes a client is only getting part of its line written 
out before getting stomped on by the next one.  I think one of the 
assumptions being made about how to safely write to this log file may be 
broken by the multi-process implementation, which is what you get when 
thread-safety is not available.


Since there should only be 6 fields here, I think you can find whether a 
given log file has this problem or not like this:


cat pgbench_log.x | cut -d   -f 7 | sort | uniq

If anything comes out of that, the latency log file has at least one bad 
line in it.


Similarly, this:

cat pgbench_log.x | cut -d   -f 1 | sort | uniq

Should only show the client numbers; here there's some first columns 
with much bigger numbers too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Andrew Dunstan



Greg Smith wrote:
Just noticed a problem popping up sometimes with the new 
multi-threaded pgbench.  This is on a Linux RPM build (the alpha4 set) 
compiled with '--disable-thread-safety'.  Still trying to nail down 
whether that's a requirement for this problem to appear or not.  I did 
most of my review of this feature with it turned on, and haven't been 
seeing this problem on other systems that are thread safe.  Not sure 
yet if that's cause and effect or coincidence yet.





We had to turn handsprings to prevent this sort of effect with the 
logging collector, which was a requirement of being able to implement 
CSV logging sanely. So I'm not surprised by this report.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 Looks like sometimes a client is only getting part of its line written 
 out before getting stomped on by the next one.  I think one of the 
 assumptions being made about how to safely write to this log file may be 
 broken by the multi-process implementation, which is what you get when 
 thread-safety is not available.

pgbench doesn't make any effort at all to avoid interleaved writes on
that file.  I don't think there is anything much that can be done about
it when you are using the forked-processes implementation.  It's
probably possible for it to show up on the multi-threads version too,
depending on how hard libc tries to interlock stdio calls.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Josh Berkus
On 3/15/10 8:41 AM, Greg Smith wrote:
 Just noticed a problem popping up sometimes with the new multi-threaded
 pgbench.  This is on a Linux RPM build (the alpha4 set) compiled with
 '--disable-thread-safety'.  Still trying to nail down whether that's a
 requirement for this problem to appear or not.  I did most of my review
 of this feature with it turned on, and haven't been seeing this problem
 on other systems that are thread safe.  Not sure yet if that's cause and
 effect or coincidence yet.

For my part, telling people that multi-thread pgbench doesn't work
correctly on systems which are not thread-safe seems perfectly OK.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Takahiro Itagaki

Greg Smith g...@2ndquadrant.com wrote:

 It looks like the switch between clients running on separate workers can 
 lead to a mix of their respective lines showing up though.

Oops. There might be two solutions for the issue:
  1. Use explicit locks. The lock primitive will be pthread_mutex for
 multi-threaded implementations or semaphore for multi-threaded ones.
  2. Use per-thread log files.
 File names would be pgbench_log.main-process-id.thread-id.

Which is better, or another idea?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Greg Smith

Takahiro Itagaki wrote:

  1. Use explicit locks. The lock primitive will be pthread_mutex for
 multi-threaded implementations or semaphore for multi-threaded ones.
  2. Use per-thread log files.
 File names would be pgbench_log.main-process-id.thread-id.
  


I'm concerned that the locking itself will turn into a new pgbench 
bottleneck, just as we're clearing the point where it's not for the 
first time in a while.  And that sounds like it has its own potential 
risks/complexity involved.


I could live with per-thread log files.  I think my pgbench-tools is the 
main consumer of these latency logs floating around right now, I just 
pushed a 9.0 update to handle the multiple workers option today that 
discovered this).  It doesn't make any difference to what I'm doing how 
many file I have to process.  Just a few lines of extra shell code for 
me to pull the rest into the import.  That seems like the simplest 
solution that's guaranteed to work, just push the problem onto the 
client side instead where it's easier to deal with.


Unless someone feels strongly that these have to be interleaved into one 
file, based on Andrew's suggestion that this is a hard problem to get 
right and Tom's suggestion that this might even extend into the proper 
threaded version too, I think a log file per worker is the easiest way 
out of this.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Tom Lane
Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes:
 Greg Smith g...@2ndquadrant.com wrote:
 It looks like the switch between clients running on separate workers can 
 lead to a mix of their respective lines showing up though.

 Oops. There might be two solutions for the issue:
   1. Use explicit locks. The lock primitive will be pthread_mutex for
  multi-threaded implementations or semaphore for multi-threaded ones.
   2. Use per-thread log files.
  File names would be pgbench_log.main-process-id.thread-id.

I think #1 is out of the question, as the synchronization overhead will
do serious damage to the whole point of having a multithreaded pgbench.
#2 might be a reasonable idea.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Takahiro Itagaki

Tom Lane t...@sss.pgh.pa.us wrote:

 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes:
  Oops. There might be two solutions for the issue:
1. Use explicit locks. The lock primitive will be pthread_mutex for
   multi-threaded implementations or semaphore for multi-threaded ones.
2. Use per-thread log files.
   File names would be pgbench_log.main-process-id.thread-id.
 
 I think #1 is out of the question, as the synchronization overhead will
 do serious damage to the whole point of having a multithreaded pgbench.
 #2 might be a reasonable idea.

Ok, I'll go for #2.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ragged latency log data in multi-threaded pgbench

2010-03-15 Thread Takahiro Itagaki

Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote:

 2. Use per-thread log files.
File names would be pgbench_log.main-process-id.thread-id.

Here is a patch to implement per-thread log files for pgbench -l.

The log filenames are pgbench_log.main-process-id.thread-serial-number
for each thread, but the first thread (including single-threaded) still uses
pgbench_log.main-process-id for the name because of compatibility.

Example:
  $ pgbench -c16 -j4 -l
  $ ls
  pgbench_log.2196  pgbench_log.2196.1  pgbench_log.2196.2  pgbench_log.2196.3

Comments and suggenstions welcome.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



pgbench_log_20100316.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers