Re: pgbench and timestamps (bounced)

2021-01-13 Thread Tom Lane
Fabien COELHO  writes:
>> What remains that we could do now, and perhaps back-patch, is point (2)
>> i.e. disallow digits as the first character of a pgbench variable name.

> I'm fine with that.

>> That would be enough to "solve" the original bug report, and it does seem
>> like it could be back-patched, while we're certainly not going to risk
>> back-patching anything as portability-fraught as introducing mutexes.

> Sure.

OK.  I've pushed a patch that just does that much, and marked the
commitfest entry closed.  After the other thing lands, please rebase
and resubmit what remains of this patch.

regards, tom lane




Re: pgbench and timestamps (bounced)

2021-01-13 Thread Fabien COELHO



Hello Tom,


Hi, this entry is "Waiting on Author" and the thread was inactive for a
while. I see this discussion still has some open questions. Are you
going to continue working on it, or should I mark it as "returned with
feedback" until a better time?



IMHO the proposed fix is reasonable and addresses the issue.
I have responded to Tom's remarks about it, and it is waiting for his
answer to my counter arguments. So ISTM that the patch is currently still
waiting for some feedback.


It looks like my unhappiness with injecting a pthread dependency into
pgbench is going to be overtaken by events in the "option delaying
queries" thread [1].  However, by the same token there are some conflicts
between the two patchsets, and also I prefer the other thread's approach
to portability (i.e. do it honestly, not with a private portability layer
in pgbench.c).  So I'm inclined to put the parts of this patch that
require pthreads on hold till that lands.


Ok. This is fair enough. Portability is a pain thanks to Windows vs MacOS 
vs Linux approaches of implementing or not a standard.



What remains that we could do now, and perhaps back-patch, is point (2)
i.e. disallow digits as the first character of a pgbench variable name.


I'm fine with that.


That would be enough to "solve" the original bug report, and it does seem
like it could be back-patched, while we're certainly not going to risk
back-patching anything as portability-fraught as introducing mutexes.


Sure.

I'm unable to do much pg work at the moment, but this should be eased 
quite soon.



[1] 
https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de


--
Fabien Coelho - CRI, MINES ParisTech




Re: pgbench and timestamps (bounced)

2021-01-12 Thread Tom Lane
Fabien COELHO  writes:
>> Hi, this entry is "Waiting on Author" and the thread was inactive for a 
>> while. I see this discussion still has some open questions. Are you 
>> going to continue working on it, or should I mark it as "returned with 
>> feedback" until a better time?

> IMHO the proposed fix is reasonable and addresses the issue.
> I have responded to Tom's remarks about it, and it is waiting for his 
> answer to my counter arguments. So ISTM that the patch is currently still 
> waiting for some feedback.

It looks like my unhappiness with injecting a pthread dependency into
pgbench is going to be overtaken by events in the "option delaying
queries" thread [1].  However, by the same token there are some conflicts
between the two patchsets, and also I prefer the other thread's approach
to portability (i.e. do it honestly, not with a private portability layer
in pgbench.c).  So I'm inclined to put the parts of this patch that
require pthreads on hold till that lands.

What remains that we could do now, and perhaps back-patch, is point (2)
i.e. disallow digits as the first character of a pgbench variable name.
That would be enough to "solve" the original bug report, and it does seem
like it could be back-patched, while we're certainly not going to risk
back-patching anything as portability-fraught as introducing mutexes.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de




Re: pgbench and timestamps (bounced)

2020-11-26 Thread Fabien COELHO




CFM reminder.

Hi, this entry is "Waiting on Author" and the thread was inactive for a 
while. I see this discussion still has some open questions. Are you 
going to continue working on it, or should I mark it as "returned with 
feedback" until a better time?


IMHO the proposed fix is reasonable and addresses the issue.

I have responded to Tom's remarks about it, and it is waiting for his 
answer to my counter arguments. So ISTM that the patch is currently still 
waiting for some feedback.


--
Fabien.




Re: pgbench and timestamps (bounced)

2020-11-24 Thread Anastasia Lubennikova

On 11.09.2020 16:59, Fabien COELHO wrote:


Hello Tom,


It requires a mutex around the commands, I tried to do some windows
implementation which may or may not work.


Ugh, I'd really rather not do that.  Even disregarding the effects
of a mutex, though, my initial idea for fixing this has a big problem:
if we postpone PREPARE of the query until first execution, then it's
happening during timed execution of the benchmark scenario and thus
distorting the timing figures.  (Maybe if we'd always done it like
that, it'd be okay, but I'm quite against changing the behavior now
that it's stood for a long time.)


Hmmm.

Prepare is done *once* per client, ISTM that the impact on any 
statistically significant benchmark is nul in practice, or it would 
mean that the benchmark settings are too low.


Second, the mutex is only used when absolutely necessary, only for the 
substitution part of the query (replacing :stuff by ?), because 
scripts are shared between threads. This is just once, in an unlikely 
case occuring at the beginning.



However, perhaps there's more than one way to fix this.  Once we've
scanned all of the script and seen all the \set commands, we know
(in principle) the set of all variable names that are in use.
So maybe we could fix this by

(1) During the initial scan of the script, make variable-table
entries for every \set argument, with the values shown as undefined
for the moment.  Do not try to parse SQL commands in this scan,
just collect them.


The issue with this approach is

  SELECT 1 AS one \gset pref_

which will generate a "pref_one" variable, and these names cannot be 
guessed without SQL parsing and possibly execution. That is why the

preparation is delayed to when the variables are actually known.


(2) Make another scan in which we identify variable references
in the SQL commands and issue PREPAREs (if enabled).



(3) Perform the timed run.

This avoids any impact of this bug fix on the semantics or timing
of the benchmark proper.  I'm not sure offhand whether this
approach makes any difference for the concerns you had about
identifying/suppressing variable references inside quotes.


I do not think this plan is workable, because of the \gset issue.

I do not see that the conditional mutex and delayed PREPARE would have 
any significant (measurable) impact on an actual (reasonable) 
benchmark run.


A workable solution would be that each client actually execute each 
script once before starting the actual benchmark. It would still need 
a mutex and also a sync barrier (which I'm proposing in some other 
thread). However this may raise some other issues because then some 
operations would be trigger out of the benchmarking run, which may or 
may not be desirable.


So I'm not to keen to go that way, and I think the proposed solution 
is reasonable from a benchmarking point of view as the impact is 
minimal, although not zero.



CFM reminder.

Hi, this entry is "Waiting on Author" and the thread was inactive for a 
while. I see this discussion still has some open questions. Are you 
going to continue working on it, or should I mark it as "returned with 
feedback" until a better time?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: pgbench and timestamps (bounced)

2020-09-11 Thread Fabien COELHO



Hello Tom,


It requires a mutex around the commands, I tried to do some windows
implementation which may or may not work.


Ugh, I'd really rather not do that.  Even disregarding the effects
of a mutex, though, my initial idea for fixing this has a big problem:
if we postpone PREPARE of the query until first execution, then it's
happening during timed execution of the benchmark scenario and thus
distorting the timing figures.  (Maybe if we'd always done it like
that, it'd be okay, but I'm quite against changing the behavior now
that it's stood for a long time.)


Hmmm.

Prepare is done *once* per client, ISTM that the impact on any 
statistically significant benchmark is nul in practice, or it would mean 
that the benchmark settings are too low.


Second, the mutex is only used when absolutely necessary, only for the 
substitution part of the query (replacing :stuff by ?), because scripts 
are shared between threads. This is just once, in an unlikely case 
occuring at the beginning.



However, perhaps there's more than one way to fix this.  Once we've
scanned all of the script and seen all the \set commands, we know
(in principle) the set of all variable names that are in use.
So maybe we could fix this by

(1) During the initial scan of the script, make variable-table
entries for every \set argument, with the values shown as undefined
for the moment.  Do not try to parse SQL commands in this scan,
just collect them.


The issue with this approach is

  SELECT 1 AS one \gset pref_

which will generate a "pref_one" variable, and these names cannot be 
guessed without SQL parsing and possibly execution. That is why the

preparation is delayed to when the variables are actually known.


(2) Make another scan in which we identify variable references
in the SQL commands and issue PREPAREs (if enabled).



(3) Perform the timed run.

This avoids any impact of this bug fix on the semantics or timing
of the benchmark proper.  I'm not sure offhand whether this
approach makes any difference for the concerns you had about
identifying/suppressing variable references inside quotes.


I do not think this plan is workable, because of the \gset issue.

I do not see that the conditional mutex and delayed PREPARE would have any 
significant (measurable) impact on an actual (reasonable) benchmark run.


A workable solution would be that each client actually execute each script 
once before starting the actual benchmark. It would still need a mutex and 
also a sync barrier (which I'm proposing in some other thread). However 
this may raise some other issues because then some operations would be 
trigger out of the benchmarking run, which may or may not be desirable.


So I'm not to keen to go that way, and I think the proposed solution is 
reasonable from a benchmarking point of view as the impact is minimal, 
although not zero.


--
Fabien.




Re: pgbench and timestamps (bounced)

2020-09-07 Thread Tom Lane
Fabien COELHO  writes:
> [Resent on hackers for CF registration, sorry for the noise]

For the record, the original thread is at

https://www.postgresql.org/message-id/flat/CAKVUGgQaZVAUi1Ex41H4wrru%3DFU%2BMfwgjG0aM1br6st7sz31Vw%40mail.gmail.com

(I tried but failed to attach that thread to the CF entry, so we'll
have to settle for leaving a breadcrumb in this thread.)

> It requires a mutex around the commands, I tried to do some windows 
> implementation which may or may not work.

Ugh, I'd really rather not do that.  Even disregarding the effects
of a mutex, though, my initial idea for fixing this has a big problem:
if we postpone PREPARE of the query until first execution, then it's
happening during timed execution of the benchmark scenario and thus
distorting the timing figures.  (Maybe if we'd always done it like
that, it'd be okay, but I'm quite against changing the behavior now
that it's stood for a long time.)

However, perhaps there's more than one way to fix this.  Once we've
scanned all of the script and seen all the \set commands, we know
(in principle) the set of all variable names that are in use.
So maybe we could fix this by

(1) During the initial scan of the script, make variable-table
entries for every \set argument, with the values shown as undefined
for the moment.  Do not try to parse SQL commands in this scan,
just collect them.

(2) Make another scan in which we identify variable references
in the SQL commands and issue PREPAREs (if enabled).

(3) Perform the timed run.

This avoids any impact of this bug fix on the semantics or timing
of the benchmark proper.  I'm not sure offhand whether this
approach makes any difference for the concerns you had about
identifying/suppressing variable references inside quotes.

regards, tom lane




Re: pgbench and timestamps (bounced)

2020-07-08 Thread Fabien COELHO


Attached v2 fixes some errors, per cfbot.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9f3bb5fce6..9894ae9c04 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -998,15 +998,14 @@ pgbench  options  d
   
There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
-   digits, and underscores.
+   digits (but not on the first character), and underscores.
Variables can be set by the command-line -D option,
explained above, or by the meta commands explained below.
In addition to any variables preset by -D command-line options,
there are a few variables that are preset automatically, listed in
. A value specified for these
variables using -D takes precedence over the automatic presets.
-   Once set, a variable's
-   value can be inserted into a SQL command by writing
+   Once set, a variable's value can be inserted into a SQL command by writing
:variablename.  When running more than
one client session, each session has its own set of variables.
pgbench supports up to 255 variable uses in one
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 08a5947a9e..3879bef6da 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -119,12 +119,24 @@ typedef int pthread_attr_t;
 
 static int	pthread_create(pthread_t *thread, pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
 static int	pthread_join(pthread_t th, void **thread_return);
+
+typedef HANDLE pthread_mutex_t;
+static void pthread_mutex_init(pthread_mutex_t *pm, void *unused);
+static void pthread_mutex_lock(pthread_mutex_t *pm);
+static void pthread_mutex_unlock(pthread_mutex_t *pm);
+static void pthread_mutex_destroy(pthread_mutex_t *pm);
+
 #elif defined(ENABLE_THREAD_SAFETY)
 /* Use platform-dependent pthread capability */
 #include 
 #else
 /* No threads implementation, use none (-j 1) */
 #define pthread_t void *
+#define pthread_mutex_t void *
+#define pthread_mutex_init(m, p)
+#define pthread_mutex_lock(m)
+#define pthread_mutex_unlock(m)
+#define pthread_mutex_destroy(m)
 #endif
 
 
@@ -422,7 +434,7 @@ typedef struct
 	instr_time	txn_begin;		/* used for measuring schedule lag times */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
 
-	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
+	bool*		prepared[MAX_SCRIPTS];	/* whether client prepared the script commands */
 
 	/* per client collected stats */
 	int64		cnt;			/* client transaction count, for -t */
@@ -472,7 +484,7 @@ typedef struct
  */
 #define MAX_ARGS		256
 
-typedef enum MetaCommand
+typedef enum Meta
 {
 	META_NONE,	/* not a known meta-command */
 	META_SET,	/* \set */
@@ -503,6 +515,8 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  *
  * lines		The raw, possibly multi-line command text.  Variable substitution
  *not applied.
+ * substituted	Whether command	:-variables were substituted for
+ *QUERY_EXTENDED and QUERY_PREPARED
  * first_line	A short, single-line extract of 'lines', for error reporting.
  * type			SQL_COMMAND or META_COMMAND
  * meta			The type of meta-command, with META_NONE/GSET/ASET if command
@@ -517,10 +531,12 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
  * aset			do gset on all possible queries of a combined query (\;).
  * expr			Parsed expression, if needed.
  * stats		Time spent in this command.
+ * mutex		For delayed initializations of SQL commands.
  */
 typedef struct Command
 {
 	PQExpBufferData lines;
+	bool		substituted;
 	char	   *first_line;
 	int			type;
 	MetaCommand meta;
@@ -529,6 +545,7 @@ typedef struct Command
 	char	   *varprefix;
 	PgBenchExpr *expr;
 	SimpleStats stats;
+	pthread_mutex_t	mutex;
 } Command;
 
 typedef struct ParsedScript
@@ -613,6 +630,7 @@ static void clear_socket_set(socket_set *sa);
 static void add_socket_to_set(socket_set *sa, int fd, int idx);
 static int	wait_on_socket_set(socket_set *sa, int64 usecs);
 static bool socket_has_input(socket_set *sa, int fd, int idx);
+static bool makeVariablesParameters(CState *st, Command *cmd);
 
 
 /* callback functions for our flex lexer */
@@ -1273,7 +1291,7 @@ lookupVariable(CState *st, char *name)
 
 /* Get the value of a variable, in string form; returns NULL if unknown */
 static char *
-getVariable(CState *st, char *name)
+getVariable(CState *st, char *name, bool *isnull)
 {
 	Variable   *var;
 	char		stringform[64];
@@ -1282,6 +1300,9 @@ getVariable(CState *st, char *name)
 	if (var == NULL)
 		return NULL;			/* not found */
 
+	if (isnull != NULL)
+		*isnull = var->value.type == PGBT_NULL;
+
 	if (var->svalue)
 		return var->svalue;		/* we have it in string form */
 
@@ -1377,6 +1398,9 @@ makeVariableValue(Variable *var)
  * "src/bin/pgbench/exprscan.l".  Also see parseVariable(), below.
  *
  * Note: this static function 

Re: pgbench and timestamps (bounced)

2020-06-29 Thread Fabien COELHO


[Resent on hackers for CF registration, sorry for the noise]

Hello Tom,

The attached patch fixes some of the underlying problems reported by delaying 
the :var to $1 substitution to the last possible moments, so that what 
variables are actually defined is known. PREPARE-ing is also delayed to after 
these substitutions are done.


It requires a mutex around the commands, I tried to do some windows 
implementation which may or may not work.


The attached patch fixes (2) & (3) for extended & prepared.

I have a doubt about fixing (1) because it would be a significant behavioral 
change and it requires changing the replace routine significantly to check for 
quoting, comments, and so on. This means that currently ':var' is still broken 
under -M extended & prepared, I could only break it differently by providing a 
nicer error message and also break it under simple whereas it currently works 
there. I'm not thrilled by spending efforts to do that.


The patches change the name of "parseQuery" to "makeVariablesParameters", 
because it was not actually parsing any query. Maybe the new name could be 
improved.


In passing, there was a bug in how NULL was passed, which I tried to fix
as well.


I don't often do much with pgbench and variables, but there are a few
things that surprise me here.
1) That pgbench replaces variables within single quotes, and;
2) that we still think it's a variable name when it starts with a digit, 
and;

3) We replace variables that are undefined.



Also (4) this only happens when in non-simple query mode --- the
example works fine without "-M prepared".


After looking around in the code, it seems like the core of the issue
is that pgbench.c's parseQuery() doesn't check whether a possible
variable name is actually defined, unlike assignVariables() which is
what does the same job in simple query mode.  So that explains the
behavioral difference.


Yes.


The reason for doing that probably was that parseQuery() is run when
the input file is read, so that relevant variables might not be set
yet.  We could fix that by postponing the work to be done at first
execution of the query, as is already the case for PQprepare'ing the
query.


Yep, done at first execution of the Command, so that variables are known.


Also, after further thought I realize that (1) absolutely is a bug
in the non-simple query modes, whatever you think about it in simple
mode.  The non-simple modes are trying to pass the variable values
as extended-query-protocol parameters, and the backend is not going
to recognize $n inside a literal as being a parameter.


Yep. See my comments above.


If we fixed (1) and (3) I think there wouldn't be any great need
to tighten up (2).


I did (2) but not (1), for now.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 8e728dc094..7436210fd4 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -998,15 +998,14 @@ pgbench  options  d
   
There is a simple variable-substitution facility for script files.
Variable names must consist of letters (including non-Latin letters),
-   digits, and underscores.
+   digits (but not on the first character), and underscores.
Variables can be set by the command-line -D option,
explained above, or by the meta commands explained below.
In addition to any variables preset by -D command-line options,
there are a few variables that are preset automatically, listed in
. A value specified for these
variables using -D takes precedence over the automatic presets.
-   Once set, a variable's
-   value can be inserted into a SQL command by writing
+   Once set, a variable's value can be inserted into a SQL command by writing
:variablename.  When running more than
one client session, each session has its own set of variables.
pgbench supports up to 255 variable uses in one
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 08a5947a9e..09ccf05db5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -119,12 +119,24 @@ typedef int pthread_attr_t;
 
 static int	pthread_create(pthread_t *thread, pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);
 static int	pthread_join(pthread_t th, void **thread_return);
+
+typedef HANDLE pthread_mutex_t;
+static void pthread_mutex_init(pthread_mutex_t *pm, void *unused);
+static void pthread_mutex_lock(pthread_mutex_t *pm);
+static void pthread_mutex_unlock(pthread_mutex_t *pm);
+static void pthread_mutex_destroy(pthread_mutex_t *pm);
+
 #elif defined(ENABLE_THREAD_SAFETY)
 /* Use platform-dependent pthread capability */
 #include 
 #else
 /* No threads implementation, use none (-j 1) */
 #define pthread_t void *
+#define pthread_mutex_t void *
+#define pthread_mutex_init(m, p)
+#define pthread_mutex_lock(m)
+#define pthread_mutex_unlock(m)
+#define pthread_mutex_destroy(m)
 #endif
 
 
@@ -422,7 +434,7 @@ typedef struct
 	instr_time	txn_begin;		/* used