Re: [HACKERS] pgbench \for or similar loop

Greg Smith Thu, 21 Apr 2011 23:23:44 -0700

Alvaro Herrera wrote:

Why do we have pgbench at all in the first place?  Surely we could
rewrite it in plpgsql with proper stored procedures.


pgbench gives you a driver program with the following useful properties:

1) Multiple processes are spawned and each gets its own connection

2) A time/transaction limit is enforced across all of the connections at once

3) Timing information is written to a client-side log file

4) The work of running the clients can happen on a remote system, so that it's possible to just test the server-side performance

5) The program is similar enough to any other regular client, using the standard libpq interface, that connection-related overhead should be similar to a real workload.

All of those have some challenges before you could duplicate them in a stored procedure context.

My opinion of this feature is similar to the one Aiden already expressed: there are already so many ways to do this sort of thing using shell-oriented approaches (as well as generate_series) that it's hard to get too excited about implementing it directly in pgbench. Part of the reason for adding the \shell and \setshell commands was to make tricky things like this possible without having to touch the pgbench code further. I for example would solve the problem you're facing like this:


1) Write a shell script that generates the file I need

2) Call it from pgbench using \shell, passing the size it needs. Have that write a delimited file with the data required.

3) Import the whole thing with COPY.

And next thing you know you've even got the CREATE/COPY optimization as a possibility to avoid WAL, as well as the ability to avoid creating the data file more than once if the script is smart enough.
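
As a rough sketch of steps 1 and 2, here is what such a generator might look like in Python. Everything in it is an assumption made for illustration: the script name gen_data.py, the two-column tab-delimited layout, and the "skip if the file already has the right row count" check standing in for "smart enough".

```python
#!/usr/bin/env python3
"""Hypothetical gen_data.py: write N rows of tab-delimited sample data."""
import os
import random
import sys


def generate(path, rows, seed=0):
    """Write `rows` lines of "id<TAB>value" to `path`.

    Returns True if the file was (re)written, False if an existing file
    with the requested row count was reused.
    """
    # Avoid creating the data file more than once: reuse it when the
    # row count already matches what was asked for.
    if os.path.exists(path):
        with open(path) as f:
            if sum(1 for _ in f) == rows:
                return False

    rng = random.Random(seed)  # fixed seed so repeated runs give identical data
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for i in range(1, rows + 1):
            f.write(f"{i}\t{rng.randint(0, 999999)}\n")
    os.replace(tmp, path)  # rename into place so a reader never sees a partial file
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: gen_data.py <output-path> <row-count>
    generate(sys.argv[1], int(sys.argv[2]))
```

A pgbench custom script could then run something along the lines of "\shell python3 gen_data.py /tmp/sample.tsv 100000" and follow it with "COPY sample FROM '/tmp/sample.tsv';". The table name and paths are made up, and a server-side COPY like that assumes the file lands somewhere the server itself can read, which only holds if pgbench and the server share a host or filesystem.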

Sample data file generation can be difficult; most of the time I'd rather solve it in a general programming language. It's true that simple generation cases could be done with the mechanism you propose. However, this only really helps cases that are too complicated to express with generate_series, yet not so complicated that you really want a full programming language to generate the data. I don't think there's that much middle ground in that use case.

But if this is what you think makes your life easier, I'm not going to tell you you're wrong. And I don't feel that your desire for this feature means you must tackle a more complicated thing instead--even though what I personally would much prefer is something making this sort of thing easier to do in regression tests, too. That's a harder problem, though, and you're only volunteering to solve an easier one than that.

Stepping aside from debate over usefulness, my main code concern is that each time I look at the pgbench code for yet another tacked on bit, it's getting increasingly creaky and harder to maintain. It's never going to be a good benchmark driver program capable of really complicated tasks. And making it try keeps piling on the risk of breaking it for its intended purpose of doing simple tests. If you can figure out how to keep the code contortions to implement the feature under control, there's some benefit there. I can't think of a unique reason for it; again, lots of ways to solve this already. But I'd probably use it if it were there.


--
Greg Smith   2ndQuadrant US    [email protected]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


