Re: [GENERAL] psql vs perl prepared inserts
Just to add some clarity to Dawid's fine comments:

On 4/14/05, Dawid Kuroczko <[EMAIL PROTECTED]> wrote:
> On 4/13/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
> I believe that COPY handles things like unique constraints and
> referential integrity just fine (of the latter I am not sure) -- well,
> it's hard to imagine PostgreSQL allowing itself to lose integrity
> of the data, don't you agree?

I do, and so does PG. :)  Everything that would normally happen with an
INSERT -- firing TRIGGERs, checking FKEYs, setting DEFAULTs, running
CHECK constraints, everything -- also happens with a COPY import.

The *one and only* exception to the above statement is that RULEs do not
get used on COPY.  The reason for this is that RULEs rewrite queries,
and after the COPY FROM header there isn't a query to rewrite; it's just
a pile of data.  This is in the docs, but it's not easy to find: it's
actually just one line in the "Notes" section of
http://www.postgresql.org/docs/8.0/static/sql-copy.html .  Perhaps we
should add some more verbiage (he says, not volunteering...)?

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] psql vs perl prepared inserts
On 4/13/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
> > The intent of prepared statements is to reduce the overhead of running
> > the parser, rewriter and planner multiple times for a statement that is
> > executed multiple times. For an INSERT query without any sub-selects
> > that is not rewritten by any rules, the cost to parse, rewrite and plan
> > the statement is trivial. So I wouldn't expect prepared statements to be
> > a big win -- you would gain a lot more from batching multiple inserts
> > into a single transaction, and more still from using COPY.
>
> I was thinking something along the same lines, and was considering
> using the COPY statement as my next step, but as someone mentioned
> then I have to load it into a temporary database and then do some more
> internal magic to preserve referential integrity, unique constraints,
> etc.  For that reason I was hoping to keep it in perl, and it's always
> nice to keep everything in a single neat portable package.

I believe that COPY handles things like unique constraints and
referential integrity just fine (of the latter I am not sure) -- well,
it's hard to imagine PostgreSQL allowing itself to lose integrity of the
data, don't you agree?  A simple test:

qnex=# CREATE TABLE a (a int PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a"
CREATE TABLE
qnex=# COPY a FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 1
>> \.
ERROR:  duplicate key violates unique constraint "a_pkey"
CONTEXT:  COPY a, line 4: "1"

...you may have to rethink your design if there are any RULEs or
TRIGGERs associated with said table.  Apart from that, there should be
no problems, beyond the risk that one "bad" row will cancel the whole
COPY command -- so if you trust your data, it will be fine. :)

Regards,
Dawid

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] psql vs perl prepared inserts
On Wed, Apr 13, 2005 at 09:57:09AM -0400, Matt Van Mater wrote:
> Also, I forgot to mention earlier that I tried using transactions to
> speed things up, but since I expect to see certain inserts fail I
> would need to rework my code so the whole transaction doesn't fail if
> one insert goes bad.  This is somewhat contrary to the purpose of
> transactions so I'm not sure how to accomplish this.

Try the pgloader project at pgfoundry.  It tries to insert all rows
using COPY, and the rejected ones go to a file, using something akin to
binary search.  It may be of some use.  I haven't tried it so I'm not
sure of its maturity.  Let us know how it goes if you try it!

--
Alvaro Herrera (<[EMAIL PROTECTED]>)
"And as an added bonus, now my computer goes to the toilet for me,
leaving me free to spend time on more useful activities!  yay slug
codefests!" (C. Parker)

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
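The "something akin to binary search" that Alvaro describes can be sketched in a few lines. This is a hypothetical Python illustration of the idea, not pgloader's actual code: try to load the whole batch in one go, and when it fails, split it in half and recurse, so that good rows still go in via large batches and only the genuinely bad rows end up in the reject file.

```python
# Sketch of binary-search reject handling for bulk loads (hypothetical,
# not pgloader's real code).  copy_batch() stands in for a COPY of many
# rows in one shot; it raises if any row in the batch is bad.

def load_bisect(rows, copy_batch, rejects):
    """Try to load all rows at once; on failure, split and recurse."""
    if not rows:
        return
    try:
        copy_batch(rows)                  # one COPY for the whole slice
    except ValueError:
        if len(rows) == 1:
            rejects.append(rows[0])       # isolated the bad row
        else:
            mid = len(rows) // 2
            load_bisect(rows[:mid], copy_batch, rejects)
            load_bisect(rows[mid:], copy_batch, rejects)

# Demo with a fake COPY that rejects batches containing negative values.
loaded, rejects = [], []

def fake_copy(rows):
    if any(r < 0 for r in rows):
        raise ValueError("bad row in batch")
    loaded.extend(rows)

load_bisect([1, 2, -3, 4, 5, -6, 7], fake_copy, rejects)
print(loaded)   # [1, 2, 4, 5, 7]
print(rejects)  # [-3, -6]
```

With mostly-clean data this loads almost everything in a handful of big batches instead of one INSERT per row, which is why the approach stays fast even with rejects.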
Re: [GENERAL] psql vs perl prepared inserts
On Apr 13, 2005, at 9:57 AM, Matt Van Mater wrote:
> Thanks to all who replied.  Thanks for the tip on that last thread Tom,
> I don't know how I missed it.  I have a hunch that it's not applicable
> to me at this time because I'm running a year and a half old software
> (included in OpenBSD 3.4), but I will have to check which version of
> DBD::Pg was installed.
>
>> The intent of prepared statements is to reduce the overhead of running
>> the parser, rewriter and planner multiple times for a statement that is
>> executed multiple times. For an INSERT query without any sub-selects
>> that is not rewritten by any rules, the cost to parse, rewrite and plan
>> the statement is trivial. So I wouldn't expect prepared statements to be
>> a big win -- you would gain a lot more from batching multiple inserts
>> into a single transaction, and more still from using COPY.
>
> I was thinking something along the same lines, and was considering
> using the COPY statement as my next step, but as someone mentioned
> then I have to load it into a temporary database and then do some more
> internal magic to preserve referential integrity, unique constraints,
> etc.  For that reason I was hoping to keep it in perl, and it's always
> nice to keep everything in a single neat portable package.

You can use pl/perl stored functions to do exactly what you do in perl
on the client side, but it happens much faster since the data is already
on the server.  Then, your neat little portable package IS the database.

Sean

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] psql vs perl prepared inserts
Neil Conway <[EMAIL PROTECTED]> writes:
> The intent of prepared statements is to reduce the overhead of running
> the parser, rewriter and planner multiple times for a statement that is
> executed multiple times. For an INSERT query without any sub-selects
> that is not rewritten by any rules, the cost to parse, rewrite and plan
> the statement is trivial.

You'd be surprised ... I was looking into this just the other day, with
a test case that looks like

create table bench1 (id int NOT NULL, id2 int NOT NULL, id3 int NOT NULL,
  dummy1 char(30));
create unique index bench1_index_ on bench1 (id,id2);
create index bench1_index_1 on bench1 (id3);
begin;
insert into bench1 values (0,0,0,'ABCDEFGHIJ');
... 300,000 inserts ...
insert into bench1 values (167151,167151,167151,'CDEFGHIJKL');
commit;

According to gprof, the above-1% functions are

  %   cumulative    self               self    total
 time    seconds   seconds     calls  s/call  s/call  name
 6.62      53.38     53.38        37    0.00    0.00  yyparse
 5.31      96.19     42.81  11808781    0.00    0.00  hash_search
 4.44     132.00     35.81  61268959    0.00    0.00  AllocSetAlloc
 4.23     166.13     34.13    902304    0.00    0.00  XLogInsert
 3.20     191.99     25.86  13688735    0.00    0.00  _bt_compare
 2.94     215.75     23.76  12158347    0.00    0.00  LWLockAcquire
 2.67     237.32     21.58   4500066    0.00    0.00  base_yylex
 2.56     258.00     20.68   6000510    0.00    0.00  SearchCatCache
 1.99     274.07     16.07  12160856    0.00    0.00  LWLockRelease
 1.88     289.25     15.18  13008925    0.00    0.00  hash_any
 1.49     301.25     12.01   2452386    0.00    0.00  PinBuffer
 1.36     312.25     11.00   1201324    0.00    0.00  fmgr_info_cxt_security
 1.36     323.24     10.98        30    0.00    0.00  planner
 1.19     332.81      9.57  20700142    0.00    0.00  MemoryContextAllocZeroAligned

I don't trust gprof's tree-structured breakdown entirely, but it puts a
pretty significant fraction of the blame on parse/plan activities:

                3.66  767.69      37/37           PostgresMain [4]
[5]    95.6     3.66  767.69      37              exec_simple_query [5]
                6.13  283.12      37/37               PortalRun [6]
                0.48  167.39      37/37               pg_analyze_and_rewrite [9]
                0.47  122.85      37/37               pg_plan_queries [16]
                1.56   93.29      37/37               pg_parse_query [23]
                0.62   34.78      37/37               pg_rewrite_queries [52]
                0.99   17.39      37/37               PortalDrop [79]
                0.56   16.26  600014/600014          finish_xact_command [84]
                1.19    6.89      37/37               CreatePortal [126]
                1.08    2.29  600014/600014          start_xact_command [186]
                1.88    0.36      37/37               PortalStart [218]
                0.76    1.44      37/37               pq_puttextmessage [220]
                1.35    0.00      37/600017          set_ps_display [210]
                1.27    0.00      37/37               CreateCommandTag [271]
                0.89    0.22      37/37               printtup_create_DR [286]
                0.74    0.00      37/37               CreateDestReceiver [328]
                0.61    0.00      37/600015          pgstat_report_activity [277]
                0.10    0.14      37/13864259        pfree [112]
                0.23    0.00      37/37               PortalSetResultFormat [449]
                0.19    0.00      37/37               IsAbortedTransactionBlockState [474]
                0.07    0.00      37/37               printtup_destroy [564]
                0.05    0.00      37/37               PortalDefineQuery [580]
                0.03    0.00      37/37               EndCommand [617]
                0.01    0.00      37/37               BeginCommand [670]

That adds up to over 50% of the runtime spent in parse/rewrite/plan.  I
haven't gotten around to converting the test case into a program that
can use a prepared INSERT command, but it looks plausible to expect a
factor of 2 or so speedup ... of course, using COPY would completely
blow this away, anyway ...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] psql vs perl prepared inserts
Neil Conway wrote:
> For an INSERT query without any sub-selects that is not rewritten by
> any rules, the cost to parse, rewrite and plan the statement is
> trivial. So I wouldn't expect prepared statements to be a big win --
> you would gain a lot more from batching multiple inserts into a single
> transaction, and more still from using COPY.

FWIW, when testing pgstream [1] I typically see a 50% increase in
execution speed when switching to prepared statements in such a
scenario.  I'm attaching a small test program that inserts rows into 5
columns, first without and then with prepared statements, and displays
elapsed time.  Example of results:

elapsed time in loop 0 is 1873 ms (PQexec)
elapsed time in loop 1 is 1136 ms (PQexecPrepared)

That's with unix domain sockets and an 8.0.1 server.

[1] a thin C++ layer on top of libpq (http://manitou-mail.org/pgstream)
that happens to have a unified API for prepared/non-prepared statements.

--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

#include "pgstream.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <iostream>

int main(int argc, char** argv)
{
  const int loops=1;
  pg_cnx cnx;
  char buf[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  try {
    struct timeval tv1, tv2;
    if (argc < 2) {
      std::cerr << "Usage: " << argv[0] << " connection_string\n";
      exit(1);
    }
    cnx.connect(argv[1]);
    for (int j=0; j<2; j++) {
      cnx.set_option("bind_variables", j>0);
      cnx.set_option("prepare_statements", j>0);
      {
        pg_trans trans(cnx);
        pg_stmt("CREATE TABLE txt1(pk1 int,t1 text,t2 text,"
                "t3 text, t4 text)", cnx);
        gettimeofday(&tv1,NULL);
        pg_stream s1("INSERT INTO txt1(pk1,t1,t2,t3,t4) "
                     "VALUES(:i,:p1,:p2,:p3,:p4)", cnx);
        for (int i=0; i

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] psql vs perl prepared inserts
On Wed, Apr 13, 2005 at 09:57:09 -0400, Matt Van Mater <[EMAIL PROTECTED]> wrote:
>
> Also, I forgot to mention earlier that I tried using transactions to
> speed things up, but since I expect to see certain inserts fail I
> would need to rework my code so the whole transaction doesn't fail if
> one insert goes bad.  This is somewhat contrary to the purpose of
> transactions so I'm not sure how to accomplish this.  I saw roughly a
> 20% speed improvement by turning autocommit off and only committing at
> the end of parsing each file.

You might get a small benefit from using savepoints in version 8.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
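For what it's worth, the savepoint pattern Bruno is suggesting looks roughly like this. The sketch below uses Python's stdlib sqlite3 module (which also understands SAVEPOINT) purely so it can run standalone; against a PostgreSQL 8.0 server the same SAVEPOINT / RELEASE / ROLLBACK TO statements apply, just issued through your driver of choice.

```python
# Savepoint-per-insert inside one big transaction: a failed row is
# rolled back to its savepoint instead of aborting the whole batch.
# Illustrated with stdlib sqlite3 so the example is self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None            # manage the transaction ourselves
cur = conn.cursor()
cur.execute("CREATE TABLE results (id INTEGER PRIMARY KEY)")

cur.execute("BEGIN")
for row_id in [1, 2, 3, 1, 4]:         # the duplicate 1 will fail
    cur.execute("SAVEPOINT row_sp")
    try:
        cur.execute("INSERT INTO results VALUES (?)", (row_id,))
        cur.execute("RELEASE SAVEPOINT row_sp")
    except sqlite3.IntegrityError:
        # undo just this row; the rest of the transaction stays alive
        cur.execute("ROLLBACK TO SAVEPOINT row_sp")
cur.execute("COMMIT")

cur.execute("SELECT id FROM results ORDER BY id")
print([r[0] for r in cur.fetchall()])  # [1, 2, 3, 4]
```

You keep most of the ~20% batching win while still tolerating the occasional duplicate-key failure, at the cost of two extra statements per row.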
Re: [GENERAL] psql vs perl prepared inserts
Thanks to all who replied.  Thanks for the tip on that last thread Tom,
I don't know how I missed it.  I have a hunch that it's not applicable
to me at this time because I'm running a year and a half old software
(included in OpenBSD 3.4), but I will have to check which version of
DBD::Pg was installed.

> The intent of prepared statements is to reduce the overhead of running
> the parser, rewriter and planner multiple times for a statement that is
> executed multiple times. For an INSERT query without any sub-selects
> that is not rewritten by any rules, the cost to parse, rewrite and plan
> the statement is trivial. So I wouldn't expect prepared statements to be
> a big win -- you would gain a lot more from batching multiple inserts
> into a single transaction, and more still from using COPY.

I was thinking something along the same lines, and was considering using
the COPY statement as my next step, but as someone mentioned then I have
to load it into a temporary database and then do some more internal
magic to preserve referential integrity, unique constraints, etc.  For
that reason I was hoping to keep it in perl, and it's always nice to
keep everything in a single neat portable package.

Also, I forgot to mention earlier that I tried using transactions to
speed things up, but since I expect to see certain inserts fail I would
need to rework my code so the whole transaction doesn't fail if one
insert goes bad.  This is somewhat contrary to the purpose of
transactions so I'm not sure how to accomplish this.  I saw roughly a
20% speed improvement by turning autocommit off and only committing at
the end of parsing each file.

I think in the end I need to check the version of my Pg driver and
perhaps upgrade to 7.4 or 8.0 in order to take advantage of server-side
prepared statements.  This is only a development box and I'm doing this
mostly as an academic exercise that will someday help me speed up the
production side, so upgrading isn't out of the question.
Matt ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] psql vs perl prepared inserts
Dawid Kuroczko wrote:
> For a test you might want to try also this approach (both from perl
> and from psql):
>
> $dbh->do('PREPARE sth_tim (int,inet,boolean,timestamptz) AS INSERT
>   INTO timestamps VALUES ($1,$2,$3,$4)');
> $sth_tim = $dbh->prepare("EXECUTE sth_tim(?,?,?,?)");
>
> ...and later execute it (and likewise with psql).  If you see a gain
> in speed with perl, it means your DBD::Pg wasn't using server-side
> prepared statements.

The intent of prepared statements is to reduce the overhead of running
the parser, rewriter and planner multiple times for a statement that is
executed multiple times.  For an INSERT query without any sub-selects
that is not rewritten by any rules, the cost to parse, rewrite and plan
the statement is trivial.  So I wouldn't expect prepared statements to
be a big win -- you would gain a lot more from batching multiple inserts
into a single transaction, and more still from using COPY.

-Neil

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] psql vs perl prepared inserts
On Apr 13, 2005, at 4:12 AM, Dawid Kuroczko wrote:
> On 4/12/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
>> I've been experimenting with loading a large amount of data into a
>> fairly simple database using both psql and perl prepared statements.
>> Unfortunately I'm seeing no appreciable differences between the two
>> methods, where I was under the impression that prepared statements
>> should be much faster (in my case, they are slightly slower).
>
> I've been playing with a similar issue and in my case the best
> solution for bulk insert was using perl to format data in a form
> suitable for the COPY command.

I second this approach.  Generally, getting the data into the database
can be done VERY quickly (for the 18k rows you have, it would likely be
instantaneous to copy them).  I often create a separate "loader" schema
into which I load text files.  Then, I can use SQL, triggers, or
functions to "clean up" the data, enforce referential integrity, etc.
within the database.  If you have perl code to do this, you can probably
modify it just slightly to be used in a pl/perl function to do the same
thing as before, but now it is done on the server side and will probably
be significantly faster.

Sean

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] psql vs perl prepared inserts
On 4/12/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
> I've been experimenting with loading a large amount of data into a
> fairly simple database using both psql and perl prepared statements.
> Unfortunately I'm seeing no appreciable differences between the two
> methods, where I was under the impression that prepared statements
> should be much faster (in my case, they are slightly slower).

I've been playing with a similar issue, and in my case the best solution
for bulk insert was using perl to format the data in a form suitable for
the COPY command.  I believe there may be problems if you have RULEs on
the table you want to copy data into (IIRC such RULEs will be ignored).

For a test you might want to try also this approach (both from perl and
from psql):

$dbh->do('PREPARE sth_tim (int,inet,boolean,timestamptz) AS INSERT INTO
  timestamps VALUES ($1,$2,$3,$4)');
$sth_tim = $dbh->prepare("EXECUTE sth_tim(?,?,?,?)");

...and later execute it (and likewise with psql).  If you see a gain in
speed with perl, it means your DBD::Pg wasn't using server-side prepared
statements.  (And there is quite a chance that your psql will outperform
perl using this syntax.)

Regards,
Dawid

PS: I have not tested these statements, yet they should be ok; make sure
the argument list (int,inet,boolean...) is correct for your data.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
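The "format data for COPY" step Dawid mentions mostly comes down to getting COPY's default text-format escaping right: tab-separated columns, \N for NULL, and backslash escapes for tabs, newlines, carriage returns, and backslashes. A rough sketch of that step (in Python here rather than perl, purely for illustration):

```python
# Rough sketch of rendering rows for COPY ... FROM STDIN (text format):
# tab-delimited columns, \N as the NULL marker, and backslash escapes
# for the characters that would otherwise break row/column framing.

def copy_escape(value):
    """Render one column value for COPY's text format."""
    if value is None:
        return r"\N"                      # COPY's NULL marker
    s = str(value)
    return (s.replace("\\", "\\\\")       # escape backslash first!
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))

def copy_line(row):
    """One data line: escaped columns joined by tabs."""
    return "\t".join(copy_escape(v) for v in row) + "\n"

rows = [
    (1, "10.0.0.1", None, "has\ttab and \\ backslash"),
    (2, "10.0.0.2", "t", "plain"),
]
for row in rows:
    print(copy_line(row), end="")
```

Feeding the resulting lines to `COPY tablename FROM STDIN` (or writing them to a file and using `psql -c "\copy ..."`) is then the fast part.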
Re: [GENERAL] psql vs perl prepared inserts
Matt Van Mater <[EMAIL PROTECTED]> writes:
> I've been experimenting with loading a large amount of data into a
> fairly simple database using both psql and perl prepared statements.
> Unfortunately I'm seeing no appreciable differences between the two
> methods, where I was under the impression that prepared statements
> should be much faster (in my case, they are slightly slower).

They should be faster ... if the client library is really using
server-side prepared statements, and not trying to fake it.  Up till 7.4
came out there wasn't any very useful way to use server-side prepared
statements in a driver, and I think only the very latest version of
DBD::Pg knows anything about it.  See this thread for some recent
discussion:
http://archives.postgresql.org/pgsql-interfaces/2005-04/msg00029.php

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
[GENERAL] psql vs perl prepared inserts
I've been experimenting with loading a large amount of data into a
fairly simple database using both psql and perl prepared statements.
Unfortunately I'm seeing no appreciable differences between the two
methods, where I was under the impression that prepared statements
should be much faster (in my case, they are slightly slower).

I have included a pseudocode/subset of the perl code I use below.  You
can see the prepare statement outside the loop and the execute statement
inside the loop.  Alternatively you can see that I write every INSERT
statement to a text file, which I then load by using `psql dbname -f
bulksql_load.sql`.  Normally I only have either the prepare or the
print-to-file in the loop, but I've included both to make the pseudocode
smaller.

Using a simple `time` command from my system it looks like the execution
time for loading all the data in both scenarios is about 50 seconds.
FYI, the infile has 18k lines, and after parsing and the db enforcing
uniqueness there are 15.5k rows in the results table of the db.  This
works out to ~300 inserts per second on pgsql 7.3.2 with fsync turned
off.  I think that is decent performance for this old box; I'm just
confused as to why the prepared statements don't seem to give any speed
boost as advertised.

Could the fact that many of my inserts have 15 columns slow down the
prepared statements to the point where they're no longer useful as a
speed enhancement?  Or perhaps it's because I'm explicitly inserting
each field/value pair, even if many of them are null (which I think is
generally considered a good habit).  Any other suggestions you might
have would be welcome.  Please forgive me if I should have posted this
to pgsql-performance or some perl list, but I think the comparison with
psql makes it relevant.
Hopefully this won't be too messy:

#!/usr/bin/perl

open (IN, "scan.nbe");
open (OUT, ">bulksql_load.sql");

use DBI;
$dbh = DBI->connect("dbi:Pg:dbname=nessus");
$sth_res = $dbh->prepare("INSERT INTO results (scan_id,subnet,host,service,plugin_id,type,description,see_also,solution,risk_factor,cve_id,bid_id,port,proto,service_name) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
$sth_tim = $dbh->prepare("INSERT INTO timestamps (scan_id,hostip_or_nul,start_or_end,scan_time) VALUES (?,?,?,?)");

while (<IN>) {
    if (/^timestamps/) {
        parse_ts();
        $sth_tim->execute($scan_id,$hostip_or_nul,$start_or_end,$scan_time);
        print OUT "INSERT INTO timestamps (scan_id,hostip_or_nul,start_or_end,scan_time) VALUES ($scan_id,$hostip_or_nul,$start_or_end,$scan_time)\n";
    } elsif (/^results/) {
        parse_res();
        $sth_res->execute($scan_id,$subnet,$host,$service,$plugin_id,$type,$description,$see_also,$solution,$risk_factor,$cve_id,$bid_id,$port,$proto,$service_name);
        print OUT "INSERT INTO results (scan_id,subnet,host,service,plugin_id,type,description,see_also,solution,risk_factor,cve_id,bid_id,port,proto,service_name) VALUES ($scan_id,$subnet,$host,$service,$plugin_id,$type,$description,$see_also,$solution,$risk_factor,$cve_id,$bid_id,$port,$proto,$service_name)\n";
    }
}

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly