Re: [GENERAL] psql vs perl prepared inserts
Just to add some clarity to Dawid's fine comments:

On 4/14/05, Dawid Kuroczko <[EMAIL PROTECTED]> wrote:
> On 4/13/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
> I believe that COPY handles things like unique constraints and
> referential integrity just fine (of the latter I am not sure) -- well,
> it's hard to imagine PostgreSQL allowing itself to lose integrity
> of the data, don't you agree?

I do, and so does PG. :)  Everything that would normally happen with an
INSERT -- firing TRIGGERs, checking FKEYs, setting DEFAULTs, running
CHECK constraints, everything -- also happens with a COPY import.

The *one and only* exception to the above statement is that RULEs do not
get used on COPY.  The reason for this is that RULEs rewrite queries,
and after the COPY FROM header there isn't a query to rewrite; it's just
a pile of data.  This is in the docs, but it's not easy to find: it's
actually just one line in the "Notes" section of
http://www.postgresql.org/docs/8.0/static/sql-copy.html .  Perhaps we
should add some more verbiage (he says, not volunteering...)?

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] psql vs perl prepared inserts
On 4/13/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
> > The intent of prepared statements is to reduce the overhead of running
> > the parser, rewriter and planner multiple times for a statement that is
> > executed multiple times. For an INSERT query without any sub-selects
> > that is not rewritten by any rules, the cost to parse, rewrite and plan
> > the statement is trivial. So I wouldn't expect prepared statements to be
> > a big win -- you would gain a lot more from batching multiple inserts
> > into a single transaction, and more still from using COPY.
>
> I was thinking something along the same lines, and was considering
> using the COPY statement as my next step, but as someone mentioned
> then I have to load it into a temporary database and then do some more
> internal magic to preserve referential integrity, unique constraints,
> etc.  For that reason I was hoping to keep it in perl, and it's always
> nice to keep everything in a single neat portable package.

I believe that COPY handles things like unique constraints and
referential integrity just fine (of the latter I am not sure) -- well,
it's hard to imagine PostgreSQL allowing itself to lose integrity of the
data, don't you agree?  A simple test:

qnex=# CREATE TABLE a (a int PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "a_pkey" for table "a"
CREATE TABLE
qnex=# COPY a FROM STDIN;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1
>> 2
>> 3
>> 1
>> \.
ERROR:  duplicate key violates unique constraint "a_pkey"
CONTEXT:  COPY a, line 4: "1"

...you may have to rethink your design if there are any RULEs or
TRIGGERs associated with said table.  Apart from that, there should be
no problems, beyond the risk that one "bad" row will cancel the whole
COPY command -- so if you trust your data, it will be fine. :)

Regards,
Dawid

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] psql vs perl prepared inserts
On Wed, Apr 13, 2005 at 09:57:09AM -0400, Matt Van Mater wrote:
> Also, I forgot to mention earlier that I tried using transactions to
> speed things up, but since I expect to see certain inserts fail I
> would need to rework my code so the whole transaction doesn't fail if
> one insert goes bad.  This is somewhat contrary to the purpose of
> transactions so I'm not sure how to accomplish this.

Try the pgloader project at pgfoundry.  It tries to insert all rows
using COPY, and the rejected ones go to a file, using something akin to
binary search.  It may be of some use.  I haven't tried it so I'm not
sure of its maturity.  Let us know how it goes if you try it!

--
Alvaro Herrera (<[EMAIL PROTECTED]>)
"And as an added bonus, now my computer goes to the toilet for me,
leaving me free to spend time on more useful activities!  yay slug
codefests!" (C. Parker)

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
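The "something akin to binary search" that Alvaro describes can be sketched in a few lines. This is a hypothetical Python illustration of the idea, not pgloader's actual code: try to load the whole batch in one go, and when it fails, split it in half and recurse, so that good rows still go in via large batches and only the genuinely bad rows end up in the reject file.

```python
# Sketch of binary-search reject handling for bulk loads (hypothetical,
# not pgloader's real code).  copy_batch() stands in for a COPY of many
# rows in one shot; it raises if any row in the batch is bad.

def load_bisect(rows, copy_batch, rejects):
    """Try to load all rows at once; on failure, split and recurse."""
    if not rows:
        return
    try:
        copy_batch(rows)                  # one COPY for the whole slice
    except ValueError:
        if len(rows) == 1:
            rejects.append(rows[0])       # isolated the bad row
        else:
            mid = len(rows) // 2
            load_bisect(rows[:mid], copy_batch, rejects)
            load_bisect(rows[mid:], copy_batch, rejects)

# Demo with a fake COPY that rejects batches containing negative values.
loaded, rejects = [], []

def fake_copy(rows):
    if any(r < 0 for r in rows):
        raise ValueError("bad row in batch")
    loaded.extend(rows)

load_bisect([1, 2, -3, 4, 5, -6, 7], fake_copy, rejects)
print(loaded)   # [1, 2, 4, 5, 7]
print(rejects)  # [-3, -6]
```

With mostly-clean data this loads almost everything in a handful of big batches instead of one INSERT per row, which is why the approach stays fast even with rejects.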
Re: [GENERAL] psql vs perl prepared inserts
On Apr 13, 2005, at 9:57 AM, Matt Van Mater wrote:
> Thanks to all who replied.  Thanks for the tip on that last thread Tom,
> I don't know how I missed it.  I have a hunch that it's not applicable
> to me at this time because I'm running a year and a half old software
> (included in OpenBSD 3.4), but I will have to check which version of
> DBD::Pg was installed.
>
>> The intent of prepared statements is to reduce the overhead of running
>> the parser, rewriter and planner multiple times for a statement that is
>> executed multiple times. For an INSERT query without any sub-selects
>> that is not rewritten by any rules, the cost to parse, rewrite and plan
>> the statement is trivial. So I wouldn't expect prepared statements to be
>> a big win -- you would gain a lot more from batching multiple inserts
>> into a single transaction, and more still from using COPY.
>
> I was thinking something along the same lines, and was considering
> using the COPY statement as my next step, but as someone mentioned
> then I have to load it into a temporary database and then do some more
> internal magic to preserve referential integrity, unique constraints,
> etc.  For that reason I was hoping to keep it in perl, and it's always
> nice to keep everything in a single neat portable package.

You can use pl/perl stored functions to do exactly what you do in perl
on the client side, but it happens much faster since the data is already
on the server.  Then, your neat little portable package IS the database.

Sean

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] psql vs perl prepared inserts
Neil Conway <[EMAIL PROTECTED]> writes:
> The intent of prepared statements is to reduce the overhead of running
> the parser, rewriter and planner multiple times for a statement that is
> executed multiple times. For an INSERT query without any sub-selects
> that is not rewritten by any rules, the cost to parse, rewrite and plan
> the statement is trivial.

You'd be surprised ... I was looking into this just the other day, with
a test case that looks like

create table bench1 (id int NOT NULL, id2 int NOT NULL, id3 int NOT NULL,
  dummy1 char(30));
create unique index bench1_index_ on bench1 (id,id2);
create index bench1_index_1 on bench1 (id3);
begin;
insert into bench1 values (0,0,0,'ABCDEFGHIJ');
... 300,000 inserts ...
insert into bench1 values (167151,167151,167151,'CDEFGHIJKL');
commit;

According to gprof, the above-1% functions are

  %   cumulative    self               self    total
 time    seconds   seconds     calls  s/call  s/call  name
 6.62      53.38     53.38        37    0.00    0.00  yyparse
 5.31      96.19     42.81  11808781    0.00    0.00  hash_search
 4.44     132.00     35.81  61268959    0.00    0.00  AllocSetAlloc
 4.23     166.13     34.13    902304    0.00    0.00  XLogInsert
 3.20     191.99     25.86  13688735    0.00    0.00  _bt_compare
 2.94     215.75     23.76  12158347    0.00    0.00  LWLockAcquire
 2.67     237.32     21.58   4500066    0.00    0.00  base_yylex
 2.56     258.00     20.68   6000510    0.00    0.00  SearchCatCache
 1.99     274.07     16.07  12160856    0.00    0.00  LWLockRelease
 1.88     289.25     15.18  13008925    0.00    0.00  hash_any
 1.49     301.25     12.01   2452386    0.00    0.00  PinBuffer
 1.36     312.25     11.00   1201324    0.00    0.00  fmgr_info_cxt_security
 1.36     323.24     10.98        30    0.00    0.00  planner
 1.19     332.81      9.57  20700142    0.00    0.00  MemoryContextAllocZeroAligned

I don't trust gprof's tree-structured breakdown entirely, but it puts a
pretty significant fraction of the blame on parse/plan activities:

                3.66  767.69      37/37           PostgresMain [4]
[5]    95.6     3.66  767.69      37              exec_simple_query [5]
                6.13  283.12      37/37               PortalRun [6]
                0.48  167.39      37/37               pg_analyze_and_rewrite [9]
                0.47  122.85      37/37               pg_plan_queries [16]
                1.56   93.29      37/37               pg_parse_query [23]
                0.62   34.78      37/37               pg_rewrite_queries [52]
                0.99   17.39      37/37               PortalDrop [79]
                0.56   16.26  600014/600014          finish_xact_command [84]
                1.19    6.89      37/37               CreatePortal [126]
                1.08    2.29  600014/600014          start_xact_command [186]
                1.88    0.36      37/37               PortalStart [218]
                0.76    1.44      37/37               pq_puttextmessage [220]
                1.35    0.00      37/600017          set_ps_display [210]
                1.27    0.00      37/37               CreateCommandTag [271]
                0.89    0.22      37/37               printtup_create_DR [286]
                0.74    0.00      37/37               CreateDestReceiver [328]
                0.61    0.00      37/600015          pgstat_report_activity [277]
                0.10    0.14      37/13864259        pfree [112]
                0.23    0.00      37/37               PortalSetResultFormat [449]
                0.19    0.00      37/37               IsAbortedTransactionBlockState [474]
                0.07    0.00      37/37               printtup_destroy [564]
                0.05    0.00      37/37               PortalDefineQuery [580]
                0.03    0.00      37/37               EndCommand [617]
                0.01    0.00      37/37               BeginCommand [670]

That adds up to over 50% of the runtime spent in parse/rewrite/plan.  I
haven't gotten around to converting the test case into a program that
can use a prepared INSERT command, but it looks plausible to expect a
factor of 2 or so speedup ... of course, using COPY would completely
blow this away, anyway ...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] psql vs perl prepared inserts
Neil Conway wrote:
> For an INSERT query without any sub-selects that is not rewritten by
> any rules, the cost to parse, rewrite and plan the statement is
> trivial. So I wouldn't expect prepared statements to be a big win --
> you would gain a lot more from batching multiple inserts into a single
> transaction, and more still from using COPY.

FWIW, when testing pgstream [1] I typically see a 50% increase in
execution speed when switching to prepared statements in such a
scenario.  I'm attaching a small test program that inserts rows into 5
columns, first without and then with prepared statements, and displays
elapsed time.  Example of results:

elapsed time in loop 0 is 1873 ms (PQexec)
elapsed time in loop 1 is 1136 ms (PQexecPrepared)

That's with unix domain sockets and an 8.0.1 server.

[1] a thin C++ layer on top of libpq (http://manitou-mail.org/pgstream)
that happens to have a unified API for prepared/non-prepared statements.

--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

#include "pgstream.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <iostream>

int main(int argc, char** argv)
{
  const int loops=1;
  pg_cnx cnx;
  char buf[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  try {
    struct timeval tv1, tv2;
    if (argc < 2) {
      std::cerr << "Usage: " << argv[0] << " connection_string\n";
      exit(1);
    }
    cnx.connect(argv[1]);
    for (int j=0; j<2; j++) {
      cnx.set_option("bind_variables", j>0);
      cnx.set_option("prepare_statements", j>0);
      {
        pg_trans trans(cnx);
        pg_stmt("CREATE TABLE txt1(pk1 int,t1 text,t2 text,"
                "t3 text, t4 text)", cnx);
        gettimeofday(&tv1,NULL);
        pg_stream s1("INSERT INTO txt1(pk1,t1,t2,t3,t4) "
                     "VALUES(:i,:p1,:p2,:p3,:p4)", cnx);
        for (int i=0; i

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] psql vs perl prepared inserts
On Wed, Apr 13, 2005 at 09:57:09 -0400, Matt Van Mater <[EMAIL PROTECTED]> wrote:
>
> Also, I forgot to mention earlier that I tried using transactions to
> speed things up, but since I expect to see certain inserts fail I
> would need to rework my code so the whole transaction doesn't fail if
> one insert goes bad.  This is somewhat contrary to the purpose of
> transactions so I'm not sure how to accomplish this.  I saw roughly a
> 20% speed improvement by turning autocommit off and only committing at
> the end of parsing each file.

You might get a small benefit from using savepoints in version 8.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
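For what it's worth, the savepoint pattern Bruno is suggesting looks roughly like this. The sketch below uses Python's stdlib sqlite3 module (which also understands SAVEPOINT) purely so it can run standalone; against a PostgreSQL 8.0 server the same SAVEPOINT / RELEASE / ROLLBACK TO statements apply, just issued through your driver of choice.

```python
# Savepoint-per-insert inside one big transaction: a failed row is
# rolled back to its savepoint instead of aborting the whole batch.
# Illustrated with stdlib sqlite3 so the example is self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None            # manage the transaction ourselves
cur = conn.cursor()
cur.execute("CREATE TABLE results (id INTEGER PRIMARY KEY)")

cur.execute("BEGIN")
for row_id in [1, 2, 3, 1, 4]:         # the duplicate 1 will fail
    cur.execute("SAVEPOINT row_sp")
    try:
        cur.execute("INSERT INTO results VALUES (?)", (row_id,))
        cur.execute("RELEASE SAVEPOINT row_sp")
    except sqlite3.IntegrityError:
        # undo just this row; the rest of the transaction stays alive
        cur.execute("ROLLBACK TO SAVEPOINT row_sp")
cur.execute("COMMIT")

cur.execute("SELECT id FROM results ORDER BY id")
print([r[0] for r in cur.fetchall()])  # [1, 2, 3, 4]
```

You keep most of the ~20% batching win while still tolerating the occasional duplicate-key failure, at the cost of two extra statements per row.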
Re: [GENERAL] psql vs perl prepared inserts
Thanks to all who replied.  Thanks for the tip on that last thread Tom,
I don't know how I missed it.  I have a hunch that it's not applicable
to me at this time because I'm running a year and a half old software
(included in OpenBSD 3.4), but I will have to check which version of
DBD::Pg was installed.

> The intent of prepared statements is to reduce the overhead of running
> the parser, rewriter and planner multiple times for a statement that is
> executed multiple times. For an INSERT query without any sub-selects
> that is not rewritten by any rules, the cost to parse, rewrite and plan
> the statement is trivial. So I wouldn't expect prepared statements to be
> a big win -- you would gain a lot more from batching multiple inserts
> into a single transaction, and more still from using COPY.

I was thinking something along the same lines, and was considering using
the COPY statement as my next step, but as someone mentioned then I have
to load it into a temporary database and then do some more internal
magic to preserve referential integrity, unique constraints, etc.  For
that reason I was hoping to keep it in perl, and it's always nice to
keep everything in a single neat portable package.

Also, I forgot to mention earlier that I tried using transactions to
speed things up, but since I expect to see certain inserts fail I would
need to rework my code so the whole transaction doesn't fail if one
insert goes bad.  This is somewhat contrary to the purpose of
transactions so I'm not sure how to accomplish this.  I saw roughly a
20% speed improvement by turning autocommit off and only committing at
the end of parsing each file.

I think in the end I need to check the version of my Pg driver and
perhaps upgrade to 7.4 or 8.0 in order to take advantage of server-side
prepared statements.  This is only a development box and I'm doing this
mostly as an academic exercise that will someday help me speed up the
production side, so upgrading isn't out of the question.
Matt ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] psql vs perl prepared inserts
Dawid Kuroczko wrote:
> For a test you might want to try also this approach (both from perl
> and from psql):
>
> $dbh->do('PREPARE sth_tim (int,inet,boolean,timestamptz) AS INSERT
>   INTO timestamps VALUES ($1,$2,$3,$4)');
> $sth_tim = $dbh->prepare("EXECUTE sth_tim(?,?,?,?)");
>
> ...and later execute it (and likewise with psql).  If you see a gain
> in speed with perl, it means your DBD::Pg wasn't using server-side
> prepared statements.

The intent of prepared statements is to reduce the overhead of running
the parser, rewriter and planner multiple times for a statement that is
executed multiple times.  For an INSERT query without any sub-selects
that is not rewritten by any rules, the cost to parse, rewrite and plan
the statement is trivial.  So I wouldn't expect prepared statements to
be a big win -- you would gain a lot more from batching multiple inserts
into a single transaction, and more still from using COPY.

-Neil

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] psql vs perl prepared inserts
On Apr 13, 2005, at 4:12 AM, Dawid Kuroczko wrote:
> On 4/12/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
>> I've been experimenting with loading a large amount of data into a
>> fairly simple database using both psql and perl prepared statements.
>> Unfortunately I'm seeing no appreciable differences between the two
>> methods, where I was under the impression that prepared statements
>> should be much faster (in my case, they are slightly slower).
>
> I've been playing with a similar issue and in my case the best
> solution for bulk insert was using perl to format data in a form
> suitable for the COPY command.

I second this approach.  Generally, getting the data into the database
can be done VERY quickly (for the 18k rows you have, it would likely be
instantaneous to copy them).  I often create a separate "loader" schema
into which I load text files.  Then, I can use SQL, triggers, or
functions to "clean up" the data, enforce referential integrity, etc.
within the database.  If you have perl code to do this, you can probably
modify it just slightly to be used in a pl/perl function to do the same
thing as before, but now it is done on the server side and will probably
be significantly faster.

Sean

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] psql vs perl prepared inserts
On 4/12/05, Matt Van Mater <[EMAIL PROTECTED]> wrote:
> I've been experimenting with loading a large amount of data into a
> fairly simple database using both psql and perl prepared statements.
> Unfortunately I'm seeing no appreciable differences between the two
> methods, where I was under the impression that prepared statements
> should be much faster (in my case, they are slightly slower).

I've been playing with a similar issue, and in my case the best solution
for bulk insert was using perl to format the data in a form suitable for
the COPY command.  I believe there may be problems if you have RULEs on
the table you want to copy data into (IIRC such RULEs will be ignored).

For a test you might want to try also this approach (both from perl and
from psql):

$dbh->do('PREPARE sth_tim (int,inet,boolean,timestamptz) AS INSERT INTO
  timestamps VALUES ($1,$2,$3,$4)');
$sth_tim = $dbh->prepare("EXECUTE sth_tim(?,?,?,?)");

...and later execute it (and likewise with psql).  If you see a gain in
speed with perl, it means your DBD::Pg wasn't using server-side prepared
statements.  (And there is quite a chance that your psql will outperform
perl using this syntax.)

Regards,
Dawid

PS: I have not tested these statements, yet they should be ok; make sure
the argument list (int,inet,boolean...) is correct for your data.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
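The "format data for COPY" step Dawid mentions mostly comes down to getting COPY's default text-format escaping right: tab-separated columns, \N for NULL, and backslash escapes for tabs, newlines, carriage returns, and backslashes. A rough sketch of that step (in Python here rather than perl, purely for illustration):

```python
# Rough sketch of rendering rows for COPY ... FROM STDIN (text format):
# tab-delimited columns, \N as the NULL marker, and backslash escapes
# for the characters that would otherwise break row/column framing.

def copy_escape(value):
    """Render one column value for COPY's text format."""
    if value is None:
        return r"\N"                      # COPY's NULL marker
    s = str(value)
    return (s.replace("\\", "\\\\")       # escape backslash first!
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))

def copy_line(row):
    """One data line: escaped columns joined by tabs."""
    return "\t".join(copy_escape(v) for v in row) + "\n"

rows = [
    (1, "10.0.0.1", None, "has\ttab and \\ backslash"),
    (2, "10.0.0.2", "t", "plain"),
]
for row in rows:
    print(copy_line(row), end="")
```

Feeding the resulting lines to `COPY tablename FROM STDIN` (or writing them to a file and using `psql -c "\copy ..."`) is then the fast part.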
Re: [GENERAL] psql vs perl prepared inserts
Matt Van Mater <[EMAIL PROTECTED]> writes:
> I've been experimenting with loading a large amount of data into a
> fairly simple database using both psql and perl prepared statements.
> Unfortunately I'm seeing no appreciable differences between the two
> methods, where I was under the impression that prepared statements
> should be much faster (in my case, they are slightly slower).

They should be faster ... if the client library is really using
server-side prepared statements, and not trying to fake it.  Up till 7.4
came out there wasn't any very useful way to use server-side prepared
statements in a driver, and I think only the very latest version of
DBD::Pg knows anything about it.  See this thread for some recent
discussion:
http://archives.postgresql.org/pgsql-interfaces/2005-04/msg00029.php

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
[GENERAL] psql vs perl prepared inserts
I've been experimenting with loading a large amount of data into a
fairly simple database using both psql and perl prepared statements.
Unfortunately I'm seeing no appreciable differences between the two
methods, where I was under the impression that prepared statements
should be much faster (in my case, they are slightly slower).

I have included a pseudocode/subset of the perl code I use below.  You
can see the prepare statement outside the loop and the execute statement
inside the loop.  Alternatively you can see that I write every INSERT
statement to a text file, which I then load by using `psql dbname -f
bulksql_load.sql`.  Normally I only have either the prepare or the
print-to-file in the loop, but I've included both to make the pseudocode
smaller.

Using a simple `time` command from my system it looks like the execution
time for loading all the data in both scenarios is about 50 seconds.
FYI, the infile has 18k lines, and after parsing and the db enforcing
uniqueness there are 15.5k rows in the results table of the db.  This
works out to ~300 inserts per second on pgsql 7.3.2 with fsync turned
off.  I think that is decent performance for this old box; I'm just
confused as to why the prepared statements don't seem to give any speed
boost as advertised.

Could the fact that many of my inserts have 15 columns slow down the
prepared statements to the point where they're no longer useful as a
speed enhancement?  Or perhaps it's because I'm explicitly inserting
each field/value pair, even if many of them are null (which I think is
generally considered a good habit).  Any other suggestions you might
have would be welcome.  Please forgive me if I should have posted this
to pgsql-performance or some perl list, but I think the comparison with
psql makes it relevant.
Hopefully this won't be too messy:

#!/usr/bin/perl

open (IN, "scan.nbe");
open (OUT, ">bulksql_load.sql");

use DBI;
$dbh = DBI->connect("dbi:Pg:dbname=nessus");
$sth_res = $dbh->prepare("INSERT INTO results (scan_id,subnet,host,service,plugin_id,type,description,see_also,solution,risk_factor,cve_id,bid_id,port,proto,service_name) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
$sth_tim = $dbh->prepare("INSERT INTO timestamps (scan_id,hostip_or_nul,start_or_end,scan_time) VALUES (?,?,?,?)");

while (<IN>) {
    if (/^timestamps/) {
        parse_ts();
        $sth_tim->execute($scan_id,$hostip_or_nul,$start_or_end,$scan_time);
        print OUT "INSERT INTO timestamps (scan_id,hostip_or_nul,start_or_end,scan_time) VALUES ($scan_id,$hostip_or_nul,$start_or_end,$scan_time)\n";
    } elsif (/^results/) {
        parse_res();
        $sth_res->execute($scan_id,$subnet,$host,$service,$plugin_id,$type,$description,$see_also,$solution,$risk_factor,$cve_id,$bid_id,$port,$proto,$service_name);
        print OUT "INSERT INTO results (scan_id,subnet,host,service,plugin_id,type,description,see_also,solution,risk_factor,cve_id,bid_id,port,proto,service_name) VALUES ($scan_id,$subnet,$host,$service,$plugin_id,$type,$description,$see_also,$solution,$risk_factor,$cve_id,$bid_id,$port,$proto,$service_name)\n";
    }
}

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can
get through to the mailing list cleanly