On Sat, Sep 21, 2024 at 8:07 PM Peter J. Holzer <hjp-pg...@hjp.at> wrote:
> On 2024-09-21 16:44:08 +0530, Lok P wrote: > > But wondering why we don't see any difference in performance between > method-2 > > and method-3 above. > > The code runs completely inside the database. So there isn't much > difference between a single statement which inserts 50 rows and 50 > statements which insert 1 row each. The work to be done is (almost) the > same. > > This changes once you consider an application which runs outside of the > database (maybe even on a different host). Such an application has to > wait for the result of each statement before it can send the next one. > Now it makes a difference whether you are waiting 50 times for a > statement which does very little or just once for a statement which does > more work. > > > So does it mean that,I am testing this in a wrong way or > > That depends on what you want to test. If you are interested in the > behaviour of stored procedures, the test is correct. If you want to know > about the performance of a database client (whether its written in Java, > Python, Go or whatever), this is the wrong test. You have to write the > test in your target language and run it on the client system to get > realistic results (for example, the round-trip times will be a lot > shorter if the client and database are on the same computer than when > one is in Europe and the other in America). > > For example, here are the three methods as Python scripts: > > > --------------------------------------------------------------------------------------------------- > #!/usr/bin/python3 > > import time > import psycopg2 > > num_inserts = 10_000 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > > start_time = time.monotonic() > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 1: Individual Inserts with Commit after every Row: > {elapsed_time:.3} seconds") > > # vim: tw=99 > > --------------------------------------------------------------------------------------------------- > #!/usr/bin/python3 > > import time > import psycopg2 > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > for i in range(1, num_inserts+1): > csr.execute("insert into parent_table values(%s, %s)", (i, 'a')) > if i % batch_size == 0: > db.commit() > db.commit() > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 2: Individual Inserts with Commit after {batch_size} Rows: > {elapsed_time:.3} seconds") > > # vim: tw=99 > > --------------------------------------------------------------------------------------------------- > #!/usr/bin/python3 > > import itertools > import time > import psycopg2 > > num_inserts = 10_000 > batch_size = 50 > > db = psycopg2.connect() > csr = db.cursor() > > csr.execute("drop table if exists parent_table") > csr.execute("create table parent_table (id int primary key, t text)") > db.commit() > > start_time = time.monotonic() > batch = [] > for i in range(1, num_inserts+1): > batch.append((i, 'a')) > if i % batch_size == 0: > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > params = list(itertools.chain.from_iterable(batch)) > csr.execute(q, params) > db.commit() > batch = [] > if batch: > q = "insert into parent_table values" + ",".join(["(%s, %s)"] * > len(batch)) > csr.execute(q, list(itertools.chain(batch))) > db.commit() > batch = [] > > end_time = time.monotonic() > elapsed_time = end_time - start_time > print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each > batch: {elapsed_time:.3} seconds") > > # vim: tw=99 > > --------------------------------------------------------------------------------------------------- > > On my laptop, method2 is about twice as fast as method3. But if I > connect to a database on the other side of the city, method2 is now more > than 16 times faster than method3 . Simply because the delay in > communication is now large compared to the time it takes to insert those > rows. > > Thank you so much. I was expecting method-3(batch insert) to be the fastest or atleast as you said perform with similar speed as method-2 (row by row insert with batch commit) if we do it within the procedure inside the database. But because the context switching will be minimal in method-3 as it will prepare the insert and submit to the database in one shot in one DB call, so it should be a bit fast. But from your figures , it appears to be the opposite , i.e.method-2 is faster than method-3. Not able to understand the reason though. So in this case then ,it appears we can follow method-2 as that is cheaper in regards to less code change , i.e. just shifting the commit points without any changes for doing the batch insert. Btw,Do you have any thoughts, why method-2 is faster as compared to method-3 in your test?