Re: Why are commits consuming most of the database time?

2023-04-04 Thread Tim Slechta
Tom,

Thank you for your comments, they are very much appreciated.

You are correct that the transactions are typically short, likely with
dozens of rows.

Do you know of any problems or defects in this area?

Would there be any usefulness to generating Postgres log files?
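
(For reference, a minimal sketch of server settings that make commit and WAL
timing visible in the logs and statistics views; this assumes superuser
access, and track_wal_io_timing exists only in PostgreSQL 14 and later.)

ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every statement with its duration
ALTER SYSTEM SET log_checkpoints = on;            -- log checkpoint timing and WAL volume
ALTER SYSTEM SET track_wal_io_timing = on;        -- fill wal_write_time/wal_sync_time in pg_stat_wal
SELECT pg_reload_conf();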

Once again, thanks for your help.

-Tim

On Tue, Apr 4, 2023 at 10:55 AM Tim Slechta  wrote:

>
>
> On Tue, Apr 4, 2023 at 9:57 AM Tom Lane  wrote:
>
>> Tim Slechta  writes:
>> > The test data below is from a non-virtualized (client system and
>> database
>> > server) Postgres 14 environment, with no replication, no high
>> availability,
>> > and with no load balancing. This environment has older and slower disk
>> > drives, and the test is driven by a single client process.
>>
>> > In this case 24% of the round trips (client to database and back) are
>> for
>> > commit processing.  However, commit processing is consuming 89% of the
>> > total database time. (All times are measured from within the client.)
>>
>> You didn't say how big the transactions are, but if they're not writing
>> a lot of data apiece, this result seems totally non-surprising.  The
>> commits have to push WAL log data down to disk before they can promise
>> that the transaction's results are durable, while the statements within
>> the transactions probably are not waiting for any disk writes at all.
>>
>> If you don't need strict ACID compliance, you could turn off
>> synchronous_commit so that commits don't wait for WAL flush.
>> (This doesn't risk the consistency of your database, but it does
>> mean that a crash might lose the last few transactions that clients
>> were told got committed.)
>>
>> If you do need strict ACID compliance, get a better disk subsystem.
>> Or, perhaps, just a better OS ... Windows is generally not thought of
>> as the best-performing platform for Postgres.
>>
>> regards, tom lane
>>
>
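
A minimal sketch of the synchronous_commit experiment described above (the
session-level form affects only that client's commits; the cluster-wide form
is shown only as an illustration of how it could be rolled out):

SET synchronous_commit = off;               -- current session only

ALTER SYSTEM SET synchronous_commit = off;  -- cluster-wide, takes effect on reload
SELECT pg_reload_conf();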


Why are commits consuming most of the database time?

2023-04-04 Thread Tim Slechta
The test data below is from a non-virtualized (client system and database
server) Postgres 14 environment, with no replication, no high availability,
and with no load balancing. This environment has older and slower disk
drives, and the test is driven by a single client process.


In this case 24% of the round trips (client to database and back) are for
commit processing.  However, commit processing is consuming 89% of the
total database time. (All times are measured from within the client.)


In this non-virtualized environment, on the exact same hardware, other
RDBMSs have a much lower commit-time to total-database-time ratio.

In a virtualized environment, where both the client system and the database
server run in separate VMs with faster disks (and possibly many other active
VMs), this number drops to about 70% for Postgres.


We see similar results in Linux environments as well.


*What is a good approach to identifying what is happening within the commit
processing?*


*Are there any known bugs in this area?*


Any other thoughts would be greatly appreciated.

Thank you.


-Tim


Line  Freq   Cum.t      Max.t   Avg.t   Rows   Err.  Statement

1     2268   *301.908*  0.243   0.133   2235   0     COMMIT

2     755    9.665      0.102   0.013   2326   0     INSERT INTO POMQUERY_U ( col0 ) VALUES (:1)

3     266    0.195      0.103   0.001   263    0     SELECT t_01.puid FROM PITEM t_01 WHERE ( UPPER ( t_01.pitem_id ) = UPPER( :1 ) )

4     244    0.186      0.002   0.001   260    0     INSERT INTO POM_TIMESTAMP (puid, ptimestamp, pdbtimestamp, pdeleted) (SELECT :1, :2, now() ...

[...snip...]

Sum:  9264   *338.200*  -       -       12050  -

Percent Commit (line 1 / Sum):  24% of Freq   *89%* of Cum.t


My latest run was similar, in that its total database time was 14876.691
seconds with a total commit time of 13032.575 seconds, or 88% commit time.
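
For a server-side view of the same ratio, a sketch against the statistics
views (pg_stat_wal is new in PostgreSQL 14; its wal_write_time and
wal_sync_time columns stay at zero unless track_wal_io_timing = on, and
pg_stat_statements is an optional extension that may not be installed):

SELECT wal_records, wal_bytes, wal_write, wal_sync,
       wal_write_time, wal_sync_time
  FROM pg_stat_wal;

SELECT query, calls, total_exec_time, rows
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 5;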


Postgres Version: PostgreSQL 14.5, compiled by Visual C++ build 1914, 64-bit
OS Name:  Microsoft Windows Server 2019 Standard
OS Version: 10.0.17763 N/A Build 17763


time sorted UUIDs

2022-12-15 Thread Tim Jones
Hi, 

Could someone please comment on this article
https://vladmihalcea.com/uuid-database-primary-key/, specifically the
comments (copied below) as they relate to a Postgres database.

... 


But, using a random UUID as a database table Primary Key is a bad idea for 
multiple reasons. 

First, the UUID is huge. Every single record will need 16 bytes for the 
database identifier, and this impacts all associated Foreign Key columns as 
well. 

Second, the Primary Key column usually has an associated B+Tree index to speed 
up lookups or joins, and B+Tree indexes store data in sorted order. 

However, indexing random values using B+Tree causes a lot of problems: 

* Index pages will have a very low fill factor because the values come 
randomly. So, a page of 8kB will end up storing just a few elements, therefore 
wasting a lot of space, both on the disk and in the database memory, as index 
pages could be cached in the Buffer Pool. 
* Because the B+Tree index needs to rebalance itself in order to maintain 
its equidistant tree structure, the random key values will cause more index 
page splits and merges as there is no pre-determined order of filling the tree 
structure. 
... 
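
For what it is worth, a rough way to measure the size effect described above
on Postgres itself (a sketch only; the table names are made up, and
gen_random_uuid() is built in since PostgreSQL 13):

CREATE TABLE t_random (id uuid   PRIMARY KEY DEFAULT gen_random_uuid(), payload text);
CREATE TABLE t_seq    (id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, payload text);

INSERT INTO t_random (payload) SELECT 'x' FROM generate_series(1, 1000000);
INSERT INTO t_seq    (payload) SELECT 'x' FROM generate_series(1, 1000000);

-- Compare the primary key index sizes; the random-UUID index usually ends up
-- noticeably larger because page splits leave pages partially filled.
SELECT relname, pg_size_pretty(pg_relation_size(oid))
  FROM pg_class
 WHERE relname IN ('t_random_pkey', 't_seq_pkey');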


Any other general comments about time sorted UUIDs would be welcome. 



Thanks, 

Tim Jones 




Re: Views don't seem to use indexes?

2021-10-28 Thread Tim Slechta
Tom, David,

Thank you for the time and information.

I lost my system this morning, so I need to re-establish a system and do
some additional homework.

Thanks again.

-Tim

BTW:  here is the definition of the pworkspaceobject table.

tc=# \d+ pworkspaceobject

 Table "public.pworkspaceobject"
   Column|Type | Collation |
Nullable | Default | Storage  | Stats target | Description
-+-+---+--+-+--+--+-
 puid| character varying(15)   |   |
not null |   | extended |  |
 pobject_name| character varying(128)  |   |
not null |   | extended |  |
 pobject_desc| character varying(240)  |   |
   |   | extended |  |
 pobject_type| character varying(32)   |   |
not null |   | extended |  |
 pobject_application | character varying(32)   |   |
not null |   | extended |  |
 vla_764_7   | integer |   |
not null | 0 | plain|  |
 pip_classification  | character varying(128)  |   |
   |   | extended |  |
 vla_764_10  | integer |   |
not null | 0 | plain|  |
 pgov_classification | character varying(128)  |   |
   |   | extended |  |
 vla_764_12  | integer |   |
not null | 0 | plain|  |
 pfnd0revisionid | character varying(32)   |   |
   |   | extended |  |
 vla_764_18  | integer |   |
not null | 0 | plain|  |
 vla_764_20  | integer |   |
not null | 0 | plain|  |
 rwso_threadu| character varying(15)   |   |
   |   | extended |  |
 rwso_threadc| integer |   |
   |   | plain|  |
 prevision_limit | integer |   |
not null |   | plain|  |
 prevision_number| integer |   |
not null |   | plain|  |
 rowning_organizationu   | character varying(15)   |   |
   |   | extended |  |
 rowning_organizationc   | integer |   |
   |   | plain|  |
 pactive_seq | integer |   |
   |   | plain|  |
 rowning_projectu| character varying(15)   |   |
   |   | extended |  |
 rowning_projectc| integer |   |
   |   | plain|  |
 pfnd0maturity   | integer |   |
   |   | plain|  |
 pdate_released  | timestamp without time zone |   |
   |   | plain|  |
 pfnd0isrevisiondiscontinued | smallint|   |
   |   | plain|  |
 pfnd0inprocess  | smallint|   |
   |   | plain|  |
 aoid| character varying(15)   |   |
not null | NULL::character varying | extended |  |
 arev_category   | integer |   |
not null | 48 | plain|  |
 aspace_uid  | character varying(15)   |   |
   | NULL::character varying | extended |  |
 avalid_from | timestamp without time zone |   |
not null | to_timestamp('1900/01/02 00:00:00'::text, '/MM/DD
HH24:MI:SS'::text)::timestamp without time zone | plain|  |
 avalid_to   | timestamp without time zone |   |
   |   | plain|  |
 vla_764_26  | integer |   |
not null | 0 | plain|  |
 pawp0issuspect  | smallint|   |
   |   | plain|  |
 vla_764_24  | integer |   |
not null | 0 | plain|  |
 vla_764_23  | integer |   |
not null | 0 | plain|  |
Indexes:
"pipworkspaceobject" PRIMARY KEY, btree (puid)
"pipworkspaceobject_0" btree (aoid)
"pipworkspaceobject_1" btree (upper(pobject_type::text))
"pipworkspaceobject_2" btree (upper(pobject_name::text))
"pipworkspaceob

Views don't seem to use indexes?

2021-10-27 Thread Tim Slechta
Why does the planner not use an index when a view is involved?

1) A description of what you are trying to achieve and what results you
expect.
Why don't plans use indexes when views are involved?  A similar query on
the underlying table leverages the appropriate index.

== Point 1. The following query leverages the pipl10n_object_name_1 index.
tc=# EXPLAIN ANALYZE select substr(pval_0, 49, 128) from pl10n_object_name
where substr(pval_0, 49, 128) = '';
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on pl10n_object_name  (cost=4.48..32.15 rows=7 width=32) (actual time=0.040..0.040 rows=0 loops=1)
   Recheck Cond: (substr((pval_0)::text, 49, 128) = ''::text)
   ->  *Bitmap Index Scan on pipl10n_object_name_1*  (cost=0.00..4.48 rows=7 width=0) (actual time=0.039..*0.039* rows=0 loops=1)
         Index Cond: (substr((pval_0)::text, 49, 128) = ''::text)
 Planning Time: 0.153 ms
 Execution Time: 0.056 ms
(6 rows)

== Point 2. The equivalent query on the VL10N_OBJECT_NAME view executes a
Seq Scan on the underlying pl10n_object_name. Why?
tc=# EXPLAIN ANALYZE select pval_0 from VL10N_OBJECT_NAME where pval_0 =
'';
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on vl10n_object_name  (cost=0.00..323818.92 rows=5228 width=32) (actual time=2851.799..2851.801 rows=0 loops=1)
   Filter: (vl10n_object_name.pval_0 = ''::text)
   Rows Removed by Filter: 1043308
   ->  Append  (cost=0.00..310749.58 rows=1045547 width=208) (actual time=0.046..2777.167 rows=1043308 loops=1)
         ->  *Seq Scan on pl10n_object_name*  (cost=0.00..252460.06 rows=870536 width=175) (actual time=0.046..*2389.282* rows=870645 loops=1)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..44356.42 rows=175011 width=175) (actual time=0.019..313.357 rows=172663 loops=1)
               ->  Seq Scan on pworkspaceobject  (cost=0.00..42168.79 rows=175011 width=134) (actual time=0.016..291.661 rows=172663 loops=1)
                     Filter: ((pobject_name IS NOT NULL) AND (vla_764_24 = 0))
                     Rows Removed by Filter: 870629
 Planning Time: 0.204 ms
 Execution Time: 2851.830 ms
(11 rows)

== Additional Information ==
== View definition:
tc=# \d+ VL10N_OBJECT_NAME
View "public.vl10n_object_name"
   Column| Type  | Collation | Nullable | Default |
Storage  | Description
-+---+---+--+-+--+-
 puid| character varying(15) |   |  | |
extended |
 locale  | text  |   |  | |
extended |
 preference  | text  |   |  | |
extended |
 status  | text  |   |  | |
extended |
 sequence_no | numeric   |   |  | |
main |
 pval_0  | text  |   |  | |
extended |
View definition:
 SELECT pl10n_object_name.puid,
substr(pl10n_object_name.pval_0::text, 1, 5) AS locale,
substr(pl10n_object_name.pval_0::text, 7, 1) AS preference,
substr(pl10n_object_name.pval_0::text, 9, 1) AS status,
tc_to_number(substr(pl10n_object_name.pval_0::text, 11, 4)::character
varying) AS sequence_no,
substr(pl10n_object_name.pval_0::text, 49, 128) AS pval_0
   FROM pl10n_object_name
UNION ALL
 SELECT pworkspaceobject.puid,
'NONE'::text AS locale,
'M'::text AS preference,
'M'::text AS status,
0 AS sequence_no,
pworkspaceobject.pobject_name AS pval_0
   FROM pworkspaceobject
  WHERE pworkspaceobject.pobject_name IS NOT NULL AND
pworkspaceobject.vla_764_24 = 0;
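
One way to narrow this down (a sketch, not something from the thread) is to
apply the same filter to each branch of the UNION ALL directly and compare
those plans with the view plan above. Note from the index lists shown in this
thread that pl10n_object_name has pipl10n_object_name_1 on
substr(pval_0, 49, 128), while pworkspaceobject (at least among the indexes
shown) only indexes upper(pobject_name), so the second branch has no usable
index for a plain equality on pobject_name in any case:

EXPLAIN ANALYZE
SELECT substr(pval_0::text, 49, 128)
  FROM pl10n_object_name
 WHERE substr(pval_0::text, 49, 128) = '';

EXPLAIN ANALYZE
SELECT pobject_name
  FROM pworkspaceobject
 WHERE pobject_name IS NOT NULL
   AND vla_764_24 = 0
   AND pobject_name = '';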

== Table definition:
tc=# \d+ pl10n_object_name
 Table "public.pl10n_object_name"
 Column |  Type  | Collation | Nullable | Default | Storage
 | Stats target | Description
++---+--+-+--+--+-
 puid   | character varying(15)  |   | not null | |
extended |  |
 pseq   | integer|   | not null | | plain
 |  |
 pval_0 | character varying(176) |   |  | |
extended |  |
Indexes:
"pipl10n_object_name" PRIMARY KEY, btree (puid, pseq) DEFERRABLE
INITIALLY DEFERRED
"pipl10n_object_name_0" btree (pval_0)
"pipl10n_object_name_1" btree (substr(pval_0::text, 49, 128))
"pipl10n_object_name_2" btree (upper(substr(pval_0::text, 49, 128)))
"pipl10n_object_name_3" btree (substr(pval_0::text, 1, 5))
"pipl10n_object_na

Re: Better, consistent instrumentation for postgreSQL using a similar API as Oracle

2021-10-05 Thread Tim
Jeff Holt is probably pretty embarrassed there's some blowhard making a
scene using his name in a casual mailing list thread.

On Tue, Oct 5, 2021 at 5:28 PM Mladen Gogala 
wrote:

> Comments in-line
>
> On 10/5/21 16:24, Peter Geoghegan wrote:
> > On Fri, Oct 1, 2021 at 1:06 PM Jeff Holt  wrote:
> >> Now looking closely at postgreSQL, I see an opportunity to more quickly
> implement Oracle's current feature list.
> >>
> >> I've come to this point because I see many roadblocks for users who
> want to see a detailed "receipt" for their response time.
> > I have heard of method R. Offhand it seems roughly comparable to
> > something like the Top-down Microarchitecture Analysis Method that low
> > level systems programmers sometimes use, along with Intel's pmu-tools
> > -- at least at a very high level. The point seems to be to provide a
> > workflow that can plausibly zero in on low-level bottlenecks, by
> > providing high level context. Many tricky real world problems are in
> > some sense a high level problem that is disguised as a low level
> > problem. And so all of the pieces need to be present on the board, so
> > to speak.
> >
> > Does that sound accurate?
> Yes, that is pretty accurate. It is essentially the same method
> described in the "High Performance Computing" books. The trick is to
> figure out what the process is waiting for and then reduce the wait times.
> All computers wait at the same speed.
> > One obvious issue with much of the Postgres instrumentation is that it
> > makes it hard to see how things change over time. I think that that is
> > often *way* more informative than static snapshots.
> >
> > I can see why you'd emphasize the need for PostgreSQL to more or less
> > own the end to end experience for something like this. It doesn't
> > necessarily follow that the underlying implementation cannot make use
> > of infrastructure like eBPF, though. Fast user space probes provably
> > have no overhead, and can be compiled-in by distros that can support
> > it. There hasn't been a consistent effort to make that stuff
> > available, but I doubt that that tells us much about what is possible.
> > The probes that we have today are somewhat of a grab-bag, that aren't
> > particularly useful -- so it's a chicken-and-egg thing.
>
> Not exactly. There already is a very good extension for Postgres called
> pg_wait_sampling:
>
> https://github.com/postgrespro/pg_wait_sampling
>
> What is missing here is mostly the documentation. This extension should
> become a part of Postgres proper and the events should be documented as
> they are (mostly) documented for Oracle. Oracle uses trace files
> instead. However, given the equivalence of files and tables in Postgres,
> this is not a big difference.
>
>
> >
> > It would probably be helpful if you could describe what you feel is
> > missing in more general terms -- while perhaps giving specific
> > practical examples of specific scenarios that give us some sense of
> > what the strengths of the model are. ISTM that it's not so much a lack
> > of automation in PostgreSQL. It's more like a lack of a generalized
> > model, which includes automation, but also some high level top-down
> > theory.
>
> I am not Jeff and my opinion is not as valuable and doesn't carry the
> same weight, by far. However, I do believe that we may not see Jeff Holt
> again on this group so I am providing my opinion instead. At least I
> would, in Jeff's place, be reluctant to return to this group.
>
>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
>
>
>
>
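
A minimal sketch of what the pg_wait_sampling extension mentioned above
exposes (assumes the extension is installed and loaded via
shared_preload_libraries; the profile view name is the one the project
documents):

CREATE EXTENSION IF NOT EXISTS pg_wait_sampling;

SELECT event_type, event, count
  FROM pg_wait_sampling_profile
 ORDER BY count DESC
 LIMIT 20;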


Re: Performance for initial copy when using pg_logical to upgrade Postgres

2021-09-24 Thread Tim
I'm currently doing this with a 2.2 TB database.

Best way IMO is to (UPDATE pg_index SET indisready = false ... ) for non-PK
indexes on the largest tables. Then just set it back to indisready = true
after it's done and run a REINDEX CONCURRENTLY on the indexes that were
disabled.

Got a transfer speed of about 100 GB per 50 minutes with this method, with
consistent results.
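
A rough sketch of that catalog trick (this is unsupported catalog surgery, so
test it on a throwaway copy first; 'big_table' and the index name below are
placeholders):

-- Mark the non-PK indexes of one large table as not ready, so the initial
-- copy does not maintain them:
UPDATE pg_index i
   SET indisready = false
  FROM pg_class c
 WHERE c.oid = i.indrelid
   AND c.relname = 'big_table'
   AND NOT i.indisprimary;

-- ... run the initial copy ...

UPDATE pg_index i
   SET indisready = true
  FROM pg_class c
 WHERE c.oid = i.indrelid
   AND c.relname = 'big_table'
   AND NOT i.indisprimary;

-- Rebuild each index that was skipped while it was marked not ready:
REINDEX INDEX CONCURRENTLY big_table_some_idx;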

On Fri, Sep 24, 2021 at 11:49 AM Westwood, Giles 
wrote:

>
>
>
>
> On Fri, Sep 24, 2021 at 3:39 PM Justin Pryzby 
> wrote:
>
>> On Fri, Sep 24, 2021 at 03:28:50PM +0100, Westwood, Giles wrote:
>>
>> Did you see this thread and its suggestions to 1) set bulk load
>> parameters; and 2) drop indexes and FKs?
>>
>>
>> https://www.postgresql.org/message-id/flat/4a8efc4e-a264-457d-a8e7-ae324ed9a...@thebuild.com
>>
>>
> I did actually, but I wanted to avoid getting my hands dirty with anything
> schema-wise. I've found another person in a similar situation:
>
> https://github.com/2ndQuadrant/pglogical/issues/325
>
>


Re: Legal disclaimers on emails to this group

2019-12-06 Thread Tim Cross


Craig James  writes:

> (I've changed the original subject, "autovacuum locking question", of the
> sender's email so as not to hijack that thread.)
>
> On Thu, Dec 5, 2019 at 2:26 PM Mike Schanne  wrote:
>
>> Hi,
>>
>> I am investigating a performance problem...
>> ... This email is non-binding, is subject to contract, and neither Kulicke
>> and Soffa Industries, Inc. nor its subsidiaries (each and collectively
>> “K&S”) shall have any obligation to you to consummate the transactions
>> herein or to enter into any agreement, other than in accordance with the
>> terms and conditions of a definitive agreement if and when negotiated,
>> finalized and executed between the parties. This email and all its contents
>> are protected by International and United States copyright laws. Any
>> reproduction or use of all or any part of this email without the express
>> written consent of K&S is prohibited.
>>
>
> Sorry to be off topic, but this bugs me. Language is important. This isn't
> directed at you specifically, but I see these disclaimers all the time. How
> can you post to a public newsgroup that automatically reproduces your email
> to thousands of subscribers, and additionally publishes it on
> publicly accessible archives, in direct conflict with your company's policy
> appended to your email? And why on Earth do your company's lawyers think
> this sort of disclaimer is helpful and even legally useful? Not to mention,
> do they realize it's vaguely offensive to every customer and colleague who
> receives it?
>
> Craig

Oh how I hear you!

This is what I was using as my email signature (but not for groups). I
feel for the OP, who probably has little choice (other than to work for a
different employer, which is a very valid choice given the 'organisational
culture' exhibited by policies requiring such nonsense).

Notice to all senders:

If you send me a message, on receipt of that message I consider that message to
be my property and I will copy, share and disseminate it as I see fit. I will
provide attribution when appropriate and I will endeavour to comply with all
reasonable requests. However, I reject all threats or implied threats of legal
action arising from an error or mistake on your part. It is your responsibility
to manage your communications appropriately, not mine.

-- 
Tim Cross




Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-15 Thread Tim Cross


padusuma  writes:

> Hello Tim,
>
>>How are you gathering metrics to determine if performance has improved
>>or not?
> I am measuring the response times with a timer around the execution of SQL
> statements through the psqlODBC driver. The response times for INSERT INTO
> temp-table statements have not changed with the parameters I modified.
>
>>Have you seen any change in your explain (analyze, buffers) plans?
>
> There was no change in the EXPLAIN for the INSERT INTO statement, but the
> performance of the queries improved by about 5%.
>
>>Make sure your table statistics are all up-to-date before performing
>>each benchmark test. I often turn off autovacuum when doing this sort of
>>testing so that I know exactly when tables get vacuumed and statistics
>>get updated (just ensure you remember to turn it back on when you're
>>finished!).
> I ran the VACUUM ANALYZE statement manually before starting the tests. Even
> though autovacuum was turned on, it did not get invoked due to the
> thresholds, and because the bulk of the inserts are into temporary tables.
>
>>Are the wal checkpoints being triggered every 30 mins or more
>>frequently?
> The wal checkpoints are triggered every 30 mins.
>
>>Are you still seeing the system use lots of temp files?
> I do not see any files in pgsql_tmp folders in the tablespaces where the
> tables are created. Also, I do not see pgsql_tmp folder in base and global
> folders. Am I checking for these files in the correct location? Also, I ran
> the following query (taken from another forum) to check the temporary files
> generated for all the databases:
> SELECT temp_files AS "Temporary files", temp_bytes AS "Size of temporary
> files" FROM pg_stat_database db;
>
> The result is 0 for both columns.
>
>>Do you have any indexes on the tables you're inserting into?
> I have not created indexes on these temporary tables, but programmatically
> executed an ANALYZE statement after the data is inserted into
> these temp tables, to generate/update statistics for these tables. Indexes
> do exist for all regular tables.
>
>>As mentioned previously, there are no simple/quick fixes here - you
>>cannot just change a setting and see performance improve. It will be
>>necessary to do a lot of experimentation, gathering statistics and
>>investigate how postgres is using buffers, disk IO etc. All of these
>>parameters interact with each other, so it is critical you have good
>>metrics to see exactly what your changes do. It is complex and time
>>consuming. Highly recommend PostgreSQL High Performance (Ahmed & Smith)
>>and Mastering PostgreSQL (Schönig) for valuable background/tips - there
>>really is just far too much to communicate effectively via email.
>
> Thank you for the suggestions on the books. I will go through these.

Based on your responses, it sounds like you have done the 'easy' stuff
which often results in improved performance. Now you are going to have
to dig much harder. It might be worth looking more closely at how
buffers/caching is working (the pg_buffercache extension might be useful)
and verifying where the performance bottlenecks are (this can sometimes be
surprising - it may not be where you think it is; don't forget to
profile your client, network/driver throughput, OS-level disk I/O,
etc.). This is where books like PostgreSQL High Performance will be
useful.
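
For example, a sketch of the kind of pg_buffercache query that shows which
relations currently occupy shared_buffers (assumes the extension is
installed; this follows the example query in its documentation):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                              WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;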

My only word of caution is that you are likely to now begin looking at
options which can improve throughput, but often come with other 'costs',
such as stability, data integrity or recovery time. These are things
which can only be assessed on a per case basis and largely depend on
business priorities. It will take time and you will need to make changes
slowly and do a lot of benchmarking.

It is really important to have a clear idea as to what would be
acceptable performance rather than just a vague concept of making things
faster. For example, one application I have inserts 1.3+ billion rows
per day. This represents two 'sets' of data. Our minimum requirement was
the ability to process 1 set, but if possible, 2 sets would be
ideal. Initially, with the original technology being used, it took
between 23 and 26 hours to process 1 set. We were able to tune this to
get it always to be under 24 hours, but there was no way we were going
to get the level of improvement which would allow more than 1 set to be
processed per day - not with the technology and design that was in
place.

A decision was made to re-implement using a different technology and
design. This was where we gained the improvements in performance we
really required. While the technology did play a part, it was really the
re-design which gave us the performance improvement to reach our desired
goal of 2 sets per day. 

Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-13 Thread Tim Cross


padusuma  writes:

> Hello Tim,
>
> I have tried the suggestions provided to the best of my knowledge, but I did
> not see any improvement in the INSERT performance for temporary tables. The
> Linux host on which PostgreSQL database is installed has 32 GB RAM.
> Following are current settings I have in postgresql.conf file:
> shared_buffers = 8GB
> temp_buffers = 256MB
> work_mem = 256MB
> maintenance_work_mem = 256MB
> wal_buffers = 256MB
>
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.75
> max_wal_size = 1GB
>
> effective_cache_size = 16GB
>
>>>- Increase work_mem to reduce use of temp files. Need it to be 2 to 3
>>>  times largest temp file (but use common sense)
>
>>I have already increased the work_mem and maintenance_work_mem to 256MB. I
>>will check on the temp file sizes and adjust the work_mem parameter as you
>>suggested.
>
>>- Tweak wal checkpoint parameters to prevent wal checkpoints occurring
>>  too frequently. Note that there is a trade-off here between frequency
>>  of checkpoints and recovery time after a crash. Fewer wal checkpoints will
>>  usually improve performance, but recovery time is longer.
>
>>How much you can improve insert times will depend on what the
>>memory and cpu profile of the system is. More memory means less use of temp
>>files and a faster system, so spend a bit of time to make sure your system is
>>configured to squeeze as much out of that RAM as you can!
>
> Please let me know if there are any other suggestions that I can try.

How are you gathering metrics to determine if performance has improved
or not?

Have you seen any change in your explain (analyze, buffers) plans?

Make sure your table statistics are all up-to-date before performing
each benchmark test. I often turn off autovacuum when doing this sort of
testing so that I know exactly when tables get vacuumed and statistics
get updated (just ensure you remember to turn it back on when you're
finished!).

Are the wal checkpoints being triggered every 30 mins or more
frequently?

Are you still seeing the system use lots of temp files?

Do you have any indexes on the tables you're inserting into?

As mentioned previously, there are no simple/quick fixes here - you
cannot just change a setting and see performance improve. It will be
necessary to do a lot of experimentation, gathering statistics and
investigate how postgres is using buffers, disk IO etc. All of these
parameters interact with each other, so it is critical you have good
metrics to see exactly what your changes do. It is complex and time
consuming. Highly recommend PostgreSQL High Performance (Ahmed & Smith)
and Mastering PostgreSQL (Schönig) for valuable background/tips - there
really is just far too much to communicate effectively via email.
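
For instance, a couple of the core statistics views that provide those
metrics without any extension (a sketch; pg_stat_bgwriter and
pg_stat_database are standard views in this release):

-- Checkpoint behaviour: timed vs. requested checkpoints since the last reset.
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, stats_reset
  FROM pg_stat_bgwriter;

-- Temporary file usage per database.
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
  FROM pg_stat_database
 ORDER BY temp_bytes DESC;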

Tim


--
Tim Cross



Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-09 Thread Tim Cross


padusuma  writes:

>>We are inserting large numbers (millions) of rows into a postgres
>>database from a Javascript application and found using the COPY command
>>was much, much faster than doing regular inserts (even with multi-insert
>>commit). If you can do this using the driver you are using, that will
>>give you the largest performance boost.
>
> The data to be inserted into temporary tables is obtained from one or more
> queries run earlier, and the data is available as a vector of strings. If I
> need to use the COPY FROM command, then the application would need to create
> a file with the data to be inserted, and the file needs to be readable by the
> user running the database server process, which may not always be possible
> unless the application is running on the same host. I think this approach
> may not be feasible for our application.
>

OK, that does make a difference. If your data is already in the
database, COPY is not going to help you much.

> I have increased the value of the temp_buffers server parameter from the
> default 8 MB to 128 MB. However, this change did not affect the INSERT time
> for temporary tables.

It isn't clear why you create vectors of strings rather than just select
into or something similar.

There are no 'quick fixes' which can be applied without real analysis of
the system. However, based on the limited information available, you may
want to consider -

- Increase work_mem to reduce use of temp files. Need it to be 2 to 3
  times largest temp file (but use common sense)

- Tweak wal checkpoint parameters to prevent wal checkpoints occurring
  too frequently. Note that there is a trade-off here between frequency
  of checkpoints and recovery time after a crash. Fewer wal checkpoints will
  usually improve performance, but recovery time is longer.

- Verify your inserts into temporary tables is the bottleneck and not
  the select from existing data (explain plan etc and adjust indexes
  accordingly).
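
As a sketch of the first two points, the corresponding settings could be
adjusted like this (the values simply mirror the ones the original poster
reports elsewhere in this thread and are illustrative only; they need to be
sized to the actual host and workload):

ALTER SYSTEM SET work_mem = '256MB';                  -- per sort/hash node, per backend
ALTER SYSTEM SET temp_buffers = '256MB';              -- per-session buffers for temporary tables
ALTER SYSTEM SET checkpoint_timeout = '30min';        -- fewer timed checkpoints
ALTER SYSTEM SET max_wal_size = '1GB';                -- controls checkpoints forced by WAL volume
ALTER SYSTEM SET checkpoint_completion_target = 0.75; -- spread checkpoint I/O over the interval
SELECT pg_reload_conf();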

How much you can improve insert times will depend on what the
memory and cpu profile of the system is. More memory means less use of temp
files and a faster system, so spend a bit of time to make sure your system is
configured to squeeze as much out of that RAM as you can!

--
Tim Cross



Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-07 Thread Tim Cross


padusuma  writes:

> I am working on adding support for PostgreSQL database for our application.
> In a lot of our use-cases, data is inserted into temporary tables using
> INSERT INTO statements with bind parameters, and subsequently queries are
> run by joining to these temp tables. Following is some of the data for these
> INSERT statements:
>
> Table definition: CREATE TEMPORARY TABLE Table1( auid varchar(15) ) ON
> COMMIT DELETE ROWS;
>
> SQL statement: INSERT INTO Table1 (uidcol) VALUES (:1);
>
> Time taken to insert 24428 rows: 10.077 sec
> Time taken to insert 32512 rows: 16.026 sec
> Time taken to insert 32512 rows: 15.821 sec
> Time taken to insert  6107 rows: 1.514 sec
>
> I am looking for suggestions to improve the performance of these INSERT
> statements into temporary tables. Database is located on a Linux VM and the
> version is "PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit". The application is running on a
> windows platform and connecting to the database using psqlODBC driver
> version 10.03.
>

We are inserting large numbers (millions) of rows into a postgres
database from a Javascript application and found using the COPY command
was much, much faster than doing regular inserts (even with multi-insert
commit). If you can do this using the driver you are using, that will
give you the largest performance boost. 
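
For comparison, a sketch of the two bulk-loading shapes at the SQL level
(Table1 and the :n placeholders are from the question above; whether the
psqlODBC driver exposes COPY streaming is the open question here):

-- Multi-row VALUES, batching many rows per round trip:
INSERT INTO Table1 (auid) VALUES (:1), (:2), (:3);   -- ...and so on per batch

-- COPY streamed from the client; no server-side file is needed when the data
-- is sent over the connection (COPY ... FROM STDIN):
COPY Table1 (auid) FROM STDIN;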


-- 
Tim Cross