Re: Disk wait problem... may not be hardware...

2023-10-29 Thread pf
On Sun, 29 Oct 2023 16:00:46 +0100 Peter J. Holzer wrote:

>On 2023-10-27 19:46:09 -0400, p...@pfortin.com wrote:
>> On Fri, 27 Oct 2023 19:07:11 +0200 Peter J. Holzer wrote:  
>> >Have you looked at the query plans as I recommended? (You might also
>> >want to enable track_io_timing to get extra information, but comparing
>> >just the query plans of fast and slow queries would be a first step)  
>> 
>> I didn't see how that would help since other than the table name the
>> queries are identical.  Curious: are you implying PG stores tables
>> differently?  
>
>No, but Postgres decides on the query plan depending on the statistics stored
>about that table. If those statistics are off, the query plan can be
>wildly different and very inefficient. So checking whether the plans are
>plausible should be one of the first things you do when performance is
>not what you expect. Indeed, on
>https://wiki.postgresql.org/wiki/Slow_Query_Questions it is the very
>first bullet point in the section "Things to Try Before You Post".
>
>When you have established that the plan looks fine even when the
>performance is poor then you have to look elsewhere. But even then it
>helps to know what the database is actually trying to do.
>
>hp

Peter,

Thanks for this and the link... Thanks to Jim's post, I came to a similar
conclusion; but the above link could use a bit more emphasis on
shared_buffers.  I'm no longer enabling "autoload table row count" and am
pointing out to our team that SELECT count(*) FROM table; is OK *unless* it is
done in a loop across many tables, where shared_buffers may never be large
enough to prevent thrashing...
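
For what it's worth, a rough way to avoid the count(*)-in-a-loop pattern is to
read the planner's estimate instead of scanning each table; reltuples is only
an approximation kept up to date by VACUUM/ANALYZE, and the table name below is
just one of the tables from this thread:

  -- approximate row count from statistics PostgreSQL already keeps
  SELECT relname, reltuples::bigint AS approx_rows
    FROM pg_class
   WHERE relname = 'ncvhis_2016_12_03';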

Thanks,
Pierre




Re: Disk wait problem... may not be hardware...

2023-10-29 Thread Peter J. Holzer
On 2023-10-27 19:46:09 -0400, p...@pfortin.com wrote:
> On Fri, 27 Oct 2023 19:07:11 +0200 Peter J. Holzer wrote:
> >Have you looked at the query plans as I recommended? (You might also
> >want to enable track_io_timing to get extra information, but comparing
> >just the query plans of fast and slow queries would be a first step)
> 
> I didn't see how that would help since other than the table name the
> queries are identical.  Curious: are you implying PG stores tables
> differently?

No, but Postgres decides on the query plan depending on the statistics stored
about that table. If those statistics are off, the query plan can be
wildly different and very inefficient. So checking whether the plans are
plausible should be one of the first things you do when performance is
not what you expect. Indeed, on
https://wiki.postgresql.org/wiki/Slow_Query_Questions it is the very
first bullet point in the section "Things to Try Before You Post".

When you have established that the plan looks fine even when the
performance is poor then you have to look elsewhere. But even then it
helps to know what the database is actually trying to do.
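
If the statistics themselves are suspect, a quick sanity check is to see when
they were last gathered and refresh them by hand; a minimal sketch against one
of the tables from this thread:

  -- when were the optimizer statistics last updated for this table?
  SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
   WHERE relname = 'ncvhis_2016_12_03';

  -- refresh them manually if they look stale
  ANALYZE ncvhis_2016_12_03;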

hp



-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Disk wait problem... may not be hardware...

2023-10-28 Thread Jim Mlodgenski
On Fri, Oct 27, 2023 at 7:46 PM  wrote:

> Memory: 125.5 GiB of RAM
>
It looks like you have a large amount of memory allocated to the server,
but your plans are doing reads instead of pulling things from shared
buffers:

>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
> Finalize Aggregate  (cost=404669.65..404669.66 rows=1 width=8) (actual time=844.158..847.309 rows=1 loops=1)
>   Buffers: shared hit=248 read=25022
>   ->  Gather  (cost=404669.43..404669.65 rows=2 width=8) (actual time=844.133..847.301 rows=3 loops=1)
>         Workers Planned: 2
>         Workers Launched: 2
>         Buffers: shared hit=248 read=25022
>         ->  Partial Aggregate  (cost=403669.43..403669.45 rows=1 width=8) (actual time=838.772..838.772 rows=1 loops=3)
>               Buffers: shared hit=248 read=25022
>               ->  Parallel Index Only Scan using ncvhis_2016_12_03_voted_party_cd_idx on ncvhis_2016_12_03  (cost=0.44..372735.05 rows=12373755 width=0) (actual time=18.277..592.473 rows=9900389 loops=3)
>                     Heap Fetches: 0
>                     Buffers: shared hit=248 read=25022
> Planning Time: 0.069 ms
> JIT:
>   Functions: 8
>   Options: Inlining false, Optimization false, Expressions true, Deforming true
>   Timing: Generation 0.284 ms, Inlining 0.000 ms, Optimization 0.268 ms, Emission 3.590 ms, Total 4.143 ms
> Execution Time: 847.498 ms
>
>


data/postgresql.conf:
> max_connections = 100
> shared_buffers = 128MB
>

It looks like you are running with the stock config for shared_buffers.
With only 128MB dedicated for shared_buffers and such a big database,
you'll be thrashing the cache pretty hard. With 125GB on the server, try
upping shared_buffers to something more like 32GB.
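
As a sketch of what that looks like (the 32GB figure is the suggestion above,
not a measured optimum, and shared_buffers only takes effect after a server
restart):

  -- how much recent activity was served from cache vs. read from disk?
  SELECT datname, blks_hit, blks_read
    FROM pg_stat_database
   WHERE datname = current_database();

  -- raise shared_buffers; a restart is required for this to take effect
  ALTER SYSTEM SET shared_buffers = '32GB';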


Re: Disk wait problem... may not be hardware...

2023-10-28 Thread pf
On Fri, 27 Oct 2023 21:21:18 -0700 Adrian Klaver wrote:

>On 10/27/23 16:46, p...@pfortin.com wrote:
>> Peter,
>> 
>> Thanks for your patience; I've been feeling pressure to get this
>> resolved; so have been lax in providing info here..  Hope the following
>> helps...
>>   
>
>
>> Something I hadn't noticed before: SQL-workbench/J (build 129.6) displays
>> an execution timer at the bottom of the window.  I see all the queries
>> reporting 0-47ms, which is plenty fast. It's the results that often take a
>> long time to display.
>
>Which is not surprising as building GUI elements is an expensive 
>operation. 

True; but not to this extent (minutes).

>If the time to display is your wait issue then this is not 
>really a disk issue. 

Guessing you didn't look at attachment "journal2"... it clearly shows
data from the disk coming out VERY slowly...

>What happens if you use psql as the client?

I see my reply on the 23rd only went to you (I'm still forgetting to
check which message I'm replying to -- this is unlike the other lists I
participate in, where the headers are set to force all messages through
the list server). My reply contained the psql info, which shows long
delays in psql too:

Hi Adrian,

Sorry about everything inline; I think it saves time vs dealing with
attachments...

Summary:
* single large partition dedicated to PG
* only PG seems affected
* buffered reads are 6x slower when PG queries are in progress
* psql queries on 30 tables of ~33M rows each show random delays of up to 4m51s
* successful psql queries complete in under 500ms
* ls -l loop on another directory had no delays.

HTH...

The WB query was in disk-wait, so psql gave:
ncsbe=# select count(*) from ncvhis_2022_10_29;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The above query finished a second or two before the WB results
appeared; then this next query succeeded right after:
ncsbe=# select count(*) from ncvhis_2022_10_29;
  count   
--
 33861176
(1 row)

Another table went into disk-wait long enough that I was able to access
the tmp directory on the same disk just fine.

Querying the same table; no error, canceled it:
ncsbe=# select count(*) from ncvhis_2022_07_23;
^CCancel request sent
ERROR:  canceling statement due to user request

While this table was still in disk-wait, tried another table which
returned quickly:
ncsbe=# select count(*) from ncvhis_2023_10_21;
  count   
--
 32983343
(1 row)

Eventually, the WB query did complete...
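
While a query is stuck like that, a second psql session can at least show what
the backend is waiting on; a minimal check, nothing specific to this setup:

  -- what are the non-idle backends doing / waiting on right now?
  SELECT pid, state, wait_event_type, wait_event, query
    FROM pg_stat_activity
   WHERE state <> 'idle';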

Next, I made a query list for 30 tables, turned on timing and pasted the
queries.  Pasting results inline:
ncsbe=# select count(*) from ncvhis_2012_12_06;
select count(*) from ncvhis_2016_12_03;
select count(*) from ncvhis_2020_08_01;
select count(*) from ncvhis_2020_10_31;
select count(*) from ncvhis_2020_12_05;
select count(*) from ncvhis_2020_12_26;
select count(*) from ncvhis_2021_03_06;
select count(*) from ncvhis_2021_06_12;
select count(*) from ncvhis_2022_07_23;
select count(*) from ncvhis_2022_10_15;
select count(*) from ncvhis_2022_10_22;
select count(*) from ncvhis_2022_10_29;
select count(*) from ncvhis_2022_11_05;
select count(*) from ncvhis_2022_11_12;
select count(*) from ncvhis_2022_11_19;
select count(*) from ncvhis_2022_11_26;
select count(*) from ncvhis_2022_12_03;
select count(*) from ncvhis_2022_12_10;
select count(*) from ncvhis_2022_12_17;
select count(*) from ncvhis_2022_12_24;
select count(*) from ncvhis_2022_12_31;
select count(*) from ncvhis_2023_01_08;
select count(*) from ncvhis_2023_01_14;
select count(*) from ncvhis_2023_01_21;
select count(*) from ncvhis_2023_01_28;
select count(*) from ncvhis_2023_02_04;
select count(*) from ncvhis_2023_02_11;
select count(*) from ncvhis_2023_02_18;
select count(*) from ncvhis_2023_02_27;
select count(*) from ncvhis_2023_03_04;
  count   
--
 31923950
(1 row)

Time: 72404.786 ms (01:12.405)
  count   
--
 29701168
(1 row)

Time: 301.246 ms
  count   
--
 32172845
(1 row)

Time: 409.974 ms
  count   
--
 32162748
(1 row)

Time: 363.836 ms
  count   
--
 34679651
(1 row)

Time: 351.167 ms
  count   
--
 35006711
(1 row)

Time: 348.378 ms
  count   
--
 35003995
(1 row)

Time: 348.712 ms
  count   
--
 34994502
(1 row)

Time: 351.901 ms
  count   
--
 33686292
(1 row)

Time: 487.837 ms
  count   
--
 33861658
(1 row)

Time: 40987.826 ms (00:40.988)
  count   
--
 33861381
(1 row)

Time: 76964.281 ms (01:16.964)
  count   
--
 33861176
(1 row)

Time: 483.329 ms
  count   
--
 33861071
(1 row)

Time: 18919.267 ms (00:18.919)
  count   
--
 29344354
(1 row)

Time: 50896.978 ms (00:50.897)
  count   
--
 30709227
(1 row)

Time: 25784.000 ms (00:25.784)
  count   
--
 32368001
(1 row)


Re: Disk wait problem... may not be hardware...

2023-10-27 Thread Adrian Klaver

On 10/27/23 16:46, p...@pfortin.com wrote:

Peter,

Thanks for your patience; I've been feeling pressure to get this
resolved; so have been lax in providing info here..  Hope the following
helps...





Something I hadn't noticed before: SQL-workbench/J (build 129.6) displays
an execution timer at the bottom of the window.  I see all the queries
reporting 0-47ms, which is plenty fast. It's the results that often take a
long time to display.


Which is not surprising as building GUI elements is an expensive 
operation. If the time to display is your wait issue then this is not 
really a disk issue. What happens if you use psql as the client?



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Disk wait problem... may not be hardware...

2023-10-27 Thread pf
Peter, 

Thanks for your patience; I've been feeling pressure to get this
resolved; so have been lax in providing info here..  Hope the following
helps...

On Fri, 27 Oct 2023 19:07:11 +0200 Peter J. Holzer wrote:

>On 2023-10-26 22:03:25 -0400, p...@pfortin.com wrote:
>> Are there any extra PG low level logs that can be turned on?  

$ uname -a
Linux pf.pfortin.com 6.5.3-server-1.mga10 #1 SMP PREEMPT_DYNAMIC
   Sat Sep 16 00:04:28 UTC 2023 x86_64 GNU/Linux

PostgreSQL 15.4 on x86_64-mageia-linux-gnu, compiled by gcc (Mageia
12.3.0-3.mga9) 12.3.0, 64-bit

Operating System: Mageia 10 KDE Plasma
Version: 5.27.8 KDE Frameworks Version: 5.111.0
Qt Version: 5.15.7
Kernel Version: 6.5.3-server-1.mga10 (64-bit)
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 125.5 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Dell Inc.
Product Name: XPS 8950

>Have you looked at the query plans as I recommended? (You might also
>want to enable track_io_timing to get extra information, but comparing
>just the query plans of fast and slow queries would be a first step)

I didn't see how that would help since other than the table name the
queries are identical.  Curious: are you implying PG stores tables
differently?  The tables are probably greater than 90-95% identical rows.
We're still investigating how to best store that type of data which
changes a small number of rows each week...

The query is the built-in row-count query in SQL-workbench/J which is
visible in the attached files.

>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
Finalize Aggregate  (cost=404669.65..404669.66 rows=1 width=8) (actual time=844.158..847.309 rows=1 loops=1)
  Buffers: shared hit=248 read=25022
  ->  Gather  (cost=404669.43..404669.65 rows=2 width=8) (actual time=844.133..847.301 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=248 read=25022
        ->  Partial Aggregate  (cost=403669.43..403669.45 rows=1 width=8) (actual time=838.772..838.772 rows=1 loops=3)
              Buffers: shared hit=248 read=25022
              ->  Parallel Index Only Scan using ncvhis_2016_12_03_voted_party_cd_idx on ncvhis_2016_12_03  (cost=0.44..372735.05 rows=12373755 width=0) (actual time=18.277..592.473 rows=9900389 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=248 read=25022
Planning Time: 0.069 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.284 ms, Inlining 0.000 ms, Optimization 0.268 ms, Emission 3.590 ms, Total 4.143 ms
Execution Time: 847.498 ms

>explain (analyze, buffers) select count(*) from ncvhis_2020_08_01;
Finalize Aggregate  (cost=438377.94..438377.95 rows=1 width=8) (actual time=624.700..628.024 rows=1 loops=1)
  Buffers: shared hit=286 read=27084
  ->  Gather  (cost=438377.73..438377.94 rows=2 width=8) (actual time=624.669..628.016 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=286 read=27084
        ->  Partial Aggregate  (cost=437377.73..437377.74 rows=1 width=8) (actual time=619.297..619.297 rows=1 loops=3)
              Buffers: shared hit=286 read=27084
              ->  Parallel Index Only Scan using ncvhis_2020_08_01_voted_party_cd_idx on ncvhis_2020_08_01  (cost=0.44..403856.78 rows=13408379 width=0) (actual time=0.029..357.697 rows=10724282 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=286 read=27084
Planning Time: 0.072 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.285 ms, Inlining 0.000 ms, Optimization 0.218 ms, Emission 3.933 ms, Total 4.435 ms
Execution Time: 628.216 ms

>explain (analyze, buffers) select count(*) from ncvhis_2020_10_31;
Finalize Aggregate  (cost=438179.73..438179.74 rows=1 width=8) (actual time=1090.209..1092.976 rows=1 loops=1)
  Buffers: shared hit=276 read=27095
  ->  Gather  (cost=438179.52..438179.73 rows=2 width=8) (actual time=1090.141..1092.967 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=276 read=27095
        ->  Partial Aggregate  (cost=437179.52..437179.53 rows=1 width=8) (actual time=1084.766..1084.767 rows=1 loops=3)
              Buffers: shared hit=276 read=27095
              ->  Parallel Index Only Scan using ncvhis_2020_10_31_voted_party_cd_idx on ncvhis_2020_10_31  (cost=0.44..403675.53 rows=13401595 width=0) (actual time=222.686..816.447 rows=10720916 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=276 read=27095
Planning Time: 0.131 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.322 ms, Inlining 0.000 ms, Optimization 0.261 ms, Emission 3.648 ms, Total 4.231 ms
Execution Time: 1093.209 ms



Those responses are reasonably quick...  A suggestion from the Linux list
was to run "fstrim -v /mnt/work" which is the NVMe mount point for the DB:
$ fstrim -nv /mnt/work
/mnt/work: 0 B (dry run) trimmed

$ fstrim -v /mnt/work
/mnt/work: 2 TiB (2248993087488 bytes) trimmed


Re: Disk wait problem... may not be hardware...

2023-10-27 Thread Peter J. Holzer
On 2023-10-26 22:03:25 -0400, p...@pfortin.com wrote:
> Are there any extra PG low level logs that can be turned on?

Have you looked at the query plans as I recommended? (You might also
want to enable track_io_timing to get extra information, but comparing
just the query plans of fast and slow queries would be a first step)
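
For example, in a superuser psql session it can be switched on just for that
session before re-running the comparison (a sketch; the table name is one from
earlier in this thread):

  -- include per-node I/O timing in EXPLAIN output for this session only
  SET track_io_timing = on;
  EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ncvhis_2016_12_03;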

You haven't shown any postgresql logs or your settings, so it's hard to
know what you have already turned on.

There are a ton of logging related parameters.
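
A quick way to see which of them (and everything else) differs from the
built-in defaults is something like:

  -- settings that are not at their default values
  SELECT name, setting, source
    FROM pg_settings
   WHERE source <> 'default';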

> I've only found logging in the journal;

By "the journal" you mean the one maintained by journald? (Did you
mention what OS you are using? From several outputs I'm guessing it is
Linux). On Linux systems postgresql is usually set up to log into files
somewhere under /var/log/postgresql (see the log_directory entry in
postgresql.conf) but your setup might be different, of course.
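
If the logging collector is enabled, the server can also report where it is
currently writing; these are standard calls, nothing setup-specific:

  -- where is the server logging to?
  SHOW logging_collector;
  SHOW log_destination;
  SELECT pg_current_logfile();   -- NULL unless the collector writes to a file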

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Disk wait problem... may not be hardware...

2023-10-27 Thread Justin Clift

On 2023-10-27 12:03, p...@pfortin.com wrote:


I can't think of a common hardware bus/other that would only affect
PostgreSQL disk accesses.


Which file system is PostgreSQL being run on?

Asking because I remember seeing weirdness reported with *some* SSD
drives when used with ZFS:

  https://github.com/openzfs/zfs/discussions/14793

Note - that's not PostgreSQL specific or anything, but more of a
"weird stuff showing up with NVMe drives" thing.

Regards and best wishes,

Justin Clift




Re: Disk wait problem... may not be hardware...

2023-10-26 Thread pf
On Thu, 26 Oct 2023 15:50:16 -0400 p...@pfortin.com wrote:

>Hi Peter,
>
>All of the following is based on using SQL_workbench/J (WB) (Mageia Linux
>has not provided a viable pgAdmin4 package); WB is set up to autoload the
>table row count and 10 rows. I'm sticking to one set of files where they
>are all roughly 33.6M rows.
>
>I've been doing a lot of digging and found more interesting info...
>
>At first, I thought the disk was in disk wait the entire time of the
>delay; this _appeared_ to be the case since the system journal logged
>everything about each query, with every query looking the same; no errors.
>
>I was going to provide some logs; but suffice it to say that the logs,
>combined with running iostat every second has been enlightening...
>
>Every time I switch from one table to another, sometimes the results
>display in:
>* less than a half-second
>* one or more seconds
>* up to 5+ minutes.
>So far, none has failed to eventually return; just l..o..n..g delays...
>
>This morning, I've been using iostat:
>$ while true; do echo -n "$(date +%H:%M:%S)  "; iostat /dev/nvme1n1 |
>grep nvme1n1; sleep 1; done
>
>When the response is fast, I see iostat report ~225000 bytes read. On the
>other hand, when the query takes a long time to respond, I see the read
>bytes spread over time...  a little every second...  So while htop
>appears to report a steady D (disk wait) status, I now suspect the short
>transfers themselves are fast and, most of the time, htop simply catches
>the process in the D state.
>
>It definitely appears to be a problem with the 4TB NVMe SSD drive, which
>utilities say still has 95% life left[1].  The drive is reporting:
>Error Information Log Entries:  1,177
>which is growing, so I'll be digging into that.  It also reports:
>
>Error Information (NVMe Log 0x01, 16 of 63 entries)
>Num   ErrCount  SQId   CmdId  Status  PELoc    LBA  NSID    VS  Message
>  0       1177     0  0xc002  0x4004      -      0     1     -  Invalid Field in Command
>  1       1176     0  0xb001  0x4004      -      0     1     -  Invalid Field in Command
>  2       1175     0  0x801a  0x4004      -      0     1     -  Invalid Field in Command
>...
>This list is not growing, so I'll look into it later than sooner.
>
>A few moments ago, I decided to switch to the backup which resides on a
>17TB mechanical platter drive which is running great.
>
>[1] SSD drives wear out with usage; think of it as a vehicle tire which
>slowly wears out.
>
>Also found that there is a firmware update available from:
>https://www.seagate.com/support/downloads/
>
>No idea if this will solve the problem; but...
>
>Thanks for your suggestions.
>
>Thought I saw there may be instrumentation within PG which I can access;
>suggestions welcome if so...
>
>Cheers,
>Pierre
>
>
>On Thu, 26 Oct 2023 11:50:00 +0200 Peter J. Holzer wrote:
>
>>On 2023-10-23 16:31:30 -0700, Adrian Klaver wrote:  
>>> On 10/23/23 14:55, p...@pfortin.com wrote:
>>> > Next, I made a query list for 30 tables, turned on timing and pasted the
>>> > queries.  Pasting results inline:
>>> > ncsbe=# select count(*) from ncvhis_2012_12_06;
>>> > select count(*) from ncvhis_2016_12_03;
>>> > select count(*) from ncvhis_2020_08_01;
>>> > select count(*) from ncvhis_2020_10_31;
>>[...]  
>>> > Time: 72404.786 ms (01:12.405)
>>> > Time: 301.246 ms
>>> > Time: 409.974 ms
>>> > Time: 363.836 ms
>>> > Time: 351.167 ms
>>> > Time: 348.378 ms
>>> > Time: 348.712 ms
>>> > Time: 351.901 ms
>>> > Time: 487.837 ms
>>> > Time: 40987.826 ms (00:40.988)
>>> > Time: 76964.281 ms (01:16.964)
>>> > Time: 483.329 ms
>>> > Time: 18919.267 ms (00:18.919)
>>> > Time: 50896.978 ms (00:50.897)
>>> > Time: 25784.000 ms (00:25.784)
>>> > Time: 45407.599 ms (00:45.408)
>>> > Time: 75171.609 ms (01:15.172)
>>> > Time: 88871.004 ms (01:28.871)
>>> > Time: 128565.127 ms (02:08.565)
>>> > Time: 210248.222 ms (03:30.248)
>>> > Time: 246609.561 ms (04:06.610)   4 minutes!  ;p
>>> > Time: 166406.283 ms (02:46.406)
>>> > Time: 291483.538 ms (04:51.484)  nearly 5 minutes ;p ;p
>>> > Time: 2269.961 ms (00:02.270)
>>> > Time: 413.219 ms
>>> > Time: 433.817 ms
>>> > Time: 395.844 ms
>>> > Time: 7251.637 ms (00:07.252)
>>> > Time: 384.328 ms
>>> > Time: 384.887 ms
>>> > 
>>> > I don't see a pattern in the above; matches the randomness I saw using
>>> > WB...
>>
>>It would be interesting to see whether there's a difference in plan
>>between the slow and the fast queries.
>>
>>If your experiment above is repeatable, you can just prefix every
>>query with 
>>explain (analyze, buffers)
>>
>>
>>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
>>explain (analyze, buffers) select count(*) from ncvhis_2020_08_01;
>>explain (analyze, buffers) select count(*) from ncvhis_2020_10_31;
>>...
>>
>>If it is hard to reproduce (happens only sometimes), the best way to
>>catch it is probably to enable auto_explain, possibly
>>with the