> Date: Mon, 22 Sep 2014 14:38:52 -0400
> From: wmo...@potentialtech.com
> To: lukecoldi...@hotmail.com
> CC: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] large table
> 
> On Mon, 22 Sep 2014 11:17:05 -0700
> Luke Coldiron <lukecoldi...@hotmail.com> wrote:
> 
> > I am trying to figure out what would have caused a table in a PostgreSQL
> > 8.4.16 database to get into a state where it contains only 1 live tuple, and
> > has only ever had 1 tuple, but the size of the table is huge.
> > 
> > CREATE TABLE public.myTable(
> >    myColumn timestamp with time zone NOT NULL
> > );
> > 
> > Note: there is no primary key or index on this table.
> > CREATE OR REPLACE FUNCTION public.myFunc() RETURNS VOID AS $$
> > BEGIN
> >    UPDATE public.myTable SET myColumn = CLOCK_TIMESTAMP();
> >    IF NOT FOUND THEN
> >       INSERT INTO public.myTable(myColumn) VALUES (CLOCK_TIMESTAMP());
> >    END IF;
> > END;
> > $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
> > CREATE OR REPLACE FUNCTION public.wasUpdated(OUT was_updated boolean)
> > RETURNS BOOLEAN AS $$
> > BEGIN
> >    was_updated := COALESCE((SELECT myColumn FROM public.myTable) >
> >                            (CLOCK_TIMESTAMP() - INTERVAL '5 SECOND'),
> >                            FALSE);
> > END;
> > $$ LANGUAGE plpgsql VOLATILE STRICT SECURITY DEFINER;
> > SELECT * FROM pg_stat_all_tables WHERE relname = 'myTable';
> 
> [snip]
> 
> > The actual size of the table is around 33 MB.
> > The myFunc function is called every 2.5 seconds and the wasUpdated function 
> > every 2 seconds by separate processes. 
> > I realize that running a VACUUM FULL or CLUSTER command on the table will
> > resolve the issue, but I am more interested in a root cause that explains
> > why this table would end up in this state. I have tried to reproduce the
> > issue by running the exact same setup and have not been able to get the
> > table to grow like this. Any plausible causes or explanations would be
> > much appreciated.
>
> The UPDATE in myFunc() creates a new row version every 2.5 seconds, leaving
> a dead version behind each time. The space from those dead rows is only
> reclaimed when a vacuum runs. So (for example) if autovacuum only triggers a
> vacuum every 250 seconds, there will be around 100 dead rows' worth of space
> in the table, on average (250 / 2.5). With the other process querying the
> table, it's also possible that the row version it's looking at is one that
> _should_ be reclaimable, so a vacuum may not clear up all of the free space.
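>
> You can watch this happen with pgstattuple (a minimal sketch using the table
> and function from your post; each call leaves one more dead row version
> behind until a vacuum runs):
>
> SELECT myFunc();
> SELECT dead_tuple_count, free_space FROM pgstattuple('public.myTable');
> -- dead_tuple_count grows by one per call ...
> VACUUM public.myTable;
> SELECT dead_tuple_count, free_space FROM pgstattuple('public.myTable');
> -- ... and drops back after the vacuum; the space stays in the table as
> -- free_space rather than being returned to the OS.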
> 
> As far as running the exact setup: if you're not getting the same results,
> then your setup isn't exactly the same. It's likely that there are things
> going on in the setup you're curious about that you're not aware of, such as
> additional queries on the table, or additional load that causes operations
> to take a little longer, thus resulting in a different overlap of competing
> operations, etc.
It is possible, and that is part of what I am trying to discover; however, I am
very familiar with the system / code base, and in this case there is a single
process updating the timestamp and a single process reading it. There are no
other user processes programmed to interact with this table, outside of
whatever Postgres itself is doing.
> 
> Keep in mind that a short-lived incident might have resulted in table bloat
> that won't be reclaimed by autovacuum. I.e., if autovacuum wasn't running for
> a while, this table would just keep bloating; then when you start autovacuum,
> it will maintain the table size, but the table won't get any smaller.
I thought this as well, and I have run tests with autovacuum turned off, but I
don't see the issue occur over thousands of updates. The updates become HOT
updates and reuse the dead tuples.
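
For reference, the HOT-update ratio is easy to confirm from the statistics
view (a minimal sketch; the percentage expression is my own):

SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_all_tables
WHERE relname = 'myTable';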
> 
> I can't make any sense of the data you provided; it's all on separate rows,
> and I've given up on trying to figure out which number goes with which value,
> so I don't know exactly what the situation is. It's likely that you can
> improve on the situation by tweaking the autovacuum settings for this table
> to vacuum it more aggressively.
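>
> For example, something along these lines might help (the numbers are
> illustrative, not tuned; per-table autovacuum settings are available as
> storage parameters in 8.4):
>
> ALTER TABLE public.myTable SET (
>     autovacuum_vacuum_threshold = 5,
>     autovacuum_vacuum_scale_factor = 0.0
> );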
Sorry about that; the email client that I am using messed up the formatting.
Here is another attempt.
SELECT *
FROM pg_stat_all_tables
WHERE relname = 'myTable';
relid            | 16713
schemaname       | public
relname          | myTable
seq_scan         | 3995023
seq_tup_read     | 3995296
idx_scan         |
idx_tup_fetch    |
n_tup_ins        | 0
n_tup_upd        | 3778598
n_tup_del        | 0
n_tup_hot_upd    | 3774362
n_live_tup       | 949135
n_dead_tup       | 124
last_vacuum      |
last_autovacuum  | 2014-09-18 11:28:47.63545+00
last_analyze     |
last_autoanalyze | 2014-09-18 11:27:47.134432+00
SELECT * FROM pgstattuple('public.myTable');

table_len          | 34709504
tuple_count        | 1
tuple_len          | 32
tuple_percent      | 0
dead_tuple_count   | 105
dead_tuple_len     | 3360
dead_tuple_percent | 0.01
free_space         | 30757308
free_percent       | 88.61
So far, having autovacuum on or off has not caused the problem to occur.
Originally I was thinking that having autovacuum off would make this happen
for sure, but since the table doesn't have an index it appears to be able to
do HOT updates.
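
As noted above, the space itself can be recovered with a full vacuum (bearing
in mind that it takes an exclusive lock on the table while it runs):

VACUUM FULL public.myTable;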
> 
> Although, you don't seem to have a _problem_ that you've stated. Are you
> seeing performance issues? Is 33 MB too much data, filling up the drive (not
> being sarcastic here, as there are various mobile applications where 33 MB
> could be important, even now)? Because if this isn't actually causing any
> problems, I wouldn't really worry about it.
Yes, this is a performance issue. The query takes around 10 times longer to
run than would normally be expected. That is not a big surprise after looking
at how large the table is: it has to be sequentially scanned to find the live
tuples (all 1 of them). The space is not so much of an issue, but the query is
being run on an embedded appliance that is sensitive to this dramatic a change
in performance. For the future I will be redesigning this mechanism, and it
will be done outside of the database, but before I do so I wanted to see if
anyone could explain why this might have occurred (a possible bug that has
since been fixed in PostgreSQL, etc.).
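
One thing I will check for, along the lines of the point above about row
versions that _should_ be reclaimable: a long-lived transaction holds back the
cleanup horizon, so vacuum cannot reclaim any row version that was still
visible to it. A quick way to spot one on 8.4 (a sketch; these column names
changed in later releases, and the one-minute threshold is arbitrary):

SELECT procpid, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 minute'
ORDER BY xact_start;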
> 
> -- 
> Bill Moran
> I need your help to succeed:
> http://gamesbybill.com