Hello,

I had the same issue before, and I used the PostgreSQL statistics to 
see whether the tables are used or not. One thing that I could not solve is how 
to check whether the schema design and semantics are good, e.g. table a references 
table b, table c references table b, and table c references table a. In some 
cases I find something like loops and circles; in other cases, I find the same 
table referenced many times by other tables in the same schema. Anyway, here 
are my findings regarding how to clean up your data. 

1. Checking the number of sequential and indexed accesses to the table gives a 
good hint whether the table is in use or deprecated. The counters only cover the 
period since the statistics were last reset. The following select statement 
retrieves the tables that might be deprecated.

Select relname from
pg_stat_user_tables
-- idx_tup_fetch is NULL for tables without indexes, hence the coalesce
WHERE (coalesce(idx_tup_fetch, 0) + seq_tup_read) = 0; -- you can define a threshold here

2. Empty tables can be retrieved by checking the number of live tuples, i.e.
Select relname from
pg_stat_user_tables
WHERE n_live_tup = 0;

3. Unused columns can be spotted using the null fraction (null_frac) in pg_stats; 
see http://www.postgresql.org/docs/8.3/static/view-pg-stats.html
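
A minimal sketch of item 3, assuming ANALYZE has run recently. null_frac is an estimate taken from a sample, so a value of 1 only suggests that the column is entirely NULL. The schema filter is an addition for illustration.

```sql
-- Columns whose sampled values are all NULL: candidates for unused columns.
SELECT schemaname, tablename, attname, null_frac
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
  AND null_frac = 1
ORDER BY schemaname, tablename, attname;
```

Lowering the condition to something like null_frac > 0.99 would also catch columns that are almost never filled.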

4. Use pg_constraint to determine the tables that depend on the above 
tables.
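
One way item 4 might look, as a sketch: it lists the foreign keys that point at a given table. The name public.some_table is a placeholder, not something from the thread.

```sql
-- Foreign-key constraints that reference public.some_table (hypothetical name).
SELECT conrelid::regclass  AS referencing_table,
       conname             AS constraint_name,
       confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint
WHERE contype = 'f'                              -- foreign keys only
  AND confrelid = 'public.some_table'::regclass
ORDER BY 1, 2;
```

This only covers dependencies declared as foreign keys; views and functions that use the table are not found this way.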

5. Table duplicates, i.e. tables that can be found in more than one schema:

SELECT
n.nspname as "Schema",
c.relname as "Name" FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname IN (SELECT relname FROM pg_catalog.pg_class
WHERE relkind IN ('r')
GROUP BY relname
Having count(relname) > 1)
AND c.relkind = 'r' -- list only tables, not indexes or sequences with the same name
ORDER BY 2,1;

6. For views there are no statistics. An easy way is to parse the log file 
using regular expressions and shell scripting and compare the result with the 
list of views and tables. I did that and found many deprecated views. 

7. For duplicate data, have a look at this query. 
-- To find exact duplicates of whole rows, replace <col1>, ... <coln> with 
-- the table name.
SELECT <col1>, ... <coln>, min(ctid) AS keep, count(*)
FROM <table>
GROUP BY <col1>, ... <coln>
HAVING count(*) > 1;
-- The query above can be combined with a DELETE statement to remove the 
-- duplicate rows.
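
A sketch of the delete step for item 7, written as a self-join on ctid rather than with the grouped query. The names some_table, col1 and col2 are placeholders. It keeps the row with the smallest ctid in each group of duplicates.

```sql
-- Run inside a transaction so the result can be checked before committing.
BEGIN;

DELETE FROM some_table a
USING some_table b
WHERE a.col1 = b.col1
  AND a.col2 = b.col2
  AND a.ctid > b.ctid;   -- a later physical copy of the same values

-- Inspect the table here, then COMMIT; or ROLLBACK;
```

Rows where col1 or col2 is NULL are not matched by "=", unlike GROUP BY, which treats NULLs as equal; IS NOT DISTINCT FROM can be used instead if such rows should count as duplicates.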

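On the loops between tables mentioned at the top, a recursive query over pg_constraint might be a starting point. This is only a sketch: it needs PostgreSQL 8.4 or later for WITH RECURSIVE, it reports each loop once per table in it, and it can get slow on schemas with very many foreign keys. The CTE and column names are made up for illustration.

```sql
WITH RECURSIVE fk AS (
    -- One edge per pair of tables linked by a foreign key.
    SELECT DISTINCT conrelid AS src, confrelid AS dst
    FROM pg_catalog.pg_constraint
    WHERE contype = 'f'
      AND conrelid <> confrelid          -- ignore self-references
),
walk AS (
    SELECT src AS origin,
           dst AS node,
           ARRAY[src, dst] AS path,
           src::regclass::text || ' -> ' || dst::regclass::text AS route,
           false AS closed
    FROM fk
    UNION ALL
    SELECT w.origin,
           fk.dst,
           w.path || fk.dst,
           w.route || ' -> ' || fk.dst::regclass::text,
           fk.dst = w.origin             -- true when the walk is back at its start
    FROM walk w
    JOIN fk ON fk.src = w.node
    WHERE NOT w.closed
      AND (fk.dst = w.origin OR fk.dst <> ALL (w.path))  -- never revisit other tables
)
SELECT route
FROM walk
WHERE closed
ORDER BY route;
```

Whether a reported loop is a design problem still has to be judged by hand; the query only shows where the loops are.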

Have fun 

________________________________



From: Jason Long <mailing.li...@octgsoftware.com>
To: Guillaume Lelarge <guilla...@lelarge.info>
Cc: Craig Ringer <ring...@ringerc.id.au>; pgsql-general@postgresql.org
Sent: Friday, September 30, 2011 12:12 AM
Subject: Re: [GENERAL] Identifying old/unused views and table

On Wed, 2011-09-28 at 08:52 +0200, Guillaume Lelarge wrote:
> On Wed, 2011-09-28 at 09:04 +0800, Craig Ringer wrote:
> > On 09/28/2011 04:51 AM, Jason Long wrote:
> > > I have an application with a couple hundred views and a couple hundred
> > > tables.
> > >
> > > Is there some way I can find out which views have been accessed in the
> > > last 6 months or so?  Or some way to log this?
> > >
> > > I know there are views and tables that are no longer in used by my
> > > application and I am looking for a way to identify them.
> > 
> > Look at the pg_catalog.pg_stat* tables
> > 
> 
> I fail to see how that gives him any answer on the views, and tables no
> longer used. AFAICT, there's no way to know for views (apart from
> logging all queries in the log). As for tables, still apart from the
> log, pg_stat_user_tables could give an answer if he was monitoring it at
> least the last six months.
> 
> 

Thanks for the replies.  Views were my main problem.  My application
could use some cleanup.  Doing it manually is probably the best
approach.  I was just looking for a jump start.  



