Hello,
I had the same issue before and I used the PostgreSQL statistics to
see weather the tables are used or not. One thing that I could not solve is how
to check if the schema design and semantics are good. i.e. table a references
table b, table c references table b, and table c reference table a. In some
cases I find something like loops and circles in other cases, I find the same
table referenced many times in other tables in the same schema. Any way, here
are my findings regarding how to clean up your data.
1. Checking the number of sequential and indexed access to the table gives a
good hint if the table is in use or deprecated. The following select statement
retrieve the tables that might be deprecated.
Select relname from
pg_stat_user_tables
WHERE (idx_tup_fetch + seq_tup_read)= 0; -- you can define a threshold here
2. Empty tables can be retrieved by checking the number of live tup i.e
Select relname from
pg_stat_user_tables
WHERE n_live_tup = 0;
3. column can be checked using the null fraction
in see http://www.postgresql.org/docs/8.3/static/view-pg-stats.html
4. use pg_constraints to determine the tables that depends on the above
tables
5. table duplicates i.e the table can be found in more than one schema
SELECT
n.nspname as Schema,
c.relname as Name FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname IN (SELECT relname FROM pg_catalog.pg_class
WHERE relkind IN ('r')
GROUP BY relname
Having count(relname) 1)
ORDER BY 2,1;
6. For views there is no statistics, an easy way is to parse the log file
using regular expressions and shell scripting and compare the result with the
list of views and tables , I did that and I get many deprecated view.
7. for duplicate data have a look on this query.
-- if you need table to find an exact duplicate replace col1 , ... coln
with table name SELECT col1 , ... coln , min(ctid) AS keep, count(*) FROM
table GROUP BY col1 , ... coln HAVING count(*) 1 --- The above code
snippet can be combined with delete statement to delete duplicate rows
Have fun
From: Jason Long mailing.li...@octgsoftware.com
To: Guillaume Lelarge guilla...@lelarge.info
Cc: Craig Ringer ring...@ringerc.id.au; pgsql-general@postgresql.org
Sent: Friday, September 30, 2011 12:12 AM
Subject: Re: [GENERAL] Identifying old/unused views and table
On Wed, 2011-09-28 at 08:52 +0200, Guillaume Lelarge wrote:
On Wed, 2011-09-28 at 09:04 +0800, Craig Ringer wrote:
On 09/28/2011 04:51 AM, Jason Long wrote:
I have an application with a couple hundred views and a couple hundred
tables.
Is there some way I can find out which views have been accessed in the
last 6 months or so? Or some way to log this?
I know there are views and tables that are no longer in used by my
application and I am looking for a way to identify them.
Look at the pg_catalog.pg_stat* tables
I fail to see how that gives him any answer on the views, and tables no
longer used. AFAICT, there's no way to know for views (apart from
logging all queries in the log). As for tables, still apart from the
log, pg_stat_user_tables could give an answer if he was monitoring it at
least the last six months.
Thanks for the replies. Views were my main problem. My application
could use some cleanup. Doing is manually is probably the best
approach. I was just looking for a jump start.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general