Hello,

I had the same issue before, and I used the PostgreSQL statistics to see
whether the tables are used or not. One thing that I could not solve is how
to check whether the schema design and semantics are good, e.g. table a
references table b, table c references table b, and table c references
table a. In some cases I find loops and circles; in other cases I find the
same table referenced many times by other tables in the same schema.

Anyway, here are my findings regarding how to clean up your data.

1. Checking the number of sequential and index accesses to a table gives a
   good hint whether the table is in use or deprecated. The counters are
   cumulative since the statistics were last reset. The following select
   statement retrieves the tables that might be deprecated:

    -- idx_tup_fetch is NULL for tables without indexes, hence COALESCE
    SELECT relname
    FROM pg_stat_user_tables
    WHERE (COALESCE(idx_tup_fetch, 0) + seq_tup_read) = 0; -- you can define a threshold here

2. Empty tables can be retrieved by checking the (estimated) number of live
   tuples, i.e.

    SELECT relname
    FROM pg_stat_user_tables
    WHERE n_live_tup = 0;

3. Columns can be checked using the null fraction (null_frac) in pg_stats,
   see
   http://www.postgresql.org/docs/8.3/static/view-pg-stats.html

4. Use pg_constraint to determine the tables that depend on the above
   tables.

5. Table duplicates, i.e. the same table name can be found in more than one
   schema:

    SELECT n.nspname AS "Schema", c.relname AS "Name"
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r' -- list only tables, not same-named indexes etc.
      AND c.relname IN (SELECT relname
                        FROM pg_catalog.pg_class
                        WHERE relkind IN ('r')
                        GROUP BY relname
                        HAVING count(relname) > 1)
    ORDER BY 2, 1;

6. There are no statistics for views. An easy way is to parse the log file
   using regular expressions and shell scripting, and to compare the result
   with the list of views and tables. I did that and found many deprecated
   views.

7. For duplicate data, have a look at this query:

    -- To find exact duplicate rows, replace <col1>, ... <coln> with the
    -- table name
    SELECT <col1>, ... <coln>, min(ctid) AS keep, count(*)
    FROM <table>
    GROUP BY <col1>, ... <coln>
    HAVING count(*) > 1

   The above code snippet can be combined with a delete statement to delete
   duplicate rows.

Have fun

________________________________
From: Jason Long <mailing.li...@octgsoftware.com>
To: Guillaume Lelarge <guilla...@lelarge.info>
Cc: Craig Ringer <ring...@ringerc.id.au>; pgsql-general@postgresql.org
Sent: Friday, September 30, 2011 12:12 AM
Subject: Re: [GENERAL] Identifying old/unused views and table

On Wed, 2011-09-28 at 08:52 +0200, Guillaume Lelarge wrote:
> On Wed, 2011-09-28 at 09:04 +0800, Craig Ringer wrote:
> > On 09/28/2011 04:51 AM, Jason Long wrote:
> > > I have an application with a couple hundred views and a couple hundred
> > > tables.
> > >
> > > Is there some way I can find out which views have been accessed in the
> > > last 6 months or so? Or some way to log this?
> > >
> > > I know there are views and tables that are no longer in use by my
> > > application and I am looking for a way to identify them.
> >
> > Look at the pg_catalog.pg_stat* tables
> >
>
> I fail to see how that gives him any answer on the views, and tables no
> longer used. AFAICT, there's no way to know for views (apart from
> logging all queries in the log). As for tables, still apart from the
> log, pg_stat_user_tables could give an answer if he was monitoring it at
> least the last six months.
>
>

Thanks for the replies. Views were my main problem. My application could
use some cleanup. Doing it manually is probably the best approach. I was
just looking for a jump start.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
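
P.S. Point 4 of the reply at the top (finding the tables that depend on a candidate table) comes without a query, so here is a minimal sketch of how it might look. It assumes the dependency of interest is a foreign key, and 'public.old_table' is a hypothetical table name used only for illustration; replace it with one of the tables returned by the earlier queries.

```sql
-- Sketch: list the foreign keys that point at a candidate table.
-- 'public.old_table' is a hypothetical name, not taken from the thread.
SELECT con.conname             AS constraint_name,
       con.conrelid::regclass  AS referencing_table,
       con.confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint con
WHERE con.contype = 'f'                            -- foreign-key constraints only
  AND con.confrelid = 'public.old_table'::regclass -- the table being referenced
ORDER BY con.conname;
```

An empty result means that no foreign key references the table. It covers foreign keys only and does not show views that select from the table.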