Re: [GENERAL] Strategies/Best Practises Handling Large Tables

2012-11-15 Thread Chitra Creta
Thanks for your example, Chris. I will look into it as a long-term solution.

Partitioning tables as a strategy worked very well indeed. This will be my
short/medium-term solution.

Another strategy that I would like to evaluate as a short/medium-term
solution is archiving old records in a table before purging them.

I am aware that Oracle has a tool that allows records to be exported into a
file / archive table before purging them. It also provides a tool to
import these records.

Does PostgreSQL have similar tools to export to a file and re-import?

If PostgreSQL does not have a tool to do this, does anyone have any ideas
on what file format (e.g. a text file with a header row of column names and
one record per line) would be ideal for easy re-importing into a PostgreSQL
table?
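
For instance, I imagine something like a CSV file with a header row, which I
believe the built-in COPY command can write out and later re-import (the table
and file names below are just placeholders):

  -- export the archived records to a CSV file with a header row
  COPY statistics_archive
    TO '/var/backups/statistics_2012.csv' WITH CSV HEADER;

  -- later, to bring the records back:
  COPY statistics_archive
    FROM '/var/backups/statistics_2012.csv' WITH CSV HEADER;

(psql's \copy is, as far as I know, the client-side equivalent if server-side
file access is an issue.)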

Thank you for your ideas.


On Mon, Oct 22, 2012 at 12:14 AM, Chris Travers chris.trav...@gmail.com wrote:



 On Sun, Oct 21, 2012 at 5:46 AM, Chitra Creta chitracr...@gmail.com wrote:

 Thank you all.

 Ryan, would you mind sharing your one-time function to move it?

 Merlin, what are your suggestions to improve query performance?

 Shaun, thank you. I will look into facts and dimensions should all else
 fail.

 Chris, would you mind giving me an example of what you mean by your log,
 aggregate and snapshot approach? Also, with indexing, I believe composite
 and partial indexes are better than plain indexes, am I correct? Do you have
 any recommendations as to which type (e.g. btree, hash) is better for which
 situations?


 Sure.  Suppose I have an accounting system.

 I may record the amounts of the transactions in a journal_entry and
 journal_line table.  These will be write once, read many.  However, over time
 you will end up having to digest millions of records (given sufficient volume)
 to find out the balance of a checking account, and this is not really ideal.

 So to deal with this, I might, for example, add a table called
 account_checkpoint with the following fields:

 account_id
 end_date
 debits
 credits
 balance

 And then, on closing the books, I can snapshot the accumulated debits,
 credits, and balance to date.  If I need any of these numbers I can just
 grab the appropriate figure from account_checkpoint and roll forward from
 end_date.  If I have too much volume I can have closings at a monthly level
 or whatever.
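
 For instance, roughly (the journal_line column names here are placeholders,
 since that schema is not shown):

 CREATE TABLE account_checkpoint (
     account_id  int     NOT NULL,
     end_date    date    NOT NULL,
     debits      numeric NOT NULL,
     credits     numeric NOT NULL,
     balance     numeric NOT NULL,
     PRIMARY KEY (account_id, end_date)
 );

 -- current balance = latest checkpoint balance + activity since its end_date
 SELECT cp.balance + COALESCE(sum(jl.amount), 0) AS current_balance
   FROM account_checkpoint cp
   LEFT JOIN journal_line jl ON jl.account_id = cp.account_id
                            AND jl.entry_date > cp.end_date
  WHERE cp.account_id = 42
    AND cp.end_date = (SELECT max(end_date)
                         FROM account_checkpoint
                        WHERE account_id = 42)
  GROUP BY cp.balance;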

 The checkpoint table contains sufficient information for me to start a
 report at any point and end it at any other point without having to scan
 the intervening checkpointed periods.  Additionally, if I want to purge old
 data, I can do so without losing current balance information.

 So what this approach does, in essence, is give you a way to purge
 without losing the aggregated information, and a way to skip portions of
 the table for aggregation purposes that you could not skip otherwise.  The
 big thing is that you cannot insert (and if this is in doubt, you need to
 enforce this with a trigger) any records effective before the most recent
 checkpoint.
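
 A rough sketch of such a trigger (again with placeholder table and column
 names):

 CREATE OR REPLACE FUNCTION forbid_backdated_lines() RETURNS trigger AS $$
 BEGIN
     -- reject any line dated on or before the latest checkpoint for that account
     IF EXISTS (SELECT 1 FROM account_checkpoint cp
                 WHERE cp.account_id = NEW.account_id
                   AND cp.end_date >= NEW.entry_date) THEN
         RAISE EXCEPTION 'journal line predates the most recent checkpoint';
     END IF;
     RETURN NEW;
 END;
 $$ LANGUAGE plpgsql;

 CREATE TRIGGER journal_line_no_backdate
     BEFORE INSERT ON journal_line
     FOR EACH ROW EXECUTE PROCEDURE forbid_backdated_lines();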

 Best Wishes,
 Chris Travers



Re: [GENERAL] Strategies/Best Practises Handling Large Tables

2012-10-21 Thread Chitra Creta
Thank you all.

Ryan, would you mind sharing your one-time function to move it?

Merlin, what are your suggestions to improve query performance?

Shaun, thank you. I will look into facts and dimensions should all else
fail.

Chris, would you mind giving me an example of what you mean by your log,
aggregate and snapshot approach? Also, with indexing, I believe composite
and partial indexes are better than plain indexes, am I correct? Do you have
any recommendations as to which type (e.g. btree, hash) is better for which
situations?

Thank you,
Chitra



On Thu, Oct 18, 2012 at 12:47 AM, Chris Travers chris.trav...@gmail.com wrote:



 On Fri, Oct 12, 2012 at 7:44 AM, Chitra Creta chitracr...@gmail.com wrote:

 Hi,

 I currently have a table that is growing very quickly - i.e. 7 million
 records in 5 days. This table acts as a placeholder for statistics, and
 hence the records are merely inserted and never updated or deleted.

 Many queries are run on this table to obtain trend analysis. However,
 these queries are now starting to take a very long time (hours) to execute
 due to the size of the table.

 I have put indexes on this table, to no significant benefit.  Some of the
 other strategies I have thought of:
 1. Purge old data
 2. Reindex
 3. Partition
 4. Creation of daily, monthly, yearly summary tables that contain
 aggregated data specific to the statistics required

 Does anyone know what is the best practice to handle this situation?


 The answer is, well, it depends.  Possibly some combination.

 One approach I like, which may be included in #4 but not necessarily, is the
 idea of summary tables which contain snapshots of the data, allowing you to
 roll forward or backward from defined points.  This is what I call the log,
 aggregate, and snapshot approach.  But it really depends on what you are
 doing, and there is no one-size-fits-all approach at this volume.

 Instead of reindexing, I would suggest also looking into partial indexes.
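
 For example (hypothetical table and column names), an index that covers only
 the recent rows most of the trend queries actually touch:

 -- index only the current year's rows, assuming most queries filter on recent dates
 CREATE INDEX stats_recent_idx
     ON stats (metric_id, recorded_at)
     WHERE recorded_at >= DATE '2012-01-01';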

 Best Wishes,



Re: [GENERAL] Strategies/Best Practises Handling Large Tables

2012-10-16 Thread Chitra Creta
Thank you all for your suggestions. Since all of you recommended the
Partition option, I decided to try it out myself.

I found a function that created partition tables for every month in the
last two years. A trigger was also added to the parent table to ensure that
every insert into it henceforth will be inserted into the appropriate child
table.

However, there were a few observations that I made which I would appreciate
your comments on:

1. Since existing data was in the parent table, I had to do a pg_dump on
it, drop it, and then do a restore on it to force the trigger to work on
existing data. Is this how partitioning existing data should be done? (See
the sketch after these questions.)

2. I noticed that there are two copies of the same record - i.e. the one
that was inserted into the parent table and another that was inserted into
the child table. If I delete the record in the parent table, the child
record gets automatically deleted. I was under the impression that
partitioning meant that my parent table would not be large anymore because
the data would be moved to smaller child tables. Is this the case?

3. Is there a way for me to evaluate the effectiveness of the partitioned
table? Would performing an EXPLAIN ANALYZE allow me to determine whether
querying the parent table for statistics is quicker than querying against a
massive non-partitioned table?
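
For reference, the sketch mentioned above - one way I gather existing rows can
be moved out of the parent without a dump/restore (table and column names are
placeholders only):

  BEGIN;
  -- copy October 2012 rows from the parent alone into its child partition
  INSERT INTO stats_2012_10
      SELECT * FROM ONLY stats
       WHERE created_at >= DATE '2012-10-01' AND created_at < DATE '2012-11-01';
  -- then remove them from the parent alone
  DELETE FROM ONLY stats
       WHERE created_at >= DATE '2012-10-01' AND created_at < DATE '2012-11-01';
  COMMIT;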

Thank you.
 On Oct 13, 2012 3:49 AM, John R Pierce pie...@hogranch.com wrote:

 On 10/12/12 7:44 AM, Chitra Creta wrote:


 1. Purge old data
 2. Reindex
 3. Partition
 4. Creation of daily, monthly, yearly summary tables that contain
 aggregated data specific to the statistics required


 if most of your queries read the majority of the table, indexing will be
 of little help.

 partitioning will aid in purging old data, as you can partition by date
 (for instance, by week) and drop whole partitions rather than deleting
 individual records.
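
 for example, with the inheritance-style partitioning in use here (names
 purely illustrative):

 -- one child table per week, constrained so the planner can skip it
 CREATE TABLE stats_2012w41 (
     CHECK (created_at >= DATE '2012-10-08' AND created_at < DATE '2012-10-15')
 ) INHERITS (stats);

 -- purging that week later is then a single cheap operation
 DROP TABLE stats_2012w41;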

 aggregate tables likely will be the biggest win for your statistics if
 they reduce the amount of data you need to query.
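
 e.g. a daily rollup along these lines (hypothetical names again), refreshed
 by a scheduled job and queried instead of the raw table:

 CREATE TABLE stats_daily (
     day        date    NOT NULL,
     metric_id  int     NOT NULL,
     row_count  bigint  NOT NULL,
     total      numeric NOT NULL,
     PRIMARY KEY (day, metric_id)
 );

 -- roll up yesterday's raw rows
 INSERT INTO stats_daily (day, metric_id, row_count, total)
 SELECT created_at::date, metric_id, count(*), sum(value)
   FROM stats
  WHERE created_at >= current_date - 1
    AND created_at <  current_date
  GROUP BY 1, 2;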


 --
 john r pierce                            N 37, W 122
 santa cruz ca                            mid-left coast



 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



[GENERAL] Strategies/Best Practises Handling Large Tables

2012-10-12 Thread Chitra Creta
Hi,

I currently have a table that is growing very quickly - i.e. 7 million
records in 5 days. This table acts as a placeholder for statistics, and
hence the records are merely inserted and never updated or deleted.

Many queries are run on this table to obtain trend analysis. However, these
queries are now starting to take a very long time (hours) to execute due to
the size of the table.

I have put indexes on this table, to no significant benefit.  Some of the
other strategies I have thought of:
1. Purge old data
2. Reindex
3. Partition
4. Creation of daily, monthly, yearly summary tables that contain
aggregated data specific to the statistics required

Does anyone know what is the best practice to handle this situation?

I would appreciate knowledge sharing on the pros and cons of the above, or
if there are any other strategies that I could put in place.

Thanking you in advance.


Re: [GENERAL] PostgreSQL 8.3 data corruption

2012-05-02 Thread Chitra Creta
Hello all,

Any thoughts on this one?

Cheers.


On Tue, May 1, 2012 at 9:51 PM, Chitra Creta chitracr...@gmail.com wrote:

 Hi there,

 I have mission-critical data running on PostgreSQL 8.3. My database got
 corrupted a few days ago as I ran out of disk space.

 I had to run pg_resetxlog to get the database started again. I am now
 experiencing the following errors:

 1. ERROR: t_xmin is uncommitted in tuple to be updated

 2. ERROR: could not access status of transaction 61323662
   Detail: Could not read from file pg_subtrans/03A7 at offset 188416: No
 error.

 Does anyone know how to resolve the issues above? Would a VACUUM FULL or a
 dump and restore work? Or is there another way to fix the above?


 Thanking you in advance,
 Chitra



[GENERAL] PostgreSQL 8.3 data corruption

2012-05-01 Thread Chitra Creta
Hi there,

I have mission-critical data running on PostgreSQL 8.3. My database got
corrupted a few days ago as I ran out of disk space.

I had to run pg_resetxlog to get the database started again. I am now
experiencing the following errors:

1. ERROR: t_xmin is uncommitted in tuple to be updated

2. ERROR: could not access status of transaction 61323662
  Detail: Could not read from file pg_subtrans/03A7 at offset 188416: No
error.

Does anyone know how to resolve the issues above? Would a VACUUM FULL or a
dump and restore work? Or is there another way to fix the above?


Thanking you in advance,
Chitra


[GENERAL] Upgrading an existing database structure

2011-10-24 Thread Chitra Creta
Hello all,



I am interested in knowing how to upgrade the structure (schema) of an
existing database.



I have a script that creates and alters tables, columns and data. When this
script is run, errors are thrown for tables, columns and data that already
exist. Anything that is not found in the existing database is then created.
This prevents existing data from being wiped out. Does anyone else have a
better way of upgrading the structure of an existing database?
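
For what it is worth, one pattern I have read about is guarding each change
with a catalog check so the script can be re-run safely; a rough sketch with
made-up names (I believe the DO statement needs PostgreSQL 9.0 or later):

  DO $$
  BEGIN
      -- only add the column if it does not exist yet
      IF NOT EXISTS (SELECT 1
                       FROM information_schema.columns
                      WHERE table_name = 'customer'
                        AND column_name = 'phone') THEN
          ALTER TABLE customer ADD COLUMN phone text;
      END IF;
  END
  $$;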



Thanking you in advance.