On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund <and...@anarazel.de> wrote:
> On Tuesday 26 January 2010 01:39:48 nair rajiv wrote: > > On Tue, Jan 26, 2010 at 1:01 AM, Craig James > <craig_ja...@emolecules.com>wrote: > > I am working on a project that will take out structured content > > from wikipedia > > and put it in our database. Before putting the data into the database I > > wrote a script to > > find out the number of rows every table would be having after the data is > > in and I found > > there is a table which will approximately have 50,000,000 rows after data > > harvesting. > > Is it advisable to keep so much data in one table ? > Depends on your access patterns. I.e. how many rows are you accessing at > the > same time - do those have some common locality and such. > I'll give a brief idea of how this table is. The important columns are subject, predicate and object. So given a predicate and object one should be able to get all the subjects, given subject and a predicate one should be able to retrieve all the objects. I have created an indexes on these three columns. > > > > I have read about 'partitioning' a table. An other idea I have > is > > to break the table into > > different tables after the no of rows in a table has reached a certain > > limit say 10,00,000. > > For example, dividing a table 'datatable' to 'datatable_a', 'datatable_b' > > each having 10,00,000 rows. > > I needed advice on whether I should go for partitioning or the approach I > > have thought of. > Your approach is pretty close to partitioning - except that partitioning > makes > that mostly invisible to the outside so it is imho preferrable. > > > We have a HP server with 32GB ram,16 processors. The storage > has > > 24TB diskspace (1TB/HD). > > We have put them on RAID-5. It will be great if we could know the > > parameters that can be changed in the > > postgres configuration file so that the database makes maximum > utilization > > of the server we have. > > For eg parameters that would increase the speed of inserts and selects. > Not using RAID-5 possibly would be a good start - many people (me included) > experienced bad write performance on it. It depends a great deal on the > controller/implementation though. > RAID-10 is normally to be considered more advantageous despite its lower > usable space ratio. > Did you create one big RAID-5 out of all disks? Thats not a good idea, > because > its pretty likely that another disk fails while you restore a previously > failed disk. Unfortunately in that configuration that means you have lost > your > complete data (in the most common implementations at least). > No, I am using only 12TB i.e 12 HDs of the 24TB I have > > Andres > > PS: Your lines are strangely wrapped... >