On Tue, Jan 26, 2010 at 9:18 AM, nair rajiv <nair...@gmail.com> wrote:
> > > On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund <and...@anarazel.de> wrote: > >> On Tuesday 26 January 2010 01:39:48 nair rajiv wrote: >> > On Tue, Jan 26, 2010 at 1:01 AM, Craig James >> <craig_ja...@emolecules.com>wrote: >> > I am working on a project that will take out structured >> content >> > from wikipedia >> > and put it in our database. Before putting the data into the database I >> > wrote a script to >> > find out the number of rows every table would be having after the data >> is >> > in and I found >> > there is a table which will approximately have 50,000,000 rows after >> data >> > harvesting. >> > Is it advisable to keep so much data in one table ? >> Depends on your access patterns. I.e. how many rows are you accessing at >> the >> same time - do those have some common locality and such. >> > > I'll give a brief idea of how this table is. The important columns > are > subject, predicate and object. So given a predicate and object one should > be able to get all the subjects, given subject and a predicate one should > be able to retrieve all the objects. I have created an indexes on these > three > columns. > >> >> >> > I have read about 'partitioning' a table. An other idea I have >> is >> > to break the table into >> > different tables after the no of rows in a table has reached a certain >> > limit say 10,00,000. >> > For example, dividing a table 'datatable' to 'datatable_a', >> 'datatable_b' >> > each having 10,00,000 rows. >> > I needed advice on whether I should go for partitioning or the approach >> I >> > have thought of. >> Your approach is pretty close to partitioning - except that partitioning >> makes >> that mostly invisible to the outside so it is imho preferrable. >> >> > We have a HP server with 32GB ram,16 processors. The storage >> has >> > 24TB diskspace (1TB/HD). >> > We have put them on RAID-5. It will be great if we could know the >> > parameters that can be changed in the >> > postgres configuration file so that the database makes maximum >> utilization >> > of the server we have. >> > For eg parameters that would increase the speed of inserts and selects. >> Not using RAID-5 possibly would be a good start - many people (me >> included) >> experienced bad write performance on it. It depends a great deal on the >> controller/implementation though. >> RAID-10 is normally to be considered more advantageous despite its lower >> usable space ratio. >> Did you create one big RAID-5 out of all disks? Thats not a good idea, >> because >> its pretty likely that another disk fails while you restore a previously >> failed disk. Unfortunately in that configuration that means you have lost >> your >> complete data (in the most common implementations at least). >> > > No, I am using only 12TB i.e 12 HDs of the 24TB I have > A 15k rpm SAS drive will give you a throughput of 12MB and 120 IOPS. Now you can calculate the number of disks, specifically spindles, for getting your desired throughput and IOPs > >> Andres >> >> PS: Your lines are strangely wrapped... >> > >