Re: [HACKERS] GIN index creation extremely slow ?
Try again, today's patch solves the problem. Stefan Kaltenbrunner wrote: on IRC somebody mentioned that it took 34h to greate a GIN index (on a tsvector) on a ~3 Million column table (wikipedia dump) with a reasonable speced box (AMD 3400+). After getting hold of a dump of said table (around 4,1GB in size) I managed to get the following timings: test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); CREATE INDEX Time: 416122.896 ms so about 7 minutes - sounds very reasonable test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); CREATE INDEX Time: 52681605.101 ms ouch - that makes for a whoppy 14,6hours(!). During that time the box is completely CPU bottlenecked and during virtually no IO at all - (varing maintainance_work_mem does not seem to make any noticable difference). That box is a fast Dual Opteron 2.6Ghz with 8GB RAM and a 4 disk RAID10 for the WAL and 12 disks for the data running a very recent -HEAD checkout ... It looks like we still don't have any docs for GIN in the tree so I don't know if those timings are expected or not ... Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] GIN index creation extremely slow ?
Teodor Sigaev wrote: Tom did commit a patch a while ago which made a huge difference in index creation time for tsearch by changing one routine. I don't know if it got backpatched, so it might be worth checking people are working on the same version. I saw that patch, but I still think that 7 minutes is too small :) hmm I did some further testing on that and it looks like you might indeed be right in suspecting that there is something fishy with the GIST results. It might be possible that there was some issue with the generated tsvectors (all of them empty due to hitting the too long value error case for exceeding MAXSTRPOS) in the GIST case - sorry for the confusion :-( Speaking of the too long value error message - some of the errormessages in tsvector.c are a bit terse (ie it is not really obvious what is causing the above error without looking at the source for example). Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] GIN index creation extremely slow ?
I won't have access to the original testcase and server for a few days but I just redid some testing on a slower personal box of mine with a smaller(but similiar) testset and on that box I could not reproduce that issue. So the problem is either caused by the size of the table or somehow by the data itself :-( We tested gin with 5 millions records (but not a wikipedia's text, but with blog records which is usually mush shorter) and index creation time was near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand, creating GiST index on 0.5 millions emails took about 12 hours. That's why 7 minutes is very suspicious result. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] GIN index creation extremely slow ?
On Wed, Jun 28, 2006 at 01:09:23PM +0400, Teodor Sigaev wrote: We tested gin with 5 millions records (but not a wikipedia's text, but with blog records which is usually mush shorter) and index creation time was near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand, creating GiST index on 0.5 millions emails took about 12 hours. That's why 7 minutes is very suspicious result. Tom did commit a patch a while ago which made a huge difference in index creation time for tsearch by changing one routine. I don't know if it got backpatched, so it might be worth checking people are working on the same version. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] GIN index creation extremely slow ?
On Wed, 28 Jun 2006, Martijn van Oosterhout wrote: On Wed, Jun 28, 2006 at 01:09:23PM +0400, Teodor Sigaev wrote: We tested gin with 5 millions records (but not a wikipedia's text, but with blog records which is usually mush shorter) and index creation time was near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand, creating GiST index on 0.5 millions emails took about 12 hours. That's why 7 minutes is very suspicious result. Tom did commit a patch a while ago which made a huge difference in index creation time for tsearch by changing one routine. I don't know if it got backpatched, so it might be worth checking people are working on the same version. it's on 8.1.4 and speedup is about 10 times. Have a nice day, Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] GIN index creation extremely slow ?
Tom did commit a patch a while ago which made a huge difference in index creation time for tsearch by changing one routine. I don't know if it got backpatched, so it might be worth checking people are working on the same version. I saw that patch, but I still think that 7 minutes is too small :) -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] GIN index creation extremely slow ?
test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); CREATE INDEX Time: 416122.896 ms so about 7 minutes - sounds very reasonable test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); CREATE INDEX Time: 52681605.101 ms I'll look at this, but GiST time creation is suspiciously small. Can you test on smaller table, for example with 10 records and if results are repeat, pls, send to me test suite... -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] GIN index creation extremely slow ?
Teodor Sigaev wrote: test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); CREATE INDEX Time: 416122.896 ms so about 7 minutes - sounds very reasonable test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); CREATE INDEX Time: 52681605.101 ms I'll look at this, but GiST time creation is suspiciously small. Can you test on smaller table, for example with 10 records and if results are repeat, pls, send to me test suite... I won't have access to the original testcase and server for a few days but I just redid some testing on a slower personal box of mine with a smaller(but similiar) testset and on that box I could not reproduce that issue. So the problem is either caused by the size of the table or somehow by the data itself :-( Stefan ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] GIN index creation extremely slow ?
on IRC somebody mentioned that it took 34h to greate a GIN index (on a tsvector) on a ~3 Million column table (wikipedia dump) with a reasonable speced box (AMD 3400+). After getting hold of a dump of said table (around 4,1GB in size) I managed to get the following timings: test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); CREATE INDEX Time: 416122.896 ms so about 7 minutes - sounds very reasonable test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); CREATE INDEX Time: 52681605.101 ms ouch - that makes for a whoppy 14,6hours(!). During that time the box is completely CPU bottlenecked and during virtually no IO at all - (varing maintainance_work_mem does not seem to make any noticable difference). That box is a fast Dual Opteron 2.6Ghz with 8GB RAM and a 4 disk RAID10 for the WAL and 12 disks for the data running a very recent -HEAD checkout ... It looks like we still don't have any docs for GIN in the tree so I don't know if those timings are expected or not ... Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] GIN index creation extremely slow ?
gin uses maintenance_work_mem,so try to increase it and see dramatic improvements Oleg On Mon, 26 Jun 2006, Stefan Kaltenbrunner wrote: on IRC somebody mentioned that it took 34h to greate a GIN index (on a tsvector) on a ~3 Million column table (wikipedia dump) with a reasonable speced box (AMD 3400+). After getting hold of a dump of said table (around 4,1GB in size) I managed to get the following timings: test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector); CREATE INDEX Time: 416122.896 ms so about 7 minutes - sounds very reasonable test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector); CREATE INDEX Time: 52681605.101 ms ouch - that makes for a whoppy 14,6hours(!). During that time the box is completely CPU bottlenecked and during virtually no IO at all - (varing maintainance_work_mem does not seem to make any noticable difference). That box is a fast Dual Opteron 2.6Ghz with 8GB RAM and a 4 disk RAID10 for the WAL and 12 disks for the data running a very recent -HEAD checkout ... It looks like we still don't have any docs for GIN in the tree so I don't know if those timings are expected or not ... Stefan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] GIN index creation extremely slow ?
It looks like we still don't have any docs for GIN in the tree so I don't know if those timings are expected or not ... Ummm my bad. Sorry... ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings