Re: [HACKERS] GIN index creation extremely slow ?

2006-07-11 Thread Teodor Sigaev

Try again, today's patch solves the problem.

Stefan Kaltenbrunner wrote:

on IRC somebody mentioned that it took 34h to greate a GIN index (on a
tsvector) on a ~3 Million column table (wikipedia dump) with a
reasonable speced box (AMD 3400+).
After getting hold of a dump of said table (around 4,1GB in size) I
managed to get the following timings:

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms

ouch - that makes for a whoppy 14,6hours(!). During that time the box is
completely CPU bottlenecked and during virtually no IO at all - (varing
maintainance_work_mem does not seem to make any noticable difference).

That box is a fast Dual Opteron 2.6Ghz with 8GB RAM and a 4 disk RAID10
for the WAL and 12 disks for the data running a very recent -HEAD
checkout ...

It looks like we still don't have any docs for GIN in the tree so I
don't know if those timings are expected or not ...


Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-30 Thread Stefan Kaltenbrunner
Teodor Sigaev wrote:
 Tom did commit a patch a while ago which made a huge difference in
 index creation time for tsearch by changing one routine. I don't know
 if it got backpatched, so it might be worth checking people are working
 on the same version.
 
 I saw that patch, but I still think that 7 minutes is too small :)

hmm I did some further testing on that and it looks like you might
indeed be right in suspecting that there is something fishy with the
GIST results.
It might be possible that there was some issue with the generated
tsvectors (all of them empty due to hitting the too long value error
case for exceeding MAXSTRPOS) in the GIST case - sorry for the confusion
 :-(

Speaking of the too long value error message - some of the
errormessages in tsvector.c are a bit terse (ie it is not really
obvious what is causing the above error without looking at the source
for example).


Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Teodor Sigaev

I won't have access to the original testcase and server for a few days
but I just redid some testing on a slower personal box of mine with a
smaller(but similiar) testset and on that box I could not reproduce that
issue.
So the problem is either caused by the size of the table or somehow by
the data itself :-(


We tested gin with 5 millions records (but not a wikipedia's text, but with blog 
records which is usually mush shorter) and index creation time was near 8 hours 
on Dual Xeon 3.2, but we didn't try GiST. In other hand, creating GiST index on 
0.5 millions emails took about 12 hours.


That's why 7 minutes is very suspicious result.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Martijn van Oosterhout
On Wed, Jun 28, 2006 at 01:09:23PM +0400, Teodor Sigaev wrote:
 We tested gin with 5 millions records (but not a wikipedia's text, but with 
 blog records which is usually mush shorter) and index creation time was 
 near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand, 
 creating GiST index on 0.5 millions emails took about 12 hours.
 
 That's why 7 minutes is very suspicious result.

Tom did commit a patch a while ago which made a huge difference in
index creation time for tsearch by changing one routine. I don't know
if it got backpatched, so it might be worth checking people are working
on the same version.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Oleg Bartunov

On Wed, 28 Jun 2006, Martijn van Oosterhout wrote:


On Wed, Jun 28, 2006 at 01:09:23PM +0400, Teodor Sigaev wrote:

We tested gin with 5 millions records (but not a wikipedia's text, but with
blog records which is usually mush shorter) and index creation time was
near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand,
creating GiST index on 0.5 millions emails took about 12 hours.

That's why 7 minutes is very suspicious result.


Tom did commit a patch a while ago which made a huge difference in
index creation time for tsearch by changing one routine. I don't know
if it got backpatched, so it might be worth checking people are working
on the same version.


it's on 8.1.4 and speedup is about 10 times.



Have a nice day,



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Teodor Sigaev

Tom did commit a patch a while ago which made a huge difference in
index creation time for tsearch by changing one routine. I don't know
if it got backpatched, so it might be worth checking people are working
on the same version.


I saw that patch, but I still think that 7 minutes is too small :)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-27 Thread Teodor Sigaev

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms


I'll look at this,  but GiST time creation is suspiciously small.
Can you test on smaller table, for example with 10 records and if results 
are repeat, pls, send to me test suite...


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-27 Thread Stefan Kaltenbrunner
Teodor Sigaev wrote:
 test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
 CREATE INDEX
 Time: 416122.896 ms

 so about 7 minutes - sounds very reasonable

 test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
 CREATE INDEX
 Time: 52681605.101 ms
 
 I'll look at this,  but GiST time creation is suspiciously small.
 Can you test on smaller table, for example with 10 records and if
 results are repeat, pls, send to me test suite...

I won't have access to the original testcase and server for a few days
but I just redid some testing on a slower personal box of mine with a
smaller(but similiar) testset and on that box I could not reproduce that
issue.
So the problem is either caused by the size of the table or somehow by
the data itself :-(


Stefan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] GIN index creation extremely slow ?

2006-06-26 Thread Stefan Kaltenbrunner
on IRC somebody mentioned that it took 34h to greate a GIN index (on a
tsvector) on a ~3 Million column table (wikipedia dump) with a
reasonable speced box (AMD 3400+).
After getting hold of a dump of said table (around 4,1GB in size) I
managed to get the following timings:

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms

ouch - that makes for a whoppy 14,6hours(!). During that time the box is
completely CPU bottlenecked and during virtually no IO at all - (varing
maintainance_work_mem does not seem to make any noticable difference).

That box is a fast Dual Opteron 2.6Ghz with 8GB RAM and a 4 disk RAID10
for the WAL and 12 disks for the data running a very recent -HEAD
checkout ...

It looks like we still don't have any docs for GIN in the tree so I
don't know if those timings are expected or not ...


Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-26 Thread Oleg Bartunov

gin uses maintenance_work_mem,so try to increase it and see dramatic
improvements

Oleg
On Mon, 26 Jun 2006, Stefan Kaltenbrunner wrote:


on IRC somebody mentioned that it took 34h to greate a GIN index (on a
tsvector) on a ~3 Million column table (wikipedia dump) with a
reasonable speced box (AMD 3400+).
After getting hold of a dump of said table (around 4,1GB in size) I
managed to get the following timings:

test=# CREATE INDEX idxFTI_idx ON wikipedia USING gist(vector);
CREATE INDEX
Time: 416122.896 ms

so about 7 minutes - sounds very reasonable

test=# CREATE INDEX idxFTI2_idx ON wikipedia USING gin(vector);
CREATE INDEX
Time: 52681605.101 ms

ouch - that makes for a whoppy 14,6hours(!). During that time the box is
completely CPU bottlenecked and during virtually no IO at all - (varing
maintainance_work_mem does not seem to make any noticable difference).

That box is a fast Dual Opteron 2.6Ghz with 8GB RAM and a 4 disk RAID10
for the WAL and 12 disks for the data running a very recent -HEAD
checkout ...

It looks like we still don't have any docs for GIN in the tree so I
don't know if those timings are expected or not ...


Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-26 Thread Christopher Kings-Lynne

It looks like we still don't have any docs for GIN in the tree so I
don't know if those timings are expected or not ...


Ummm my bad.  Sorry...


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings