Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-07 Thread Christopher Kings-Lynne
> No, my problem is that using TSearch2 interferes with other core
> components of postgres like (auto)vacuum or dump/restore.


That's nonsense...seriously.

The only trick with dump/restore is that you have to install the 
tsearch2 shared library before restoring.  That's the same as all 
contribs though.
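
As a rough illustration of that point, one could check that the shared library is present on the target server before starting a restore. This is only a sketch under assumptions: `pkglibdir` and `module_installed` are hypothetical helper names, `pg_config` is assumed to be on the PATH, and the library is assumed to be a file named `tsearch2` plus a platform extension in the directory reported by `pg_config --pkglibdir`.

```python
import subprocess
from pathlib import Path


def pkglibdir() -> Path:
    """Ask pg_config where dynamically loadable modules are installed.

    Assumes the pg_config binary of the target server is on PATH.
    """
    result = subprocess.run(
        ["pg_config", "--pkglibdir"],
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())


def module_installed(libdir: Path, module: str = "tsearch2") -> bool:
    """Return True if libdir holds a file named <module>.<ext> (.so, .dll, ...)."""
    return any(
        candidate.is_file() and candidate.stem == module
        for candidate in libdir.glob(module + ".*")
    )


# Hypothetical usage before running the restore:
#   if not module_installed(pkglibdir()):
#       raise SystemExit("install the tsearch2 contrib module first")
```

The same check would apply to any other contrib module by passing a different `module` name.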


Chris


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-07 Thread Michael Riess

Christopher Kings-Lynne wrote:
>> No, my problem is that using TSearch2 interferes with other core
>> components of postgres like (auto)vacuum or dump/restore.
>
> That's nonsense...seriously.
>
> The only trick with dump/restore is that you have to install the
> tsearch2 shared library before restoring.  That's the same as all
> contribs though.


Well, then it has changed since I last read the documentation. That was 
about a year ago, and we have been using Lucene since then ... and as it 
works quite nicely, I see no reason to switch to TSearch2. Including it in 
the pgsql core would make it much more attractive to me, as it seems to 
me that features become more stable once they are included in the core. 
Call me paranoid, if you must ... ;-)







[PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Joshua Kramer


Greetings all,

I'm going to do a performance comparison with DocMgr and PG81/TSearch2 on 
one end, and Apache Lucene on the other end.


In order to do this, I'm going to create a derivative of the 
docmgr-autoimport script so that I can specify one file to import at a 
time.  I'll then create a Perl script which logs all details (such as 
timing, etc.) as the test progresses.


As test data, I have approximately 9,000 text files from Project Gutenberg 
ranging in size from a few hundred bytes to 4.5 MB.


I plan to test the speed of import of each file.  Then, I plan to write a 
web-robot in Perl that will test the speed and number of results returned.
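
The per-file import timing described above could be sketched roughly as follows. This is only an illustrative harness, written in Python rather than the Perl script mentioned, and not the actual test code: `time_imports` and `import_one` are hypothetical names, and `import_one` stands in for whatever really loads a single file (the docmgr-autoimport derivative on one side, a Lucene indexer on the other).

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def time_imports(
    paths: Iterable[Path],
    import_one: Callable[[Path], None],
    log_path: Path,
) -> List[Dict[str, object]]:
    """Time import_one(path) for each file and log one CSV row per file."""
    rows: List[Dict[str, object]] = []
    for path in paths:
        size = path.stat().st_size          # file size in bytes
        start = time.perf_counter()         # monotonic, high-resolution clock
        import_one(path)                    # the operation being measured
        elapsed = time.perf_counter() - start
        rows.append({"file": str(path), "bytes": size, "seconds": elapsed})

    # Write the log after the run so that disk I/O for logging
    # does not interleave with the imports being timed.
    with log_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "bytes", "seconds"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Recording the file size next to the elapsed time makes it possible to compare throughput across files that range from a few hundred bytes to several megabytes, instead of comparing raw times only.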


Can anyone think of a way to validate this test, or suggest how I should 
configure PG to maximise import and search speed?  Can I maximise both search 
speed and import speed, or are those goals mutually exclusive?  (Note that 
this will be run on limited hardware: a 900 MHz Athlon with 512 MB of RAM.)


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?


Thanks,
-Josh



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess


> Has anyone ever compared TSearch2 to Lucene, as far as performance is
> concerned?


I'll stay away from TSearch2 until it is fully integrated in the 
postgres core (like create index foo_text on foo (texta, textb) USING 
TSearch2). Because a full integration is unlikely to happen in the near 
future (as far as I know), I'll stick to Lucene.


Mike



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Oleg Bartunov

Folks,

tsearch2 and Lucene are very different search engines, so it'd be an unfair
comparison. If you need full access to metadata and instant indexing,
you will probably find tsearch2 more suitable than Lucene. But if
you can live without those features and need to search read-only
archives, you need Lucene.

Tsearch2 integration into pgsql would be cool, but I see no problem with
using tsearch2 as an official extension module. After completing our
todo, which we hope will happen for the 8.2 release, you could
forget about Lucene and other engines :) We'll be available for development
in spring and we estimate about three months for our todo, so it's
really doable.

Oleg

On Tue, 6 Dec 2005, Michael Riess wrote:



>> Has anyone ever compared TSearch2 to Lucene, as far as performance is
>> concerned?
>
> I'll stay away from TSearch2 until it is fully integrated in the postgres
> core (like create index foo_text on foo (texta, textb) USING TSearch2).
> Because a full integration is unlikely to happen in the near future (as far
> as I know), I'll stick to Lucene.
>
> Mike



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Bruce Momjian
Oleg Bartunov wrote:
> Folks,
>
> tsearch2 and Lucene are very different search engines, so it'd be an unfair
> comparison. If you need full access to metadata and instant indexing,
> you will probably find tsearch2 more suitable than Lucene. But if
> you can live without those features and need to search read-only
> archives, you need Lucene.
>
> Tsearch2 integration into pgsql would be cool, but I see no problem with
> using tsearch2 as an official extension module. After completing our
> todo, which we hope will happen for the 8.2 release, you could
> forget about Lucene and other engines :) We'll be available for development
> in spring and we estimate about three months for our todo, so it's
> really doable.

Agreed.  There isn't anything magical about a plug-in vs something
integrated, at least in PostgreSQL.  In other databases, plug-ins can't
function as fully as integrated features, but in PostgreSQL, everything is
really a plug-in because it is all abstracted.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oleg Bartunov wrote:
>> Tsearch2 integration into pgsql would be cool, but I see no problem with
>> using tsearch2 as an official extension module.
>
> Agreed.  There isn't anything magical about a plug-in vs something
> integrated, at least in PostgreSQL.

The quality gap between contrib and the main system is a lot smaller
than it used to be, at least for those contrib modules that have
regression tests.  Main and contrib get equal levels of testing from
the buildfarm, so they're about on par as far as portability goes.
We could never say that before 8.1 ...

(Having said that, I think that tsearch2 will eventually become part
of core, but probably not for a while yet.)

regards, tom lane



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess

Bruce Momjian wrote:
> Oleg Bartunov wrote:
>> Folks,
>>
>> tsearch2 and Lucene are very different search engines, so it'd be an unfair
>> comparison. If you need full access to metadata and instant indexing,
>> you will probably find tsearch2 more suitable than Lucene. But if
>> you can live without those features and need to search read-only
>> archives, you need Lucene.
>>
>> Tsearch2 integration into pgsql would be cool, but I see no problem with
>> using tsearch2 as an official extension module. After completing our
>> todo, which we hope will happen for the 8.2 release, you could
>> forget about Lucene and other engines :) We'll be available for development
>> in spring and we estimate about three months for our todo, so it's
>> really doable.
>
> Agreed.  There isn't anything magical about a plug-in vs something
> integrated, at least in PostgreSQL.  In other databases, plug-ins can't
> function as fully as integrated features, but in PostgreSQL, everything is
> really a plug-in because it is all abstracted.



I only remember evaluating TSearch2 about a year ago, and when I read 
statements like "Vacuum and/or database dump/restore work differently 
when using TSearch2", "SQL scripts need to be executed", etc., I knew that 
I would not want to go there.


But I don't doubt that it works, and that it is a sane concept.



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Bruce Momjian
Michael Riess wrote:
> Bruce Momjian wrote:
>> Oleg Bartunov wrote:
>>> Folks,
>>>
>>> tsearch2 and Lucene are very different search engines, so it'd be an unfair
>>> comparison. If you need full access to metadata and instant indexing,
>>> you will probably find tsearch2 more suitable than Lucene. But if
>>> you can live without those features and need to search read-only
>>> archives, you need Lucene.
>>>
>>> Tsearch2 integration into pgsql would be cool, but I see no problem with
>>> using tsearch2 as an official extension module. After completing our
>>> todo, which we hope will happen for the 8.2 release, you could
>>> forget about Lucene and other engines :) We'll be available for development
>>> in spring and we estimate about three months for our todo, so it's
>>> really doable.
>>
>> Agreed.  There isn't anything magical about a plug-in vs something
>> integrated, at least in PostgreSQL.  In other databases, plug-ins can't
>> function as fully as integrated features, but in PostgreSQL, everything is
>> really a plug-in because it is all abstracted.
>
> I only remember evaluating TSearch2 about a year ago, and when I read
> statements like "Vacuum and/or database dump/restore work differently
> when using TSearch2", "SQL scripts need to be executed", etc., I knew that
> I would not want to go there.
>
> But I don't doubt that it works, and that it is a sane concept.

Good point.  I think we had some problems at that point because the API
was improved between versions.  Even if it had been integrated, we might
have had the same problem.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Russell Garrett

On 6 Dec 2005, at 16:47, Joshua Kramer wrote:
> Has anyone ever compared TSearch2 to Lucene, as far as performance
> is concerned?


In our experience (small, often-updated documents) Lucene leaves 
tsearch2 in the dust. This probably has a lot to do with our usage 
pattern though. For our usage it's very beneficial to have the index 
on a separate machine from the data; however, in many cases this won't 
make sense. Lucene is also a lot easier to cluster than Postgres 
(it's simply a matter of NFS-mounting the index).


Russ Garrett
[EMAIL PROTECTED]



Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Christopher Kings-Lynne

...

So you'll avoid a non-core product and instead only use another non-core 
product...?


Chris

Michael Riess wrote:
>> Has anyone ever compared TSearch2 to Lucene, as far as performance is
>> concerned?
>
> I'll stay away from TSearch2 until it is fully integrated in the
> postgres core (like create index foo_text on foo (texta, textb) USING
> TSearch2). Because a full integration is unlikely to happen in the near
> future (as far as I know), I'll stick to Lucene.
>
> Mike





Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess
No, my problem is that using TSearch2 interferes with other core 
components of postgres like (auto)vacuum or dump/restore.




> ...
>
> So you'll avoid a non-core product and instead only use another non-core
> product...?
>
> Chris
>
> Michael Riess wrote:
>>> Has anyone ever compared TSearch2 to Lucene, as far as performance is
>>> concerned?
>>
>> I'll stay away from TSearch2 until it is fully integrated in the
>> postgres core (like create index foo_text on foo (texta, textb) USING
>> TSearch2). Because a full integration is unlikely to happen in the
>> near future (as far as I know), I'll stick to Lucene.
>>
>> Mike


