Hi folks,

I'm not sure if this is the right place for this, but I thought I'd ask. I'm relatively new to PostgreSQL, having used it on only 3 projects, and am just delving into setup and administration for the second time.

I decided to try tsearch2 for this project's search requirements but am having trouble attaining adequate performance. I think I've narrowed it down to the headline() function in tsearch2. In short, a crawler grabs HTML docs and places them in a database. The search is done using tsearch2, installed pretty much according to the instructions. I have read a couple of online guides suggested by this list for tuning the postgresql.conf file. I made only modest adjustments because I'm not working with top-end hardware and am still uncertain of the actual impact of the different parameters.

I've been learning 'explain', and over the course of reading I have done enough query tweaking to discover that the source of my headache seems to be headline().

On a set of 429 documents, where the average size of the stripped-down document as stored is 21KB and the max is 518KB (an anomaly), tsearch2 performs exceptionally well, returning most queries in about 100ms.

On the other hand, following the tsearch2 guide, which suggests returning that first portion as a subquery and then generating the headline() from those results, I see the query time increase to 4 seconds!
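
To make the shape of that query concrete, here is a minimal sketch of the subquery pattern. The table name docs, the columns id and idxfti, the search term 'foo', and the LIMIT of 10 are placeholders for illustration and not my actual schema; only stripped_text is the real column name.

```sql
-- Sketch only: docs, id, idxfti, 'foo' and LIMIT 10 are assumed names/values.
-- The inner query finds and ranks the matches using the tsvector column;
-- headline() then runs only on the rows that survive the LIMIT.
SELECT id, headline(stripped_text, q) AS snippet, score
FROM (
    SELECT id, stripped_text, q, rank(idxfti, q) AS score
    FROM docs, to_tsquery('default', 'foo') AS q
    WHERE idxfti @@ q
    ORDER BY score DESC
    LIMIT 10
) AS hits;
```

The point of the subquery is that headline() works from the original text rather than the index, so it should be called on as few rows as possible.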

This seems to be directly related to document size. If I filter out that 518KB doc along with some 100KB docs by returning "substring( stripped_text FROM 0 FOR 50000) AS stripped_text" I decrease the time to 1.4 seconds, but increase the risk of not getting a headline.
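
For reference, the truncated variant looks roughly like the sketch below. As before, docs, id, idxfti, 'foo' and the LIMIT are placeholder assumptions. One detail worth noting: string positions in substring() are 1-based, so FROM 1 FOR 50000 keeps the first 50000 characters, whereas FROM 0 FOR 50000 keeps one fewer.

```sql
-- Sketch only: docs, id, idxfti, 'foo' and LIMIT 10 are assumed names/values.
-- headline() only sees the first 50000 characters of each document, so a
-- match that occurs later in the text cannot appear in the snippet.
SELECT id, headline(short_text, q) AS snippet, score
FROM (
    SELECT id,
           substring(stripped_text FROM 1 FOR 50000) AS short_text,
           q,
           rank(idxfti, q) AS score
    FROM docs, to_tsquery('default', 'foo') AS q
    WHERE idxfti @@ q
    ORDER BY score DESC
    LIMIT 10
) AS hits;
```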

Seeing as how this problem is directly tied to document size, I'm wondering if there are any specific settings in postgresql.conf that may help, or is this just a fact of life for the headline() function? Or, does anyone know what the problem is and how to overcome it?
