I have attached a new patch against the current CVS head. It
produces a headline for a document and a given query. Basically, it
identifies fragments of text that contain the query and displays them.
The new variant is much better, but...
HeadlineParsedText contains an array of the actual words but no
information about the norms. We need an indexed position vector for each
norm so that we can quickly evaluate a number of possible fragments,
which is something that tsvector provides.
Why do you need to store norms? The sole purpose of norms is to identify words
from the query, but hlfinditem already does that. It sets
HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery.
Look, the headline function is rather expensive, and your patch adds a lot of
extra work, at least in memory usage. And if the user calls it with
NumFragments=0, that work is unneeded.
This approach does not change any other interface and fits nicely with
the overall framework.
Yeah, it's a really big step forward. Thank you. The patch is very close to
being committed, except: did you notice the hlCover() function, which produces
a cover from the original HeadlineParsedText representation? Is there any
reason not to use it?
The norms are converted into tsvector and a number of covers are
generated. The best covers are then chosen to be in the headline. The
covers are separated using a hardcoded coversep. Let me know if you want
to expose this as an option.
Covers that overlap with already chosen covers are excluded.
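
To make the selection step concrete, here is a minimal sketch of one way to
pick the best non-overlapping covers. It is not the code from the patch: the
Cover struct, its score field, and choose_covers() are hypothetical names
used only for illustration, and the greedy ordering (higher score first,
then shorter cover) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cover: an inclusive range of word positions plus a score. */
typedef struct
{
    int start;   /* position of the first word in the cover */
    int end;     /* position of the last word in the cover (inclusive) */
    int score;   /* assumed quality measure, e.g. query terms matched */
} Cover;

/* Two covers overlap if their inclusive ranges share at least one position. */
int
covers_overlap(const Cover *a, const Cover *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* qsort comparator: higher score first, then shorter cover, then earlier start. */
int
cmp_cover(const void *pa, const void *pb)
{
    const Cover *a = pa;
    const Cover *b = pb;
    int la = a->end - a->start;
    int lb = b->end - b->start;

    if (a->score != b->score)
        return (b->score > a->score) ? 1 : -1;
    if (la != lb)
        return (la > lb) ? 1 : -1;
    return (a->start > b->start) - (a->start < b->start);
}

/*
 * Greedily choose up to max_fragments covers. A cover is skipped if it is
 * longer than max_cover_size words or overlaps a cover chosen earlier.
 * Sorts covers[] in place, writes the chosen covers to out[] (which must
 * have room for max_fragments entries) and returns how many were chosen.
 */
int
choose_covers(Cover *covers, int ncovers, int max_fragments,
              int max_cover_size, Cover *out)
{
    int nchosen = 0;

    assert(covers != NULL && out != NULL);
    qsort(covers, (size_t) ncovers, sizeof(Cover), cmp_cover);

    for (int i = 0; i < ncovers && nchosen < max_fragments; i++)
    {
        int ok = 1;

        if (covers[i].end - covers[i].start + 1 > max_cover_size)
            continue;
        for (int j = 0; j < nchosen; j++)
        {
            if (covers_overlap(&covers[i], &out[j]))
            {
                ok = 0;
                break;
            }
        }
        if (ok)
            out[nchosen++] = covers[i];
    }
    return nchosen;
}
```

The chosen covers come back in score order; a caller that wants them in
document order would sort out[] by start before joining them with the
cover separator.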
Some options, such as ShortWord and MinWords, are not taken care of right
now. MaxWords is used as maxcoversize. Let me know if you would like to
see other options for fragment generation as well.
ShortWord, MinWords and MaxWords should keep their meaning, but apply to each
fragment rather than to the whole headline.
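
As a rough illustration of per-fragment limits, the sketch below clips a
fragment to MaxWords and grows it to MinWords. It is only an assumption
about how this could look: Fragment and fit_fragment() are hypothetical
names, positions are plain word indexes, and ShortWord handling is left out.

```c
#include <assert.h>

/* Hypothetical fragment: an inclusive range of word positions. */
typedef struct
{
    int start;
    int end;
} Fragment;

/*
 * Fit one fragment to per-fragment limits in a document of nwords words.
 * A fragment longer than max_words keeps only its first max_words words.
 * A fragment shorter than min_words grows alternately to the right and to
 * the left until it reaches min_words or the document is exhausted.
 */
Fragment
fit_fragment(Fragment f, int nwords, int min_words, int max_words)
{
    assert(nwords > 0 && 0 <= f.start && f.start <= f.end && f.end < nwords);
    assert(0 < min_words && min_words <= max_words);

    if (f.end - f.start + 1 > max_words)
        f.end = f.start + max_words - 1;

    while (f.end - f.start + 1 < min_words)
    {
        int grown = 0;

        if (f.end < nwords - 1)
        {
            f.end++;
            grown = 1;
        }
        if (f.end - f.start + 1 < min_words && f.start > 0)
        {
            f.start--;
            grown = 1;
        }
        if (!grown)
            break;          /* the whole document is already included */
    }
    return f;
}
```

Applying this to every chosen fragment separately is what keeps the options
per-fragment instead of per-headline.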
Let me know any more changes you would like to see.
if (num_fragments == 0)
    /* call the default headline generator */
    mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
else
    mark_hl_fragments(prs, query, highlight, num_fragments, max_words);
Suppose, num_fragments < 2?
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)