On 12/30/2015 06:49 PM, Stas Kelvich wrote:
Hi, Tomáš! Thanks for comprehensive review.
On 15 Dec 2015, at 06:07, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
1) It's a bit difficult to judge the usefulness of the API, as I've
always been a mere user of full-text search, and I never had a need
(or courage) to mess with the tsvectors. OTOH I don't see a good
reason no to have such API, when there's a need for it.
The API seems to be reasonably complete, with one exception - when
looking at editing function we have for 'hstore', we do have these
variants for delete()
delete(hstore,text)
delete(hstore,text[])
delete(hstore,hstore)
while this patch only adds delete(tsvector,text). Would it make
sense to add variants similar to hstore? It probably does not make
much sense to add delete(tsvector,tsvector), right? But being able
to delete a bunch of lexemes in one go seems like a good thing.
What do you think?
That’s a good idea and actually deleting tsvector from tsvector makes perfect
sense. In delete function I used exact string match between string and lexemes in
tsvector, but if somebody wants to delete for example “Cats” from tsvector, then
he should downcase and singularize this word. Easiest way to do it is to just use
to_tsvector() function. Also we can use this function to delete specific
positions: like delete('cat:3 fat:2,4'::tsvector, 'fat:2'::tsvector) -> 'cat:3
fat:4'::tsvector.
So in attached patch I’ve implemented following:
delete(tsin tsvector, lexarrtext[]) — remove any occurence of lexemes inlexarr
from tsin
OK, although I do recommend using more sensible variable names, i.e. why
how to use 'lexemes' instead of 'lexarr' for example? Similarly for the
other functions.
delete(tsin tsvector, tsv_filter tsvector) — Delete lexemes and/or positions of
tsv_filter from tsin. When lexeme in tsv_filter has no positions function will
delete any occurrence of same lexeme in tsin. When tsv_filter lexeme have
positions function will delete them from positions of matching lexeme in tsin.
If after such removal resulting positions set is empty then function will
delete that lexeme from resulting tsvector.
I can't really imagine situation in which I'd need this, but if you do
have a use case for it ... although in the initial paragraph you say
"... but if somebody wants to delete for example ..." which suggests you
may not have such use case.
Based on bad experience with extending API based on vague ideas, I
recommend only really adding functions with existing need. It's easy to
add a function later, much more difficult to remove it or change the
signature.
Also if we want some level of completeness of API and taking into account that
concat() function shift positions on second argument I thought that it can be
useful to also add function that can shift all positions of specific value.
This helps to undo concatenation: delete one of concatenating tsvectors and
then shift positions in resulting tsvector. So I also wrote one another small
function:
shift(tsin tsvector,offset int16) — Shift all positions in tsin by given offset
That seems rather too low-level. Shouldn't it be really built into
delete() directly somehow?
4) I find it rather annoying that there are pretty much no comments in
the code. Granted, there are pretty much no comments in the
surrounding code, but I doubt that's a good reason for not having
any comments in new code. It makes reviews unnecessarily difficult.
Fixed, I think.
Yep, much better now.
5) tsvector_concat() is not mentioned in docs at all
Concat mentioned in docs as an operator ||.
Ah, OK.
6) Docs don't mention names of the new parameters in function
signatures, just data types. The functions with a single parameter
probably don't need to do that, but multi-parameter ones should.
Fixed.
OK, but please let's use variable names clearly identifying the meaning.
So not 'w' but 'weight' and so on.
7) Some of the functions use intexterm that does not match the function
name. I see two such cases - to_tsvector and setweight. Is there a
reason for that?
Because sgml compiler wants unique indexterm. Both functions that
youmentioned use overloading of arguments and have non-unique name.
As Michael pointed out, that should probably be handled by using
<primary> and <secondary> tags.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers