Hi. At 2014-04-20 01:06:43 +0200, alhash...@alhashash.net wrote: > > To use unaccent dictionary for these languages, we need to allow empty > targets to remove diacritics instead of replacing them.
Your patch should definitely add a test case or two to sql/unaccent.sql and expected/unaccent.out showing the behaviour that didn't work before the change. > The attached patch modfies unaacent.c so that dictionary parser uses > zero-length target when the line has no target. The basic idea seems sensible. > diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c > old mode 100644 > new mode 100755 > index a337df6..4e72829 > --- a/contrib/unaccent/unaccent.c > +++ b/contrib/unaccent/unaccent.c > @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, > char *replaceTo, int r > { > curnode->replacelen = replacelen; > curnode->replaceTo = palloc(replacelen); > - memcpy(curnode->replaceTo, replaceTo, replacelen); > + /* palloc(0) returns a valid address, not NULL */ > + if (replaceTo) /* memcpy() is undefined for NULL > pointers*/ > + memcpy(curnode->replaceTo, replaceTo, > replacelen); > } > } I think these comments confuse the issue, and should be removed. In fact, I think this part of the code can remain unchanged (see below). > @@ -105,10 +107,10 @@ initTrie(char *filename) > while ((line = tsearch_readline(&trst)) != NULL) > { > /* > - * The format of each line must be "src trg" > where src and trg > + * The format of each line must be "src [trg]" > where src and trg > * are sequences of one or more non-whitespace > characters, > * separated by whitespace. Whitespace at > start or end of > - * line is ignored. > + * line is ignored. If no trg added, a > zero-length string is used. > */ > int state; I suggest "If trg is empty, a zero-length string is used" for the last sentence. > @@ -160,6 +162,13 @@ initTrie(char *filename) > } > } > > + /* if no trg (loop stops at state 1 or 2), use > zero-length target */ > + if (state == 1 || state == 2) > + { > + trglen = 0; > + state = 5; > + } If I understand the code correctly, "src" alone will leave state == 1, and "src " will leave state == 2, and in both cases trg and trglen will be unchanged (from NULL and 0 respectively). In that case, I think it would be clearer to do something like this: char *trg = ""; … /* It's OK to have a valid src and empty trg. */ if (state > 0 && trglen == 0) state = 5; That way, you don't have the NULL pointer, and you don't have to add a NULL-pointer test in placeChar() above. What do you think? If you submit a revised patch along these lines, I'll mark it ready for committer. Thank you. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers