Re: [HACKERS] multibyte charater set in levenshtein function

Alexander Korotkov Wed, 21 Jul 2010 07:44:18 -0700

On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas <[email protected]> wrote:


> This patch still needs some work.  It includes a bunch of stylistic
> changes that aren't relevant to the purpose of the patch.  There's no
> reason that I can see to change the existing levenshtein_internal
> function to take text arguments instead of char *, or to change !m to
> m == 0 in existing code, or to change the whitespace in the comments
> of that function.  All of those need to be reverted before we can
> consider committing this.
>
I changed arguments of function from char * to text * in order to avoid
text_to_cstring call. Same benefit can be achived by replacing char * with
char * and length.
I changed !m to m == 0 because Itagaki asked me to make it conforming coding
style. Do you think there is no reason to fix coding style in existing
code?

> There is a huge amount of duplicated code here.  I think it makes
> sense to have the multibyte version of the function be separate, but
> perhaps we can merge the less-than-or-equal to bits  into the main
> code, so that we only have two copies instead of four.  Perhaps we
> can't just add a max_d argument max_distance to levenshtein_internal;
> and if this value is >=0 then it represents the max allowable
> distance, but if it is <0 then there is no limit.  Sure, that might
> slow down the existing code a bit, but it might not be significant.
> I'd at least like to see some numbers showing that it is significant
> before we go to this much trouble.
>
In these case we should add many checks of max_d in levenshtein_internal
function which make code more complex.
Actually, we can merge all four functions into one function. But such
function will have many checks about multibyte encoding and max_d. So, I see
four cases here:
1) one function with checks for multibyte encoding and max_d
2) two functions with checks for multibyte encoding
3) two functions with checks for max_d
4) four separate functions
If you prefer case number 3 you should argue your position little more.

> The code doesn't follow the project coding style.  Braces must be
> uncuddled.  Comment paragraphs will be reflowed unless they begin and
> end with ------.  Function definitions should have the type
> declaration on one line and the function name at the start of the
> next.
>
> Freeing memory with pfree is likely a waste of time; is there any
> reason not to just rely on the memory context reset, as the original
> coding did?
>
Ok, I'll fix this things.


> I think we might need to remove the acknowledgments section from this
> code.  If everyone who touches this code adds their name, we're
> quickly going to have a mess.  If we're not going to remove the
> acknowledgments section, then please add my name, too, because I've
> already patched this code once...
>
In that case I think we can leave original acknowledgments section.

----
With best regards,
Alexander Korotkov.

Re: [HACKERS] multibyte charater set in levenshtein function

Reply via email to