On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm <dmalc...@redhat.com> 
wrote:
>On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
>[...]
>
>> diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
>> index 405bae0..72ed311 100644
>> --- a/gcc/fortran/misc.c
>> +++ b/gcc/fortran/misc.c
>[...]
>
>> @@ -274,3 +275,41 @@ get_c_kind(const char *c_kind_name,teropKind_tki
>> nds_table[])
>>  
>>    return ISOCBINDING_INVALID;
>>  }
>> +
>> +
>> +/* For a given name TYPO, determine the best candidate from
>> CANDIDATES
>> +   perusing Levenshtein distance.  Frees CANDIDATES before
>> returning.  */
>> +
>> +const char *
>> +gfc_closest_fuzzy_match (const char *typo, char **candidates)
>> +{
>> +  /* Determine closest match.  */
>> +  const char *best = NULL;
>> +  char **cand = candidates;
>> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
>> +
>> +  while (cand && *cand)
>> +    {
>> +      edit_distance_t dist = levenshtein_distance (typo, *cand);
>> +      if (dist < best_distance)
>> +    {
>> +       best_distance = dist;
>> +       best = *cand;
>> +    }
>> +      cand++;
>> +    }
>> +  /* If more than half of the letters were misspelled, the
>> suggestion is
>> +     likely to be meaningless.  */
>> +  if (best)
>> +    {
>> +      unsigned int cutoff = MAX (strlen (typo), strlen (best)) / 2;
>> +
>> +      if (best_distance > cutoff)
>> +    {
>> +      XDELETEVEC (candidates);
>> +      return NULL;
>> +    }
>> +      XDELETEVEC (candidates);
>> +    }
>> +  return best;
>> +}
>
>FWIW, there are two overloaded variants of levenshtein_distance in
>gcc/spellcheck.h, the first of which takes a pair of strlen values;
>your patch uses the second one:
>
>extern edit_distance_t
>levenshtein_distance (const char *s, int len_s,
>                     const char *t, int len_t);
>
>extern edit_distance_t
>levenshtein_distance (const char *s, const char *t);
>
>So one minor tweak you may want to consider here is to calculate
>  strlen (typo)
>once at the top of gfc_closest_fuzzy_match, and then pass it in to the
>4-arg variant of levenshtein_distance, which would avoid recalculating
>strlen (typo) for every candidate.

I've pondered this back then but came to the conclusion to use the variant 
without len because to use the 4 argument variant I would have stored the 
candidates strlen in the vector too and was not convinced about the memory 
footprint for that would be justified. Maybe it is, but I would prefer the 
following tweak in the 4 argument variant:
If you would amend the 4 argument variant with a

  if (len_t == -1)
    len_t = strlen (t);
before the
   if (len_s == 0)
     return len_t;
   if (len_t == 0)
     return len_s;

checks then I'd certainly use the 4 arg variant :)

WDYT?
>
>I can't comment on the rest of the patch (I'm not a Fortran expert),
>though it seems sane to 
>
>Hope this is constructive

It is, thanks for your thoughts!

cheers,

Reply via email to