On Mon, Apr 28, 2008 at 03:24:10PM +0100, Rudolf Lippan wrote:
> 
> 
> On Thu, 24 Apr 2008 13:31:47 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> 
> Hi Tim,
> 
> > So I'd be happy to see an API like this:
> > 
> > SV *_concat_hash_sorted( HV *hv, char *kv_sep, char *pair_sep, SV
> > *value_format, SV *sort_type)
> 
> Attached is the latest draft of _concat_hash_sorted(). I cleaned up the
> types, fixed a few bugs,  and formalized the test suite.

Thanks Rudolf. (I'm sorry I didn't get to look at the earlier version.)

Quite a few of my comments are more stylistic, so feel free to pay less
attention to any you disagree with :)

> Also, while I was testing, I found what looks to be a memory leak in
> neat_svpv:
> 
>         if (SvIOK(sv))
> -            nsv = newSVpvf("%"IVdf, SvIVX(sv));
> -       else nsv = newSVpvf("%"NVgf, SvNVX(sv));
> +            nsv = sv_2mortal(newSVpvf("%"IVdf, SvIVX(sv)));
> +       else nsv = sv_2mortal(newSVpvf("%"NVgf, SvNVX(sv)));

Good catch. Thanks. I've updated the repository.

Do you have svn? You could post patches from 'svn diff'.
See http://search.cpan.org/~timb/DBI/DBI.pm#CONTRIBUTING

> If there is anything you would like changed, let me know. I was not sure
> about using strcat/strncat, so if you'd like, I will change those. 
> 
> Oh, And is there a way to attach a string to an SV w/o copying it?

Not sure what you mean by "attach a string to an SV". I'll guess you
mean append. In which case, yes, there are lots of ways:

    sv_catpv
    sv_catpvf
    sv_catpvn
    sv_catpvn_flags
    sv_vcatpvf
    sv_vcatpvfn

See perldoc perlapi.

> --- DBI-1.604/DBI.xs  2008-03-24 09:44:38.000000000 -0400
> +++ DBI-1.604-concat_hash/DBI.xs      2008-04-28 14:57:57.000000000 -0400
> @@ -209,6 +209,95 @@
>      return buf;
>  }
>  
> +static int
> +_cmp_number (val1, val2)
> +    const void *val1;
> +    const void *val2;
> +{
> +    dTHX;
> +    double first, second;
> +    char **endptr = 0;
> +    int old_err;
> +
> +    old_err = errno; /* needed ? */
> +    errno = 0;
> +    first = strtod(*(char **)val1, endptr);
> +    if (0 != errno) {
> +            croak(strerror(errno));
> +    }
> +    errno = 0;
> +    second = strtod(*(char **)val2, endptr);
> +    if (0 != errno) {
> +            croak(strerror(errno));
> +    }
> +    errno = old_err;
> +
> +    if (first == second)
> +        return 0;
> +    else if (first > second)
> +     return 1;
> +    else 
> +     return -1;
> +}

Seeing this now I wonder if it might be better to have _sort_hash_keys
create an array of UV in parallel with an array of char* and then qsort
the array of UV if all the keys were valid numbers.

That would avoid the frequent strtod() calls during sorting (and the
messing with errno) and avoid problems of accidentally doing a numeric
sort when not all keys are numeric.

Just a thought. Quite a bit more work though so feel free to ignore my
ramblings!

> +char **
> +_sort_hash_keys (hash, sort_order, total_length)
> +    HV *hash;
> +    char sort_order;
> +    STRLEN *total_length;
> +{
> +    dTHX;
> +    I32 hv_len, key_len;
> +    SV *look_num;
> +    HE *entry;
> +    char **keys;
> +    void *sort;
> +    unsigned int idx = 0;
> +    STRLEN tot_len = 0;

> +    keys = malloc(sizeof(char *)*hv_len);
> +    if (!keys)
> +        croak("Unable to allocate memory");

You can use New() to avoid the need to test and croak:

    New(0, keys, hv_len, char *);

> +    /* replace with function table */
> +    if (sort_order < 0) {

You're assuming the 'char' type is signed. It might not be.

> +        look_num = sv_2mortal(newSVpv(keys[0],0));
> +        if (looks_like_number(look_num))
> +            sort = _cmp_number;
> +        else
> +            sort = _cmp_str;

I'd set sort_order = (looks_like_number(look_num)) ? 1 : 0;
here and change the 'else if' below to an 'if'. Simpler logic.

> +    } else if (0 == sort_order) {
> +        sort = _cmp_str;
> +    } else if (1 == sort_order) {
> +        sort = _cmp_number;
> +    } else {
> +        croak("Unknown sort order %i", sort_order);
> +    }
> +    qsort(keys, hv_len, sizeof(char*), sort);
> +    return keys;
> +}


> +SV *
> +_concat_hash_sorted (hash, kv_separator, pair_separator, 
> value_format,sort_type)
> +    HV *hash
> +    SV *kv_separator
> +    SV *pair_separator
> +    SV *value_format
> +    SV *sort_type
> +
> +    PREINIT:
> +        I32 hv_len;
> +        STRLEN kv_sep_len, pair_sep_len, hv_val_len, pos=0, total_len = 0;
> +        char **keys;
> +        char *joined, *kv_sep, *pair_sep, *hv_val;
> +        unsigned int i = 0;
> +        char sort;
> +        SV **hash_svp;
> +        SV *return_sv;
> +        bool not_neat;
> +    CODE:
> +
> +        kv_sep = SvPV(kv_separator, kv_sep_len);
> +        pair_sep = SvPV(pair_separator, pair_sep_len);
> +
> +        if (SvGMAGICAL(value_format))
> +            mg_get(value_format);
> +        not_neat = SvTRUE(value_format);
> +
> +        sort = -1;
> +        if (SvOK(sort_type)) {
> +            sort = SvIV(sort_type);
> +        }

I'd use one line instead of four:

        sort = (SvOK(sort_type)) ? SvIV(sort_type) : -1;

I also tend to add _sv suffix to SV variables in non-trivial subs that
have quite a few sv and non-sv variables (same for _av, _hv etc). E.g.,:

        sort_type = (SvOK(sort_type_sv)) ? SvIV(sort_type_sv) : -1;

> +        keys = _sort_hash_keys(hash, sort, &total_len);
> +        if (!keys) {
> +            ST(0) = Nullsv;
> +            return;
> +     }

I think the no-keys case should return an empty string.

> +        hv_len = hv_iterinit(hash);
> +        /* total_len += Separators + quotes + term null */
> +        total_len += kv_sep_len*hv_len + pair_sep_len*hv_len+2*hv_len+1;
> +        joined = malloc(total_len*sizeof(char));

I think it would be better to keep appending to an sv rather than try to
pre-guess the size. That would avoid the need for the malloc/realloc/free.
Probably slightly more expensive, but not by much if you pre-grow the sv.

The code would be simpler though. No need to keep track of pos, for example.
Should boil down to something like this:

  value_string = (not_neat)
        ? (SvOK(*value_svp) ? SvPV(*value_svp) : '')
        : neatsvpv(*value_svp,0);
  sv_catpvf(sv, (not_neat) ? "%s%s'%s'%s" : "%s%s%s%s",
        keys[i], kv_sep, value_string, (i<hv_len-1) ? pair_sep : "")


> +        for (i=0; i<hv_len; ++i) {

> +            hash_svp = hv_fetch(hash, keys[i], strlen(keys[i]), 0);
> +            if (hash_svp) {

hash_svp should always be true, so I'd make this a precondition check:

               if (!hash_svp) {
                    warn(...);
                    continue;
               }

and then you'd save one level of indent for the rest of the function.
(I don't like to see an indentation level going to waste :)

Great work, great tests. Thanks Rudolf!

Tim.

Reply via email to