Re: Sort hash keys in C...

Tim Bunce Thu, 15 May 2008 05:21:56 -0700

I'm sorry for the delay in responding Rudy.

On Wed, May 07, 2008 at 02:34:43AM -0400, Rudy Lippan wrote:
>>>> I think it would be better to keep appending to an sv rather than try to
>>>> pre-guess the size. That would avoid the need for the malloc/realloc/free.
>>> Well it would avoid my having to do it; Perl still does....
>>>
>>>> Probably slightly more expensive, but not by much if you pre-grow the sv.
>>
> Malloc/realloc:
>
>          Rate Perl    C
> Perl  40650/s   -- -80%
> C    204082/s 402%   --
>
> svcatpv():
>          Rate Perl    C
> Perl  40486/s   -- -71%
> C    139860/s 245%   --


That's a surprisingly small gain in both cases. A testament to how
fast perl itself is I guess. I've noted some potential speedups in my
comments below.

> I just(finally...) created a mirror of the svn repository, so if you don't 
> see anything that needs to be changed, I will do a little more testing and 
> work up a diff against trunk.

Great. I've a few random comments below...

> use Scalar::Util qw(looks_like_number);
> sub _get_sorted_hash_keys {
>     my ($hash_ref, $sort_type) = @_;
>     my $sort_guess = 1;
>     $sort_guess =  1!=looks_like_number($_) ? 0:$sort_guess for keys 
> %$hash_ref;
>     $sort_type = $sort_guess unless (defined $sort_type);

You don't need to calculate $sort_guess unless $sort_type is undef.

>     my @keys = keys %$hash_ref;
>     no warnings 'numeric';
>     my @keys = ($sort_type && $sort_guess)
>         ? sort {$a <=> $b} @keys
>         : sort    @keys;

Duplicate "my @keys" declarations.

> --- DBI-1.604/DBI.xs  2008-03-24 09:44:38.000000000 -0400
> +++ DBI-1.604-concat_hash/DBI.xs      2008-05-07 01:26:59.000000000 -0400
> @@ -209,6 +209,90 @@
>      return buf;
>  }
>  
> +_cmp_number (val1, val2)
> +{
> +    dTHX;

No need for dTHX there.

> +_cmp_str (val1, val2)
> +{
> +    dTHX;

or here.

> +char **
> +_sort_hash_keys (hash, sort_order, total_length)
> +{

> +    while ((entry = hv_iternext(hash))) {
> +        *(keys+idx) = hv_iterkey(entry, &key_len);
> +        tot_len += key_len;
> +        
> +        look_num_sv = sv_2mortal(newSVpv(*(keys+idx),0));
> +        if (1 == looks_like_number(look_num_sv))
> +            *(numbers+idx) = (int)SvIV(look_num_sv);
> +        else
> +            numberish = 0;

You could avoid the expensive sv_2mortal(newSVpv()) by replacing those 5
lines with something like this:

    if (grok_number(*(keys+idx), key_len, numbers+idx) == IS_NUMBER_IN_UV)
        numberish = 0;

(The numbers array would need to be UV's, but that's no problem.)

> +    if (0 == sort_order || 0 == numberish ) {
> +        qsort(keys, hv_len, sizeof(char*), _cmp_str);
> +    } else {
> +        qsort(numbers, hv_len, sizeof(int), _cmp_number);
> +        for (idx = 0; idx < hv_len; ++idx)
> +            *(keys+idx) =SvPV(sv_2mortal(newSViv(numbers[idx])),key_len); 

It would be good to add a clear comment somewhere that the pointers in
the keys array are only valid until perl next free mortal temps.
(There are faster ways of doing that but the use case for the numeric
keys isn't speed critical.)

> +_concat_hash_sorted (hash, kv_sep_sv, pair_sep_sv, 
> value_format_sv,sort_type_sv)
> +    PREINIT:
> +    CODE:

I'd be grateful if you could refactor this so the bulk of the code is in
a C function and the XS is just a small wrapper. Would make it easier to
use for the ShowErrorStatement code (around line 3387).

Thanks again for all your work on this Rudy!

Tim.

p.s. I've already committed the sv_2mortal change in neatsvpv() - but in
a different way, so you can drop that hunk from the patch before you
apply the patch to the trunk.

Re: Sort hash keys in C...

Reply via email to