Re: Sort hash keys in C...

2008-05-24 Thread Rudolf Lippan


On Mon, 19 May 2008 10:15:26 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> 
> On Fri, May 16, 2008 at 01:45:51AM +0100, Rudolf Lippan wrote:
>> On Thu, 15 May 2008 13:21:34 +0100, Tim Bunce <[EMAIL PROTECTED]>
> wrote:
>> > It would be good to add a clear comment somewhere that the pointers in
>> > the keys array are only valid until perl next free mortal temps.
>> > (There are faster ways of doing that but the use case for the numeric
>> > keys isn't speed critical.)
>> >
>> Suck as, if you don't mind me asking?
> 
> ("Such as", I presume :)
> 

Yes!  Good thing I did not double bounce on the 's' or it could have been
rather confusing.

> Or, more deeply, instead of sorting an array of int's you could sort an
> array of { int key_as_int, char *ptr_to_key } structs. That way you'd
> also avoid the potential risk of problems with keys like "01" or "1.0"
> that change when converted to int and back to a string.
> 

That was my first thought, but I gave up on in when I realised that I 
would have to figure a place to put the typedef 

I changed the code to use a struct (although I just threw the declaration
in ).
It makes it cleaner and gets rid of the sv_2mortal'd copy of the keys.


>> >>
 
> Whatever seems reasonable (and best for ShowErrorStatement usage).
> 
I went and had the wrapper unSVify the params and call _join_hash_sorted().
This
way there is no need to create a bunch of SVs that get thrown away.


-r Patch  level 1
Source: 2032998a-08c0-11dd-b685-1998e50afd6a:/local/dbi:863
Target: 50811bd7-b8ce-0310-adc1-d9db26280581:/dbi:11280
(http://svn.perl.org/modules/dbi)
Log:
 [EMAIL PROTECTED]:  rlippan | 2008-05-24 11:30:46 -0400
 Make local branch of dbi
 [EMAIL PROTECTED]:  rlippan | 2008-05-24 12:29:52 -0400
 before changing hash_sort.
 [EMAIL PROTECTED]:  rlippan | 2008-05-24 12:30:28 -0400
 Add test script
 [EMAIL PROTECTED]:  rlippan | 2008-05-24 12:49:09 -0400
 make join_hash_sorted not take SV* here ints/char* would do.
 [EMAIL PROTECTED]:  rlippan | 2008-05-24 14:35:23 -0400
 use struct when sorting.
 [EMAIL PROTECTED]:  rlippan | 2008-05-24 14:35:53 -0400
 remove warnings. turn off benchmark tests

=== trunk/DBI.xs
==
--- trunk/DBI.xs	(revision 11280)
+++ trunk/DBI.xs	(patch concat_hash.patch level 1)
@@ -209,6 +209,141 @@
 return buf;
 }
 
+typedef struct num_srt_info {
+char *key;
+UV numeric;
+} num_srt_info;
+
+static int
+_cmp_number (val1, val2)
+const void *val1;
+const void *val2;
+{
+int first, second;
+
+first = ((struct num_srt_info *)val1)->numeric;
+second = ((struct num_srt_info *)val2)->numeric;
+
+if (first == second)
+return 0;
+else if (first > second)
+	return 1;
+else 
+	return -1;
+}
+
+static int 
+_cmp_str (val1, val2)
+const void *val1;
+const void *val2;
+{
+return strcmp( *(char **)val1, *(char **)val2);
+}
+
+static char **
+_sort_hash_keys (hash, sort_order, total_length)
+HV *hash;
+int sort_order;
+STRLEN *total_length;
+{
+dTHX;
+I32 hv_len, key_len;
+HE *entry;
+char **keys;
+unsigned int idx = 0;
+STRLEN tot_len = 0;
+bool numberish = 1;
+struct num_srt_info *numbers;
+
+hv_len = hv_iterinit(hash);
+if (!hv_len)
+return 0;
+
+Newx(keys, hv_len, char *);
+Newx(numbers, hv_len, struct num_srt_info);
+
+while ((entry = hv_iternext(hash))) {
+*(keys+idx) = hv_iterkey(entry, &key_len);
+tot_len += key_len;
+
+if (grok_number(*(keys+idx), key_len, &(numbers+idx)->numeric) != IS_NUMBER_IN_UV)
+numberish = 0;
+
+(numbers+idx)->key = *(keys+idx);
+++idx;
+}
+
+if (0 != total_length)
+*total_length = tot_len;
+
+if (sort_order <0)
+sort_order = numberish ? 1 : 0;
+
+if (0 == sort_order || 0 == numberish ) {
+qsort(keys, hv_len, sizeof(char*), _cmp_str);
+} else {
+qsort(numbers, hv_len, sizeof(struct num_srt_info), _cmp_number);
+for (idx = 0; idx < hv_len; ++idx)
+*(keys+idx) = (numbers+idx)->key;
+/* SvPV_nolen(sv_2mortal(newSViv(numbers[idx])));  */
+}
+
+Safefree(numbers);
+return keys;
+}
+
+
+static SV *
+_join_hash_sorted (hash, kv_sep, pair_sep, not_neat, sort)
+HV *hash;
+char *kv_sep;
+char *pair_sep;
+int not_neat;
+int sort;
+{
+	dTHX;
+I32 hv_len;
+STRLEN kv_sep_len, pair_sep_len, hv_val_len, total_len = 0;
+char **keys;
+char *value_string;
+unsigned int i = 0;
+SV **hash_svp;
+SV *return_sv;
+
+kv_sep_len = strlen(kv_sep);
+pair_sep_len = strlen(pair_sep);
+
+keys = _sort_hash_keys(hash, sort, &total_len);
+if (!keys)
+return newSVpv("", 0);
+
+hv_len = hv_iterinit(hash);
+/* total_len += Separators + quotes + term null */
+total_len += kv_sep_len*hv_len + pair_s

Re: Sort hash keys in C...

2008-05-19 Thread Tim Bunce

On Fri, May 16, 2008 at 01:45:51AM +0100, Rudolf Lippan wrote:
> On Thu, 15 May 2008 13:21:34 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> > I'm sorry for the delay in responding Rudy.
> > 
> > On Wed, May 07, 2008 at 02:34:43AM -0400, Rudy Lippan wrote:
> 
> >> use Scalar::Util qw(looks_like_number);
> >> sub _get_sorted_hash_keys {
> >> my ($hash_ref, $sort_type) = @_;
> >> my $sort_guess = 1;
> >> $sort_guess =  1!=looks_like_number($_) ? 0:$sort_guess for keys
> > %$hash_ref;
> >> $sort_type = $sort_guess unless (defined $sort_type);
> > 
> > You don't need to calculate $sort_guess unless $sort_type is undef.
> 
> I know. The code is only used for testing against the C implementation, and
> putting it inside the #if defined($sort_type) made the code wrap ;)

:-)

I only really mentioned it because you'd included a benchmark and that
change would improve the pure-perl performance.

> >> +if (0 == sort_order || 0 == numberish ) {
> >> +qsort(keys, hv_len, sizeof(char*), _cmp_str);
> >> +} else {
> >> +qsort(numbers, hv_len, sizeof(int), _cmp_number);
> >> +for (idx = 0; idx < hv_len; ++idx)
> >> +*(keys+idx)
> > =SvPV(sv_2mortal(newSViv(numbers[idx])),key_len);
> > 
> > It would be good to add a clear comment somewhere that the pointers in
> > the keys array are only valid until perl next free mortal temps.
> > (There are faster ways of doing that but the use case for the numeric
> > keys isn't speed critical.)
> >
> Suck as, if you don't mind me asking?

("Such as", I presume :)

You could realloc the keys buffer to grow it by the size of all the key
strings plus null bytes. Then write them into that extra space.

Or, more deeply, instead of sorting an array of int's you could sort an
array of { int key_as_int, char *ptr_to_key } structs. That way you'd
also avoid the potential risk of problems with keys like "01" or "1.0"
that change when converted to int and back to a string.

> >> +_concat_hash_sorted (hash, kv_sep_sv, pair_sep_sv,
> > value_format_sv,sort_type_sv)
> >> +PREINIT:
> >> +CODE:
> > 
> > I'd be grateful if you could refactor this so the bulk of the code is in
> > a C function and the XS is just a small wrapper. Would make it easier to
> > use for the ShowErrorStatement code (around line 3387).
> 
> Sure. leave params to the C function as SV *?

Whatever seems reasonable (and best for ShowErrorStatement usage).

Thanks!

Tim.


Re: Sort hash keys in C...

2008-05-15 Thread Rudolf Lippan


On Thu, 15 May 2008 13:21:34 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> I'm sorry for the delay in responding Rudy.
> 
> On Wed, May 07, 2008 at 02:34:43AM -0400, Rudy Lippan wrote:

> 
>> use Scalar::Util qw(looks_like_number);
>> sub _get_sorted_hash_keys {
>> my ($hash_ref, $sort_type) = @_;
>> my $sort_guess = 1;
>> $sort_guess =  1!=looks_like_number($_) ? 0:$sort_guess for keys
> %$hash_ref;
>> $sort_type = $sort_guess unless (defined $sort_type);
> 
> You don't need to calculate $sort_guess unless $sort_type is undef.
> 

I know. The code is only used for testing against the C implementation, and
putting it inside the #if defined($sort_type) made the code wrap ;)

Changed.

>> my @keys = keys %$hash_ref;
>> no warnings 'numeric';
>> my @keys = ($sort_type && $sort_guess)
>> ? sort {$a <=> $b} @keys
>> : sort@keys;
> 
> Duplicate "my @keys" declarations.
> 
Fixed. And looking over the code, I also should go back and strictify...

> 
> No need for dTHX there.
> 

> or here.
> 

expurgated.

>> +char **
>> +_sort_hash_keys (hash, sort_order, total_length)
>> +{
> 
>> +while ((entry = hv_iternext(hash))) {
>> +*(keys+idx) = hv_iterkey(entry, &key_len);
>> +tot_len += key_len;
>> +
>> +look_num_sv = sv_2mortal(newSVpv(*(keys+idx),0));
>> +if (1 == looks_like_number(look_num_sv))
>> +*(numbers+idx) = (int)SvIV(look_num_sv);
>> +else
>> +numberish = 0;
> 
> You could avoid the expensive sv_2mortal(newSVpv()) by replacing those 5
> lines with something like this:
> 
> if (grok_number(*(keys+idx), key_len, numbers+idx) ==
IS_NUMBER_IN_UV)
> numberish = 0;
> 
> (The numbers array would need to be UV's, but that's no problem.)
> 

I like that. grok_number(): I must remember that one..


>> +if (0 == sort_order || 0 == numberish ) {
>> +qsort(keys, hv_len, sizeof(char*), _cmp_str);
>> +} else {
>> +qsort(numbers, hv_len, sizeof(int), _cmp_number);
>> +for (idx = 0; idx < hv_len; ++idx)
>> +*(keys+idx)
> =SvPV(sv_2mortal(newSViv(numbers[idx])),key_len);
> 
> It would be good to add a clear comment somewhere that the pointers in
> the keys array are only valid until perl next free mortal temps.
> (There are faster ways of doing that but the use case for the numeric
> keys isn't speed critical.)
>
Suck as, if you don't mind me asking?
 
>> +_concat_hash_sorted (hash, kv_sep_sv, pair_sep_sv,
> value_format_sv,sort_type_sv)
>> +PREINIT:
>> +CODE:
> 
> I'd be grateful if you could refactor this so the bulk of the code is in
> a C function and the XS is just a small wrapper. Would make it easier to
> use for the ShowErrorStatement code (around line 3387).

Sure. leave params to the C function as SV *?

> 
> Thanks again for all your work on this Rudy!

No problem.  It has been a few years since I have done any C/XS coding, 
and I am trying to get back in the swing. Thank you for the suggestions.
> 
> p.s. I've already committed the sv_2mortal change in neatsvpv() - but in
> a different way, so you can drop that hunk from the patch before you
> apply the patch to the trunk.

I caught that after I applied it against trunk -- it took me a few minutes
to
figure out why I was getting double frees :)


-r



Re: Sort hash keys in C...

2008-05-15 Thread Tim Bunce
I'm sorry for the delay in responding Rudy.

On Wed, May 07, 2008 at 02:34:43AM -0400, Rudy Lippan wrote:
 I think it would be better to keep appending to an sv rather than try to
 pre-guess the size. That would avoid the need for the malloc/realloc/free.
>>> Well it would avoid my having to do it; Perl still does
>>>
 Probably slightly more expensive, but not by much if you pre-grow the sv.
>>
> Malloc/realloc:
>
>  Rate PerlC
> Perl  40650/s   -- -80%
> C204082/s 402%   --
>
> svcatpv():
>  Rate PerlC
> Perl  40486/s   -- -71%
> C139860/s 245%   --

That's a surprisingly small gain in both cases. A testament to how
fast perl itself is I guess. I've noted some potential speedups in my
comments below.

> I just(finally...) created a mirror of the svn repository, so if you don't 
> see anything that needs to be changed, I will do a little more testing and 
> work up a diff against trunk.

Great. I've a few random comments below...

> use Scalar::Util qw(looks_like_number);
> sub _get_sorted_hash_keys {
> my ($hash_ref, $sort_type) = @_;
> my $sort_guess = 1;
> $sort_guess =  1!=looks_like_number($_) ? 0:$sort_guess for keys 
> %$hash_ref;
> $sort_type = $sort_guess unless (defined $sort_type);

You don't need to calculate $sort_guess unless $sort_type is undef.

> my @keys = keys %$hash_ref;
> no warnings 'numeric';
> my @keys = ($sort_type && $sort_guess)
> ? sort {$a <=> $b} @keys
> : sort@keys;

Duplicate "my @keys" declarations.

> --- DBI-1.604/DBI.xs  2008-03-24 09:44:38.0 -0400
> +++ DBI-1.604-concat_hash/DBI.xs  2008-05-07 01:26:59.0 -0400
> @@ -209,6 +209,90 @@
>  return buf;
>  }
>  
> +_cmp_number (val1, val2)
> +{
> +dTHX;

No need for dTHX there.

> +_cmp_str (val1, val2)
> +{
> +dTHX;

or here.

> +char **
> +_sort_hash_keys (hash, sort_order, total_length)
> +{

> +while ((entry = hv_iternext(hash))) {
> +*(keys+idx) = hv_iterkey(entry, &key_len);
> +tot_len += key_len;
> +
> +look_num_sv = sv_2mortal(newSVpv(*(keys+idx),0));
> +if (1 == looks_like_number(look_num_sv))
> +*(numbers+idx) = (int)SvIV(look_num_sv);
> +else
> +numberish = 0;

You could avoid the expensive sv_2mortal(newSVpv()) by replacing those 5
lines with something like this:

if (grok_number(*(keys+idx), key_len, numbers+idx) == IS_NUMBER_IN_UV)
numberish = 0;

(The numbers array would need to be UV's, but that's no problem.)

> +if (0 == sort_order || 0 == numberish ) {
> +qsort(keys, hv_len, sizeof(char*), _cmp_str);
> +} else {
> +qsort(numbers, hv_len, sizeof(int), _cmp_number);
> +for (idx = 0; idx < hv_len; ++idx)
> +*(keys+idx) =SvPV(sv_2mortal(newSViv(numbers[idx])),key_len); 

It would be good to add a clear comment somewhere that the pointers in
the keys array are only valid until perl next free mortal temps.
(There are faster ways of doing that but the use case for the numeric
keys isn't speed critical.)

> +_concat_hash_sorted (hash, kv_sep_sv, pair_sep_sv, 
> value_format_sv,sort_type_sv)
> +PREINIT:
> +CODE:

I'd be grateful if you could refactor this so the bulk of the code is in
a C function and the XS is just a small wrapper. Would make it easier to
use for the ShowErrorStatement code (around line 3387).

Thanks again for all your work on this Rudy!

Tim.

p.s. I've already committed the sv_2mortal change in neatsvpv() - but in
a different way, so you can drop that hunk from the patch before you
apply the patch to the trunk.


Re: Sort hash keys in C...

2008-04-29 Thread Tim Bunce
On Tue, Apr 29, 2008 at 01:56:26AM +0100, Rudolf Lippan wrote:
> 
> On Mon, 28 Apr 2008 23:51:09 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> > On Mon, Apr 28, 2008 at 03:24:10PM +0100, Rudolf Lippan wrote:
> 
> >> Oh, And is there a way to attach a string to an SV w/o copying it?
> > 
> > Not sure what you mean by "attach a string to an SV". I'll guess you
> > mean append. In which case, yes, there are lots of ways:
> 
> Lets say I have a char *result. If I newSVpvn(*result, strlen(result)),
> *result is copied and I then have to free result... Is there a way to
> create a new SV without that copy?

Ah. No, none that would let you return the SV to the caller, I believe.


> >> +static int
> >> +_cmp_number (val1, val2)
> > 
> > Seeing this now I wonder if it might be better to have _sort_hash_keys
> > create an array of UV in parallel with an array of char* and then qsort
> > the array of UV if all the keys were valid numbers.
> 
> There would still need to be some way to tie the two arrays together after
> the sort, yes? Or just use the UV in the hash lookup? And what about
> floats?

The use case is for simple positive integer placeholder index values
so I think it would be reasonable to only do numeric sorting for them.

(We're not trying to be very general here, just general enough for the
few cases where it's worth writing in C to get the speed. Anyone needing
finer control can write their own in perl.)

> > That would avoid the frequent strtod() calls during sorting (and the
> > messing with errno) and avoid problems of accidentally doing a numeric
> > sort when not all keys are numeric.
> 
> I think the only time errno is set is for an under/overflow condition?... I
> could probably be ignored here.  

Perhaps, but the extra couple of lines don't cost much and are "the
right thing to do", just in case.

> >> +look_num = sv_2mortal(newSVpv(keys[0],0));
> >> +if (looks_like_number(look_num))
> >> +sort = _cmp_number;
> >> +else
> >> +sort = _cmp_str;
> > 
> > I'd set sort_order = (looks_like_number(look_num)) ? 1 : 0;
> > here and change the 'else if' below to an 'if'. Simpler logic.
> 
> Okay. My plan was to change the 'else if's into a array of function
> pointers that that the sort_order indexed into, so to add a new sort order
> all you would do add an entry to the array of function pointers.

Probably overkill (YAGNI).

> >> +SV *
> >> +_concat_hash_sorted (hash, kv_separator, pair_separator, 
> >> value_format,sort_type)
> 
> >> +hv_len = hv_iterinit(hash);
> >> +/* total_len += Separators + quotes + term null */
> >> +total_len += kv_sep_len*hv_len + pair_sep_len*hv_len+2*hv_len+1;
> >> +joined = malloc(total_len*sizeof(char));
> > 
> > I think it would be better to keep appending to an sv rather than try to
> > pre-guess the size. That would avoid the need for the malloc/realloc/free.
> 
> Well it would avoid my having to do it; Perl still does
> 
> > Probably slightly more expensive, but not by much if you pre-grow the sv.

SvGROW(sv, estimated_len)

> But it also eliminates the copy of joined to an SV, so it might be a wash?

Perhaps. The shorter cleaner code would be nicer for maitenance as well.

Thanks again Rudolf.

Tim.


Re: Sort hash keys in C...

2008-04-28 Thread Rudolf Lippan

On Mon, 28 Apr 2008 23:51:09 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> On Mon, Apr 28, 2008 at 03:24:10PM +0100, Rudolf Lippan wrote:
>>

> 
> Do you have svn? You could post patches from 'svn diff'.
> See http://search.cpan.org/~timb/DBI/DBI.pm#CONTRIBUTING

I am more of an SVK person, but svk is svn... kinda.  I had the DBI tarball
on my system and I would have had to go looking for the repository... I'll
go look and set up a mirror.

>> Oh, And is there a way to attach a string to an SV w/o copying it?
> 
> Not sure what you mean by "attach a string to an SV". I'll guess you
> mean append. In which case, yes, there are lots of ways:
> 

Lets say I have a char *result. If I newSVpvn(*result, strlen(result)),
*result is copied and I then have to free result... Is there a way to
create a new SV without that copy?


> 
>> --- DBI-1.604/DBI.xs 2008-03-24 09:44:38.0 -0400
>> +++ DBI-1.604-concat_hash/DBI.xs 2008-04-28 14:57:57.0 -0400
>> @@ -209,6 +209,95 @@
>>  return buf;
>>  }
>>
>> +static int
>> +_cmp_number (val1, val2)
>> +const void *val1;
>> +const void *val2;
>> +{
>> +dTHX;
>> +double first, second;
>> +char **endptr = 0;
>> +int old_err;
>> +
>> +old_err = errno; /* needed ? */
>> +errno = 0;
>> +first = strtod(*(char **)val1, endptr);
>> +if (0 != errno) {
>> +croak(strerror(errno));
>> +}
>> +errno = 0;
>> +second = strtod(*(char **)val2, endptr);
>> +if (0 != errno) {
>> +croak(strerror(errno));
>> +}
>> +errno = old_err;
>> +
>> +if (first == second)
>> +return 0;
>> +else if (first > second)
>> +return 1;
>> +else
>> +return -1;
>> +}
> 
> Seeing this now I wonder if it might be better to have _sort_hash_keys
> create an array of UV in parallel with an array of char* and then qsort
> the array of UV if all the keys were valid numbers.
> 

There would still need to be some way to tie the two arrays together after
the sort, yes? Or just use the UV in the hash lookup? And what about
floats?

> That would avoid the frequent strtod() calls during sorting (and the
> messing with errno) and avoid problems of accidentally doing a numeric
> sort when not all keys are numeric.
> 

I think the only time errno is set is for an under/overflow condition?... I
could probably be ignored here.  


> 
> 
> You can use New() to avoid the need to test and croak:
> 
> New(0, keys, hv_len, char *);

I forgot about the error checking; I was thinking on going back and
changing malloc's to New's to make it more xsish, but that is an even
better reason to make the change.

> 
>> +/* replace with function table */
>> +if (sort_order < 0) {
> 
> You're assuming the 'char' type is signed. It might not be.
> 
I see, I think. 
s/char/int/ 


>> +look_num = sv_2mortal(newSVpv(keys[0],0));
>> +if (looks_like_number(look_num))
>> +sort = _cmp_number;
>> +else
>> +sort = _cmp_str;
> 
> I'd set sort_order = (looks_like_number(look_num)) ? 1 : 0;
> here and change the 'else if' below to an 'if'. Simpler logic.
> 

Okay. My plan was to change the 'else if's into a array of function
pointers that that the sort_order indexed into, so to add a new sort order
all you would do add an entry to the array of function pointers.

>> +} else if (0 == sort_order) {
>> +sort = _cmp_str;
>> +} else if (1 == sort_order) {
>> +sort = _cmp_number;
>> +} else {
>> +croak("Unknown sort order %i", sort_order);
>> +}
>> +qsort(keys, hv_len, sizeof(char*), sort);
>> +return keys;
>> +}



>> +SV *
>> +_concat_hash_sorted (hash, kv_separator, pair_separator,
> value_format,sort_type)
>> +HV *hash
>> +SV *kv_separator
>> +SV *pair_separator
>> +SV *value_format
>> +SV *sort_type
>> +
>> +PREINIT:
>> +I32 hv_len;
>> +STRLEN kv_sep_len, pair_sep_len, hv_val_len, pos=0, total_len =
> 0;
>> +char **keys;
>> +char *joined, *kv_sep, *pair_sep, *hv_val;
>> +unsigned int i = 0;
>> +char sort;
>> +SV **hash_svp;
>> +SV *return_sv;
>> +bool not_neat;
>> +CODE:
>> +
>> +kv_sep = SvPV(kv_separator, kv_sep_len);
>> +pair_sep = SvPV(pair_separator, pair_sep_len);
>> +
>> +if (SvGMAGICAL(value_format))
>> +mg_get(value_format);
>> +not_neat = SvTRUE(value_format);
>> +
>> +sort = -1;
>> +if (SvOK(sort_type)) {
>> +sort = SvIV(sort_type);
>> +}
> 
> I'd use one line instead of four:
> 
> sort = (SvOK(sort_type)) ? SvIV(sort_type) : -1;

That works.  I can go either way on the ternary operator.

> 
> I also tend to add _sv suffix to SV variables in non-trivial subs that
> have quite a few sv and non-sv variables (same for _av, _hv etc). E.g.,:
> 
I don't see this, but if it make the code more DBIish, I go ahead and make
the change.

>   

Re: Sort hash keys in C...

2008-04-28 Thread Tim Bunce
On Mon, Apr 28, 2008 at 03:24:10PM +0100, Rudolf Lippan wrote:
> 
> 
> On Thu, 24 Apr 2008 13:31:47 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> 
> Hi Tim,
> 
> > So I'd be happy to see an API like this:
> > 
> > SV *_concat_hash_sorted( HV *hv, char *kv_sep, char *pair_sep, SV
> > *value_format, SV *sort_type)
> 
> Attached is the latest draft of _concat_hash_sorted(). I cleaned up the
> types, fixed a few bugs,  and formalized the test suite.

Thanks Rudolf. (I'm sorry I didn't get to look at the earlier version.)

Quite a few of my comments are more stylistic, so feel free to pay less
attention to any you disagree with :)

> Also, while I was testing, I found what looks to be a memory leak in
> neat_svpv:
> 
> if (SvIOK(sv))
> -nsv = newSVpvf("%"IVdf, SvIVX(sv));
> -   else nsv = newSVpvf("%"NVgf, SvNVX(sv));
> +nsv = sv_2mortal(newSVpvf("%"IVdf, SvIVX(sv)));
> +   else nsv = sv_2mortal(newSVpvf("%"NVgf, SvNVX(sv)));

Good catch. Thanks. I've updated the repository.

Do you have svn? You could post patches from 'svn diff'.
See http://search.cpan.org/~timb/DBI/DBI.pm#CONTRIBUTING

> If there is anything you would like changed, let me know. I was not sure
> about using strcat/strncat, so if you'd like, I will change those. 
> 
> Oh, And is there a way to attach a string to an SV w/o copying it?

Not sure what you mean by "attach a string to an SV". I'll guess you
mean append. In which case, yes, there are lots of ways:

sv_catpv
sv_catpvf
sv_catpvn
sv_catpvn_flags
sv_vcatpvf
sv_vcatpvfn

See perldoc perlapi.

> --- DBI-1.604/DBI.xs  2008-03-24 09:44:38.0 -0400
> +++ DBI-1.604-concat_hash/DBI.xs  2008-04-28 14:57:57.0 -0400
> @@ -209,6 +209,95 @@
>  return buf;
>  }
>  
> +static int
> +_cmp_number (val1, val2)
> +const void *val1;
> +const void *val2;
> +{
> +dTHX;
> +double first, second;
> +char **endptr = 0;
> +int old_err;
> +
> +old_err = errno; /* needed ? */
> +errno = 0;
> +first = strtod(*(char **)val1, endptr);
> +if (0 != errno) {
> +croak(strerror(errno));
> +}
> +errno = 0;
> +second = strtod(*(char **)val2, endptr);
> +if (0 != errno) {
> +croak(strerror(errno));
> +}
> +errno = old_err;
> +
> +if (first == second)
> +return 0;
> +else if (first > second)
> + return 1;
> +else 
> + return -1;
> +}

Seeing this now I wonder if it might be better to have _sort_hash_keys
create an array of UV in parallel with an array of char* and then qsort
the array of UV if all the keys were valid numbers.

That would avoid the frequent strtod() calls during sorting (and the
messing with errno) and avoid problems of accidentally doing a numeric
sort when not all keys are numeric.

Just a thought. Quite a bit more work though so feel free to ignore my
ramblings!

> +char **
> +_sort_hash_keys (hash, sort_order, total_length)
> +HV *hash;
> +char sort_order;
> +STRLEN *total_length;
> +{
> +dTHX;
> +I32 hv_len, key_len;
> +SV *look_num;
> +HE *entry;
> +char **keys;
> +void *sort;
> +unsigned int idx = 0;
> +STRLEN tot_len = 0;

> +keys = malloc(sizeof(char *)*hv_len);
> +if (!keys)
> +croak("Unable to allocate memory");

You can use New() to avoid the need to test and croak:

New(0, keys, hv_len, char *);

> +/* replace with function table */
> +if (sort_order < 0) {

You're assuming the 'char' type is signed. It might not be.

> +look_num = sv_2mortal(newSVpv(keys[0],0));
> +if (looks_like_number(look_num))
> +sort = _cmp_number;
> +else
> +sort = _cmp_str;

I'd set sort_order = (looks_like_number(look_num)) ? 1 : 0;
here and change the 'else if' below to an 'if'. Simpler logic.

> +} else if (0 == sort_order) {
> +sort = _cmp_str;
> +} else if (1 == sort_order) {
> +sort = _cmp_number;
> +} else {
> +croak("Unknown sort order %i", sort_order);
> +}
> +qsort(keys, hv_len, sizeof(char*), sort);
> +return keys;
> +}


> +SV *
> +_concat_hash_sorted (hash, kv_separator, pair_separator, 
> value_format,sort_type)
> +HV *hash
> +SV *kv_separator
> +SV *pair_separator
> +SV *value_format
> +SV *sort_type
> +
> +PREINIT:
> +I32 hv_len;
> +STRLEN kv_sep_len, pair_sep_len, hv_val_len, pos=0, total_len = 0;
> +char **keys;
> +char *joined, *kv_sep, *pair_sep, *hv_val;
> +unsigned int i = 0;
> +char sort;
> +SV **hash_svp;
> +SV *return_sv;
> +bool not_neat;
> +CODE:
> +
> +kv_sep = SvPV(kv_separator, kv_sep_len);
> +pair_sep = SvPV(pair_separator, pair_sep_len);
> +
> +if (SvGMAGICAL(value_format))
> +mg_get(value_format);
> +not_neat = SvTRUE(value_format);
> +
> +sort = -1;
> +if (SvOK(sort_ty

Re: Sort hash keys in C...

2008-04-28 Thread Rudolf Lippan


On Thu, 24 Apr 2008 13:31:47 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:

Hi Tim,

> So I'd be happy to see an API like this:
> 
> SV *_concat_hash_sorted( HV *hv, char *kv_sep, char *pair_sep, SV
> *value_format, SV *sort_type)
> 

Attached is the latest draft of _concat_hash_sorted(). I cleaned up the
types, fixed a few bugs,  and formalized the test suite.


Also, while I was testing, I found what looks to be a memory leak in
neat_svpv:

if (SvIOK(sv))
-nsv = newSVpvf("%"IVdf, SvIVX(sv));
-   else nsv = newSVpvf("%"NVgf, SvNVX(sv));
+nsv = sv_2mortal(newSVpvf("%"IVdf, SvIVX(sv)));
+   else nsv = sv_2mortal(newSVpvf("%"NVgf, SvNVX(sv)));


If there is anything you would like changed, let me know. I was not sure
about using strcat/strncat, so if you'd like, I will change those. 

Oh, And is there a way to attach a string to an SV w/o copying it?

-r
--- DBI-1.604/DBI.xs	2008-03-24 09:44:38.0 -0400
+++ DBI-1.604-concat_hash/DBI.xs	2008-04-28 14:57:57.0 -0400
@@ -209,6 +209,95 @@
 return buf;
 }
 
+static int
+_cmp_number (val1, val2)
+const void *val1;
+const void *val2;
+{
+dTHX;
+double first, second;
+char **endptr = 0;
+int old_err;
+
+old_err = errno; /* needed ? */
+errno = 0;
+first = strtod(*(char **)val1, endptr);
+if (0 != errno) {
+croak(strerror(errno));
+}
+errno = 0;
+second = strtod(*(char **)val2, endptr);
+if (0 != errno) {
+croak(strerror(errno));
+}
+errno = old_err;
+
+if (first == second)
+return 0;
+else if (first > second)
+	return 1;
+else 
+	return -1;
+}
+
+static int _cmp_str (val1, val2)
+const void *val1;
+const void *val2;
+{
+dTHX;
+return strcmp( *(char **)val1, *(char **)val2);
+}
+
+char **
+_sort_hash_keys (hash, sort_order, total_length)
+HV *hash;
+char sort_order;
+STRLEN *total_length;
+{
+dTHX;
+I32 hv_len, key_len;
+SV *look_num;
+HE *entry;
+char **keys;
+void *sort;
+unsigned int idx = 0;
+STRLEN tot_len = 0;
+
+hv_len = hv_iterinit(hash);
+if (!hv_len)
+return 0;
+
+keys = malloc(sizeof(char *)*hv_len);
+if (!keys)
+croak("Unable to allocate memory");
+
+while ((entry = hv_iternext(hash))) {
+*(keys+(idx++)) = hv_iterkey(entry, &key_len);
+tot_len += key_len;
+}
+if (0 != total_length)
+*total_length = tot_len;
+
+/* replace with function table */
+if (sort_order < 0) {
+look_num = sv_2mortal(newSVpv(keys[0],0));
+if (looks_like_number(look_num))
+sort = _cmp_number;
+else
+sort = _cmp_str;
+} else if (0 == sort_order) {
+sort = _cmp_str;
+} else if (1 == sort_order) {
+sort = _cmp_number;
+} else {
+croak("Unknown sort order %i", sort_order);
+}
+qsort(keys, hv_len, sizeof(char*), sort);
+return keys;
+}
+
+
+
 /* handy for embedding into condition expression for debugging */
 /*
 static int warn1(char *s) { warn(s); return 1; }
@@ -374,8 +463,8 @@
 	}
 	/* we don't use SvPV here since we don't want to alter sv in _any_ way	*/
 	if (SvIOK(sv))
-	 nsv = newSVpvf("%"IVdf, SvIVX(sv));
-	else nsv = newSVpvf("%"NVgf, SvNVX(sv));
+	 nsv = sv_2mortal(newSVpvf("%"IVdf, SvIVX(sv)));
+	else nsv = sv_2mortal(newSVpvf("%"NVgf, SvNVX(sv)));
 	if (infosv)
 	sv_catsv(nsv, infosv);
 	return SvPVX(nsv);
@@ -4236,6 +4325,102 @@
 RETVAL
 
 
+SV *
+_concat_hash_sorted (hash, kv_separator, pair_separator, value_format,sort_type)
+HV *hash
+SV *kv_separator
+SV *pair_separator
+SV *value_format
+SV *sort_type
+
+PREINIT:
+I32 hv_len;
+STRLEN kv_sep_len, pair_sep_len, hv_val_len, pos=0, total_len = 0;
+char **keys;
+char *joined, *kv_sep, *pair_sep, *hv_val;
+unsigned int i = 0;
+char sort;
+SV **hash_svp;
+SV *return_sv;
+bool not_neat;
+CODE:
+
+kv_sep = SvPV(kv_separator, kv_sep_len);
+pair_sep = SvPV(pair_separator, pair_sep_len);
+
+if (SvGMAGICAL(value_format))
+mg_get(value_format);
+not_neat = SvTRUE(value_format);
+
+sort = -1;
+if (SvOK(sort_type)) {
+sort = SvIV(sort_type);
+}
+
+
+keys = _sort_hash_keys(hash, sort, &total_len);
+if (!keys) {
+ST(0) = Nullsv;
+return;
+	}
+hv_len = hv_iterinit(hash);
+/* total_len += Separators + quotes + term null */
+total_len += kv_sep_len*hv_len + pair_sep_len*hv_len+2*hv_len+1;
+joined = malloc(total_len*sizeof(char));
+
+for (i=0; i 0) {
+strcpy(joined+pos, pair_sep);
+pos += pair_sep_len;
+}
+strcpy(joined+pos, keys[i]);
+pos += strlen(keys[i]);
+
+hash_svp = hv_fetch(hash, keys[i], strlen(

Re: Sort hash keys in C...

2008-04-24 Thread Rudolf Lippan


On Thu, 24 Apr 2008 13:31:47 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:

> So I'd be happy to see an API like this:
> 
> SV *_concat_hash_sorted( HV *hv, char *kv_sep, char *pair_sep, SV
> *value_format, SV *sort_type)
> 
> The kv_sep & pair_sep params could be SV*'s if that's easier to
implement.
> 

Sounds good. I'll work on that this evening.


-r



Re: Sort hash keys in C...

2008-04-24 Thread Tim Bunce
On Thu, Apr 24, 2008 at 12:50:09AM +0100, Rudolf Lippan wrote:
> 
> 
> On Wed, 23 Apr 2008 09:34:13 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> 
> > Here's a more refined and detailed outline. It's evolved from one I
>  
> aka more involved ;)

Naturally ;)

> > sub _concat_hash_sorted {
> > my ( $hash_ref, $kv_separator, $pair_separator, $value_format, 
> > $sort_type ) = @_;


> Looks fine. I did draft of the above (without the neatsvpv for right now as
> I need to dig into that some more), and I have couple of questions:
> 
> 1. You no longer want _concat_hash_keys() ?

The name has changed, the details haven't much. The idea is the same.
It was a poor name as it needs to concat both keys and values.

> 2. How closely are you expecting the C API to match the perl api?  It looks
> like the C stuff would have to do less work if it were to pass some of the
> string lengths around.

I had originally thought I'd want a C API for _get_sorted_hash_keys()
for the ShowErrorStatement code to use. Now, with the revised
description, that's not needed. The new _concat_hash_sorted()
functionality means it can be used directly by the ShowErrorStatement
code as well as prepare_cached and connect_cached.

So I'd be happy to see an API like this:

SV *_concat_hash_sorted( HV *hv, char *kv_sep, char *pair_sep, SV 
*value_format, SV *sort_type)

The kv_sep & pair_sep params could be SV*'s if that's easier to implement.

> 3. Do we worry about Magic here?

Not beyond the basic best practice of using this at appropriate places:

if (SvGMAGICAL(sv))
mg_get(sv);

> > p.s. Some tests with reasonable coverage would be the icing on the cake!
> 
> Of course!

Wonderful. Thanks!

Tim.


Re: Sort hash keys in C...

2008-04-23 Thread Rudolf Lippan


On Wed, 23 Apr 2008 09:34:13 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:

> Here's a more refined and detailed outline. It's evolved from one I
 
aka more involved ;)

> sub _concat_hash_sorted {
> my ( $hash_ref, $kv_separator, $pair_separator, $value_format,
> $sort_type ) = @_;
> # $value_format: false=use neat(), true=dumb quotes
> # $sort_type: 0=lexical, 1=numeric, undef=try to guess
> 
> $keys = _get_sorted_hash_keys($hash_ref, $sort_type);
> my $string = '';
> for my $key (@$keys) {
> $string .= $pair_separator if length $string > 0;
> my $value = $hash_ref->{$key};
> if ($value_format) {
> $value = (defined $value) ? "'$value'" : 'undef';
> }
> else {
> $value = neat($value,0);
> }
> $string .= $key . $kv_separator . $value;
> }
> return $string;
> }
> 
> sub _get_sorted_hash_keys {
> my ($hash_ref, $sort_type) = @_;
> if (not defined $sort_type) {
> my $first_key = (each %$hash_ref)[0];
> $sort_type = looks_like_number($first_key);
> }
> my @keys = keys %$hash_ref;
> @keys = ($sort_type)
> ? sort sort_numerically @keys
> : sort sort_lexicaly@keys;
> return [EMAIL PROTECTED];
> }
> 
> Does that look okay?

Looks fine. I did draft of the above (without the neatsvpv for right now as
I need to dig into that some more), and I have couple of questions:

1. You no longer want _concat_hash_keys() ?
2. How closely are you expecting the C API to match the perl api?  It looks
like the C stuff would have to do less work if it were to pass some of the
string lengths around.
3. Do we worry about Magic here?

> 
> p.s. Some tests with reasonable coverage would be the icing on the cake!

Of course!


-r



Re: Sort hash keys in C...

2008-04-23 Thread Tim Bunce
On Wed, Apr 23, 2008 at 12:17:42AM +0100, Rudolf Lippan wrote:
> On Tue, 22 Apr 2008 11:24:17 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> > Any volunteers, either for both or just the sort function?
> 
> Hi Tim,
> 
> I would be happy to take a crack at them tomorrow -- actually I have a
> rough draft implementation that I will work on cleaning up and testing. 

Great!

Here's a more refined and detailed outline. It's evolved from one I
wrote up for "Sir Woody Hackswell" <[EMAIL PROTECTED]> (who also
expressed off-list an interest in implementing it, though I'd guess
hasn't done much yet).


API:

_concat_hash_sorted( $hash_ref, $kv_separator, $pair_separator, 
$value_format, $sort_type )

Then the ParamValues for ShowErrorStatement can be formatted using:

_concat_hash_sorted( $sth->{ParamValues}, "=", ", ", 0, undef )

Here's a rough sketch of the logic broken up into steps that would
roughly match a C implementation:

sub _concat_hash_sorted {
my ( $hash_ref, $kv_separator, $pair_separator, $value_format, $sort_type ) 
= @_;
# $value_format: false=use neat(), true=dumb quotes
# $sort_type: 0=lexical, 1=numeric, undef=try to guess

$keys = _get_sorted_hash_keys($hash_ref, $sort_type);
my $string = '';
for my $key (@$keys) {
$string .= $pair_separator if length $string > 0;
my $value = $hash_ref->{$key};
if ($value_format) {
$value = (defined $value) ? "'$value'" : 'undef';
}
else {
$value = neat($value,0);
}
$string .= $key . $kv_separator . $value;
}
return $string;
}

sub _get_sorted_hash_keys {
my ($hash_ref, $sort_type) = @_;
if (not defined $sort_type) {
my $first_key = (each %$hash_ref)[0];
$sort_type = looks_like_number($first_key);
}
my @keys = keys %$hash_ref;
@keys = ($sort_type)
? sort sort_numerically @keys
: sort sort_lexicaly@keys;
return [EMAIL PROTECTED];
}

Does that look okay?

(Note that this isn't trying to be general purpose. Like neat() it's a
97.8365% solution that fits the most common needs.)

Thanks!

Tim.

p.s. Some tests with reasonable coverage would be the icing on the cake!


Re: Sort hash keys in C...

2008-04-22 Thread Rudolf Lippan

On Tue, 22 Apr 2008 11:24:17 +0100, Tim Bunce <[EMAIL PROTECTED]> wrote:
> The DBI needs to sort the keys in %$attr for connect_cached and
> prepare_cached to ensure a canonical cache key is generated.
> 
> Currently this is done in perl and is relatively expensive:
> 
> my @attr_keys = ($attr) ? sort keys %$attr : ();
> my $key = do { local $^W; # silence undef warnings
> join "~~", $dsn, $user, $auth, $attr ? 
> (@attr_keys,@[EMAIL PROTECTED]) : ()
> };
> 
> my @attr_keys = ($attr) ? sort keys %$attr : ();
> my $key = ($attr) ? join("~~", $statement, @attr_keys,
> @[EMAIL PROTECTED]) : $statement;
> 
> I'd like to wrap that up into a _concat_hash_keys( \%hash, $separator,
> [EMAIL PROTECTED] )
> function implemented in C..
> 
> As part of that we'd need a function to return the keys of a hash in
> sorted order. That just needs to allocate an array of char*'s, load it
> with pointers to each of the keys in the hash, and then call qsort().
> 
> The bonus is that that function would enable the ParamValues reported by
> ShowErrorStatement to be listed in sorted order:
> http://rt.cpan.org/Ticket/Display.html?id=27272
> (function could take a flag to indicate lexical or numeric sort order)
> 
> Any volunteers, either for both or just the sort function?
> 

Hi Tim,

I would be happy to take a crack at them tomorrow -- actually I have a
rough draft implementation that I will work on cleaning up and testing. 

-r