Re: [Numpy-discussion] String sort

2008-02-15 Thread Francesc Altet
Hi Chuck,

I've given more testing to the new quicksort routines for strings in the 
forthcoming NumPy.  I've run the indexing test units in PyTables Pro 
(they stress the sorting routines a lot) against the current version of 
NumPy in the repository, for the complete set of quicksort, mergesort 
and heapsort of the new implementation, and I'm happy to say to 
everything went very smoothly, i.e. more than 1000 tests with different 
input arrays has passed flawlessly.  Good job!

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-15 Thread Francesc Altet
A Thursday 14 February 2008, Charles R Harris escrigué:
 On Thu, Feb 14, 2008 at 1:03 PM, Francesc Altet [EMAIL PROTECTED] 
wrote:
  Maybe I'd also be interested in trying insertion sort out.  During
  the optimization process of an OPSI index, there is a need to sort
  out a slice of data that is already made of smaller chunks that are
  already sorted, so chances are that insertion sort could be
  significantly faster than the merge sort (or even the quick sort)
  in this scenario.
 
  But this is becoming an OT.  However, I'd be glad to further dicuss
  this privately, if you like to.

 Well, I don't have much more to say. If you do decide that insertion
 sort will be useful you won't have to twist my arm much to get it,
 but I think it is most useful when the data never has to move far. In
 the case of quicksort and mergesort it is called to deal with small
 unsorted chunks, but the chunks themselves are already in the right
 place. Some kind of multi merge sort might be more appropriate to the
 OPSI index.

OK, thanks, I'll have you in mind ;)  And yes, multi-way merge seems 
very interesting for OPSI indeed.  Eventually, it might even be a good 
candidate for taking advantage of the several cores in modern CPU's;  
so it would be really interesting to check out this path.

Thanks for very insightful hints!

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-15 Thread Charles R Harris
On Fri, Feb 15, 2008 at 5:09 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 Hi Chuck,

 I've given more testing to the new quicksort routines for strings in the
 forthcoming NumPy.  I've run the indexing test units in PyTables Pro
 (they stress the sorting routines a lot) against the current version of
 NumPy in the repository, for the complete set of quicksort, mergesort
 and heapsort of the new implementation, and I'm happy to say to
 everything went very smoothly, i.e. more than 1000 tests with different
 input arrays has passed flawlessly.  Good job!


Hi Francesc,

Thanks for the thorough testing. It makes me feel much more comfortable this
close to the release of 1.0.5.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-15 Thread Francesc Altet
A Friday 15 February 2008, Charles R Harris escrigué:
 On Fri, Feb 15, 2008 at 5:09 AM, Francesc Altet [EMAIL PROTECTED] 
wrote:
  Hi Chuck,
 
  I've given more testing to the new quicksort routines for strings
  in the forthcoming NumPy.  I've run the indexing test units in
  PyTables Pro (they stress the sorting routines a lot) against the
  current version of NumPy in the repository, for the complete set of
  quicksort, mergesort and heapsort of the new implementation, and
  I'm happy to say to everything went very smoothly, i.e. more than
  1000 tests with different input arrays has passed flawlessly.  Good
  job!

 Hi Francesc,

 Thanks for the thorough testing. It makes me feel much more
 comfortable this close to the release of 1.0.5.

You are welcome.  I forgot to mention that I tested out both direct 
(.sort()) and indirect (.argsort()) methods.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Francesc Altet
A Wednesday 13 February 2008, Scott Ransom escrigué:
 On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote:
  So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very
  least, AMD Opteron architecture) and that newqsort performs really
  well in general (provided that the compiler can find the best path
  for optimizing its code).  Anyone using a 64-bit platform and
  having both gcc 4.1.2 and 4.2.1 installed can confirm this?

 Here are results from a 64-bit Debian system using a Core2 Duo 2.66
 GHz processor.

 I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental)
 with -O2 and -O3.

 Summary:  There is a big difference between -02 and -O3.  gcc-4.2
 seems slightly better than the other gccs.  And the newqsort is a lot
 faster (always) than the libc version.

 Scott

 eiger:/data1$ ./sort346_O2
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.55
 C qsort with Python style compare: 0.53
 NumPy newqsort: 0.45

 eiger:/data1$ ./sort346_O3
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.55
 C qsort with Python style compare: 0.52
 NumPy newqsort: 0.35

 eiger:/data1$ ./sort413_O2
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.56
 C qsort with Python style compare: 0.53
 NumPy newqsort: 0.42

 eiger:/data1$ ./sort413_O3
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.54
 C qsort with Python style compare: 0.50
 NumPy newqsort: 0.28

 eiger:/data1$ ./sort423_O2
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.56
 C qsort with Python style compare: 0.53
 NumPy newqsort: 0.39

 eiger:/data1$ ./sort423_O3
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.53
 C qsort with Python style compare: 0.50
 NumPy newqsort: 0.27

 eiger:/data1$ ./sort43_O2
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.55
 C qsort with Python style compare: 0.53
 NumPy newqsort: 0.34

 eiger:/data1$ ./sort43_O3
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.53
 C qsort with Python style compare: 0.51
 NumPy newqsort: 0.33

Thanks Scott.  Your input is very valuable, as it seems to confirm that 
the problem must be on gcc 4.2.1 on 64-bit (or Opteron architecture at 
very least) because apparently your gcc 4.2.3 is doing very well.  It's 
a pity that I don't have a 4.2.3 available in our SuSe/Opteron machine 
so as to check if the optimization flaw disappears.  But it seems to me 
that the problem could be specific of 4.2.1, and apparently the GCC 
crew has fixed the problem in 4.2.3, which is a relief.

In any case, if anybody have access to an Opteron machine and gcc 4.2.3, 
it would be great if he can run the benchmark and contribute his 
feedback.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Francesc Altet
A Thursday 14 February 2008, escriguéreu:
  In any case, if anybody have access to an Opteron machine and gcc
  4.2.3, it would be great if he can run the benchmark and contribute
  his feedback.

 Here it is with gcc 4.2.3 on an Opteron 246 (2.0 GHz):

 uller:~$ ./sort423_O2# with -O2
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.77
 C qsort with Python style compare: 0.74
 NumPy newqsort: 0.63

 uller:~$ ./sort423_O3# with -O3
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.64
 C qsort with Python style compare: 0.66
 NumPy newqsort: 0.40

And here are my timings with gcc 4.1.3 and using a similar Opteron than 
yours (270 @ 2.0 GHz):

With -O2:
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.75
C qsort with Python style compare: 0.70
NumPy newqsort: 0.69

With -O3:
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.67
C qsort with Python style compare: 0.62
NumPy newqsort: 0.38

So, it seems clear that the GCC people has fixed in 4.2.3 the problem 
with the optimizer introduced in 4.2.1.  Very good!  

By the way, it's nice to see the wide range of platforms that this list 
allows to test out :-)

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Charles R Harris
On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Wednesday 13 February 2008, Charles R Harris escrigué:
  On Feb 13, 2008 10:56 AM, Francesc Altet [EMAIL PROTECTED] wrote:
   Be warned, I'd like to stress out that these are my figures for my
   _own laptop_.  It would be nice if you can verify all of this with
   other achitectures (your Core2 machine seems different enough).  I
   can run the benchmarks on Windows (installed in the same laptop)
   too.  Tell me if you are interested on me doing this.
 
  Its easy enough to test if you compile from svn, just add your new
  copy function and change the name in this line:
 
 #copy=copy_string, copy_ucs4#
 
  to use your function instead of copy_string.

 I've spoken too fast.  I've never compiled NumPy on Windows before, and
 had problems when trying to compile it using MSVC 7.1 and a recent copy
 of the repository.  Well, in any case, I've exercised the Opteron
 platform, using gcc 4.1.3 (i.e. the one that can optimize newqsort
 correctly), and this brings new light to our study.

 From the plot (attached), it can be drawn the next conclusions:

 1) copy_string2 (the combination of manual copy and memcpy) is not
 better than memcpy for *any* value of the string length in our Opteron
 platform.  Also, the improvements with Pentium4 was not that big
 (20%).  In consequence, I'd choose to always use memcpy and discard
 copy_string2.


Your copy_string2 is now the version in numpy. I'm hesitant to make memcpy
the default for all string lengths because I've read that memcpy was much
improved in later gcc (= 4.1 ?), but known slow in older versions. So
perhaps in a year or two when the newer compilers are more widespread would
be a better time to make the change. The switch at the 16 char length
shouldn't make that much difference in practice. I'll put a comment in the
source so that the thought won't get lost. BTW, using copy_string2 much
improved the performance of the string mergesort where a lot of data needs
to be copied to the work array. It's now half as fast as quicksort instead
of 1/3 ;) Heap sort continues in it's traditional slot as the slowest of
all. Slow but safe.


 2) Curiously enough, the indirect sort in Opterons is *always* faster
 than newqsort+memcpy.  For large values of string lengths ( 256), the
 speed-up can be up to 3x, which is a lot.  And I'd say that this keeps
 true also for most modern processors (read Core2, Barcelona).  For older
 processors (Pentium4), the indirect method can be slower than direct
 plot for small lengths, but by a very few extent (10%).


The new indirect quicksort for strings is faster than the old qsort based
default, so perhaps that is also making a difference.



 Conclusion 2 makes me wonder if it wouldn't be useful the introduction
 of a new flag in sort, like:

 
 `indirect` - Use the indirect method for sorting.  This requires more
 memory (4/8 bytes per array element), but for sorting arrays of strings
 it is almost always faster than the direct approach (default). Beware:
 this is not the case when using numerical values, where the use of this
 method for sorting is not recommendable.
 


I'm more inclined to leave this to the user. I have a todo to add a function
to numpy that makes it easier to use the argsort output to sort
multidimensional arrays, I'll name it argtake or some such and it will use
the argsort output along with an axis argument. It won't be quite as memory
efficient for multidimensional arrays, but it should work about the same in
the 1D case.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Francesc Altet
A Thursday 14 February 2008, Charles R Harris escrigué:
 On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet [EMAIL PROTECTED] 
wrote:
  From the plot (attached), it can be drawn the next conclusions:
 
  1) copy_string2 (the combination of manual copy and memcpy) is not
  better than memcpy for *any* value of the string length in our
  Opteron platform.  Also, the improvements with Pentium4 was not
  that big (20%).  In consequence, I'd choose to always use memcpy
  and discard copy_string2.

 Your copy_string2 is now the version in numpy. I'm hesitant to make
 memcpy the default for all string lengths because I've read that
 memcpy was much improved in later gcc (= 4.1 ?), but known slow in
 older versions. So perhaps in a year or two when the newer compilers
 are more widespread would be a better time to make the change. The
 switch at the 16 char length shouldn't make that much difference in
 practice. I'll put a comment in the source so that the thought won't
 get lost.

Well, copy_string2 is only marginally slower than memcpy on modern gcc 
compilers and processors, so I presume that this is fine.

 BTW, using copy_string2 much improved the performance of
 the string mergesort where a lot of data needs to be copied to the
 work array. It's now half as fast as quicksort instead of 1/3 ;) Heap
 sort continues in it's traditional slot as the slowest of all. Slow
 but safe.

I've seen that you have added specific code for merge and heap sorting 
algorithms for strings.  Looks good.

  2) Curiously enough, the indirect sort in Opterons is *always*
  faster than newqsort+memcpy.  For large values of string lengths (
  256), the speed-up can be up to 3x, which is a lot.  And I'd say
  that this keeps true also for most modern processors (read Core2,
  Barcelona).  For older processors (Pentium4), the indirect method
  can be slower than direct plot for small lengths, but by a very few
  extent (10%).

 The new indirect quicksort for strings is faster than the old qsort
 based default, so perhaps that is also making a difference.

Yes, indeed it does!

  Conclusion 2 makes me wonder if it wouldn't be useful the
  introduction of a new flag in sort, like:
 
  
  `indirect` - Use the indirect method for sorting.  This requires
  more memory (4/8 bytes per array element), but for sorting arrays
  of strings it is almost always faster than the direct approach
  (default). Beware: this is not the case when using numerical
  values, where the use of this method for sorting is not
  recommendable.
  

 I'm more inclined to leave this to the user. I have a todo to add a
 function to numpy that makes it easier to use the argsort output to
 sort multidimensional arrays, I'll name it argtake or some such and
 it will use the argsort output along with an axis argument. It won't
 be quite as memory efficient for multidimensional arrays, but it
 should work about the same in the 1D case.

OK.  I don't completely grasp what are you trying to do here, but seems 
like a conservative enough path (in the sense that it won't touch the 
current parameters of existing sorting methods).

Looking forward to see the new qsort for strings in NumPy (the specific 
version for merge sort is very welcome too!).

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Charles R Harris
On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Thursday 14 February 2008, Charles R Harris escrigué:
  On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet [EMAIL PROTECTED]
 wrote:
   From the plot (attached), it can be drawn the next conclusions:
  
   1) copy_string2 (the combination of manual copy and memcpy) is not
   better than memcpy for *any* value of the string length in our
   Opteron platform.  Also, the improvements with Pentium4 was not
   that big (20%).  In consequence, I'd choose to always use memcpy
   and discard copy_string2.
 
  Your copy_string2 is now the version in numpy. I'm hesitant to make
  memcpy the default for all string lengths because I've read that
  memcpy was much improved in later gcc (= 4.1 ?), but known slow in
  older versions. So perhaps in a year or two when the newer compilers
  are more widespread would be a better time to make the change. The
  switch at the 16 char length shouldn't make that much difference in
  practice. I'll put a comment in the source so that the thought won't
  get lost.

 Well, copy_string2 is only marginally slower than memcpy on modern gcc
 compilers and processors, so I presume that this is fine.

  BTW, using copy_string2 much improved the performance of
  the string mergesort where a lot of data needs to be copied to the
  work array. It's now half as fast as quicksort instead of 1/3 ;) Heap
  sort continues in it's traditional slot as the slowest of all. Slow
  but safe.

 I've seen that you have added specific code for merge and heap sorting
 algorithms for strings.  Looks good.

   2) Curiously enough, the indirect sort in Opterons is *always*
   faster than newqsort+memcpy.  For large values of string lengths (
   256), the speed-up can be up to 3x, which is a lot.  And I'd say
   that this keeps true also for most modern processors (read Core2,
   Barcelona).  For older processors (Pentium4), the indirect method
   can be slower than direct plot for small lengths, but by a very few
   extent (10%).
 
  The new indirect quicksort for strings is faster than the old qsort
  based default, so perhaps that is also making a difference.

 Yes, indeed it does!

   Conclusion 2 makes me wonder if it wouldn't be useful the
   introduction of a new flag in sort, like:
  
   
   `indirect` - Use the indirect method for sorting.  This requires
   more memory (4/8 bytes per array element), but for sorting arrays
   of strings it is almost always faster than the direct approach
   (default). Beware: this is not the case when using numerical
   values, where the use of this method for sorting is not
   recommendable.
   
 
  I'm more inclined to leave this to the user. I have a todo to add a
  function to numpy that makes it easier to use the argsort output to
  sort multidimensional arrays, I'll name it argtake or some such and
  it will use the argsort output along with an axis argument. It won't
  be quite as memory efficient for multidimensional arrays, but it
  should work about the same in the 1D case.

 OK.  I don't completely grasp what are you trying to do here, but seems
 like a conservative enough path (in the sense that it won't touch the
 current parameters of existing sorting methods).

 Looking forward to see the new qsort for strings in NumPy (the specific
 version for merge sort is very welcome too!).


I could never figure out what the merge sort was good for. I did the
indirect version in numarray because I needed a stable sort to implement
lexsort, which was my original aim. I just added the direct version for
completeness. If you have a use for it, I would love to hear it.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Francesc Altet
A Thursday 14 February 2008, Bruce Southey escrigué:
 Hi,
 I confirmed the gcc 4.2.3 performance for the Opteron:

 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.63
 C qsort with Python style compare: 0.63
 NumPy newqsort: 0.36

 I also installed the Intel icc 10.1 compiler on my Opteron system but
 I did not use any flags:
 $ /opt/intel/cc/10.1.008/bin/icc sort-string-bench.c -o icc_sort
 $ ./icc_sort
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 1.03
 C qsort with Python style compare: 0.96
 NumPy newqsort: 0.53

That's excellent Bruce.  Definitely it looks like the problem with the 
optimizer in 4.2.1 has been fixed in 4.2.3.

And why you haven't used optimization flags with ICC? just curious...

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Francesc Altet
A Thursday 14 February 2008, Charles R Harris escrigué:
 On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] 
wrote:
  Looking forward to see the new qsort for strings in NumPy (the
  specific version for merge sort is very welcome too!).

 I could never figure out what the merge sort was good for. I did the
 indirect version in numarray because I needed a stable sort to
 implement lexsort, which was my original aim. I just added the direct
 version for completeness. If you have a use for it, I would love to
 hear it.

Well, I must to confess that I've not used merge sorts yet, but I'd like 
to test them in the context of my PSI (Partially Sorted Indexes, see 
[1] for a white paper on a concrete implementation) work.  My hope is 
that, as a merge sort keeps the order of indices of elements that are 
equal (this is what 'stable' means), this would allow better 
compression rates for indices (and hence, less I/O effort to bring the 
indices from disk into memory and ultimately allowing for faster lookup 
speed).  This will probably be only important when one have data 
distributions with rather low cardinality, but these scenarios can be 
more frequent/important than one may think.

[1] http://www.carabos.com/docs/OPSI-indexes.pdf

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Charles R Harris
On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Thursday 14 February 2008, Charles R Harris escrigué:
  On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED]
 wrote:
   Looking forward to see the new qsort for strings in NumPy (the
   specific version for merge sort is very welcome too!).
 
  I could never figure out what the merge sort was good for. I did the
  indirect version in numarray because I needed a stable sort to
  implement lexsort, which was my original aim. I just added the direct
  version for completeness. If you have a use for it, I would love to
  hear it.

 Well, I must to confess that I've not used merge sorts yet, but I'd like
 to test them in the context of my PSI (Partially Sorted Indexes, see
 [1] for a white paper on a concrete implementation) work.  My hope is
 that, as a merge sort keeps the order of indices of elements that are
 equal (this is what 'stable' means), this would allow better
 compression rates for indices (and hence, less I/O effort to bring the
 indices from disk into memory and ultimately allowing for faster lookup
 speed).  This will probably be only important when one have data
 distributions with rather low cardinality, but these scenarios can be
 more frequent/important than one may think.


Well, I take that back a bit. I think mergesort might work best for large
memory mapped arrays because it does sequential accesses, which might be
more disk efficient than random accesses. Then again, a divide and conquer
approach like quicksort eventually becomes localized too. I've never
experimented with really large sorts, they might perform differently than
the sorts that fit in memory.

Insertion sort is supposed to work well for almost sorted sequences, but
that application has always seemed a bit specialized to me. Although I'll
admit to being occasionally tempted to pull the insertion sorts out of
quicksort and mergesort and make them their own (type specific) routines.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Francesc Altet
A Thursday 14 February 2008, Charles R Harris escrigué:
 On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet [EMAIL PROTECTED] 
wrote:
  A Thursday 14 February 2008, Charles R Harris escrigué:
   On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet
   [EMAIL PROTECTED]
 
  wrote:
Looking forward to see the new qsort for strings in NumPy (the
specific version for merge sort is very welcome too!).
  
   I could never figure out what the merge sort was good for. I did
   the indirect version in numarray because I needed a stable sort
   to implement lexsort, which was my original aim. I just added the
   direct version for completeness. If you have a use for it, I
   would love to hear it.
 
  Well, I must to confess that I've not used merge sorts yet, but I'd
  like to test them in the context of my PSI (Partially Sorted
  Indexes, see [1] for a white paper on a concrete implementation)
  work.  My hope is that, as a merge sort keeps the order of indices
  of elements that are equal (this is what 'stable' means), this
  would allow better compression rates for indices (and hence, less
  I/O effort to bring the indices from disk into memory and
  ultimately allowing for faster lookup speed).  This will probably
  be only important when one have data distributions with rather low
  cardinality, but these scenarios can be more frequent/important
  than one may think.

 Well, I take that back a bit. I think mergesort might work best for
 large memory mapped arrays because it does sequential accesses, which
 might be more disk efficient than random accesses. Then again, a
 divide and conquer approach like quicksort eventually becomes
 localized too. I've never experimented with really large sorts, they
 might perform differently than the sorts that fit in memory.

Yeah, but I don't really want to use merge sort for out-of-core sorting, 
but just because it is stable.  The main point of a PSI indexing schema 
is that you don't need to completely sort your dataset (hence the 
name: Partially Sorted) in order to get an usable index, and this 
normally leads to much faster index creation times.

 Insertion sort is supposed to work well for almost sorted sequences,
 but that application has always seemed a bit specialized to me.
 Although I'll admit to being occasionally tempted to pull the
 insertion sorts out of quicksort and mergesort and make them their
 own (type specific) routines.

Maybe I'd also be interested in trying insertion sort out.  During the 
optimization process of an OPSI index, there is a need to sort out a 
slice of data that is already made of smaller chunks that are already 
sorted, so chances are that insertion sort could be significantly 
faster than the merge sort (or even the quick sort) in this scenario.

But this is becoming an OT.  However, I'd be glad to further dicuss this 
privately, if you like to.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-14 Thread Charles R Harris
On Thu, Feb 14, 2008 at 1:03 PM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Thursday 14 February 2008, Charles R Harris escrigué:
  On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet [EMAIL PROTECTED]
 wrote:
   A Thursday 14 February 2008, Charles R Harris escrigué:
On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet
[EMAIL PROTECTED]
  
   wrote:
 Looking forward to see the new qsort for strings in NumPy (the
 specific version for merge sort is very welcome too!).
   
I could never figure out what the merge sort was good for. I did
the indirect version in numarray because I needed a stable sort
to implement lexsort, which was my original aim. I just added the
direct version for completeness. If you have a use for it, I
would love to hear it.
  
   Well, I must to confess that I've not used merge sorts yet, but I'd
   like to test them in the context of my PSI (Partially Sorted
   Indexes, see [1] for a white paper on a concrete implementation)
   work.  My hope is that, as a merge sort keeps the order of indices
   of elements that are equal (this is what 'stable' means), this
   would allow better compression rates for indices (and hence, less
   I/O effort to bring the indices from disk into memory and
   ultimately allowing for faster lookup speed).  This will probably
   be only important when one have data distributions with rather low
   cardinality, but these scenarios can be more frequent/important
   than one may think.
 
  Well, I take that back a bit. I think mergesort might work best for
  large memory mapped arrays because it does sequential accesses, which
  might be more disk efficient than random accesses. Then again, a
  divide and conquer approach like quicksort eventually becomes
  localized too. I've never experimented with really large sorts, they
  might perform differently than the sorts that fit in memory.

 Yeah, but I don't really want to use merge sort for out-of-core sorting,
 but just because it is stable.  The main point of a PSI indexing schema
 is that you don't need to completely sort your dataset (hence the
 name: Partially Sorted) in order to get an usable index, and this
 normally leads to much faster index creation times.

  Insertion sort is supposed to work well for almost sorted sequences,
  but that application has always seemed a bit specialized to me.
  Although I'll admit to being occasionally tempted to pull the
  insertion sorts out of quicksort and mergesort and make them their
  own (type specific) routines.

 Maybe I'd also be interested in trying insertion sort out.  During the
 optimization process of an OPSI index, there is a need to sort out a
 slice of data that is already made of smaller chunks that are already
 sorted, so chances are that insertion sort could be significantly
 faster than the merge sort (or even the quick sort) in this scenario.

 But this is becoming an OT.  However, I'd be glad to further dicuss this
 privately, if you like to.


Well, I don't have much more to say. If you do decide that insertion sort
will be useful you won't have to twist my arm much to get it, but I think it
is most useful when the data never has to move far. In the case of quicksort
and mergesort it is called to deal with small unsorted chunks, but the
chunks themselves are already in the right place. Some kind of multi merge
sort might be more appropriate to the OPSI index.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Francesc Altet
A Tuesday 12 February 2008, Charles R Harris escrigué:
 On Feb 12, 2008 9:07 AM, Francesc Altet [EMAIL PROTECTED] wrote:
  * The newqsort performs the best on all the platforms we have
  checked (ranging from a 5% of improvement on Opteron/SuSe, up to
  3.8x with some Pentium4/Ubuntu systems).

 The 3.8 is amazing, isn't it? I've found that the performance also
 depends on whether I initialize the strings with random or empty.
 With the random initialization the new sort is ~2x faster. That's
 fedora 8, core duo, 32 bit OS, gcc 4.1.2.

Well, for me, a 3.8x (or even a 2x for that matter) of improvement is 
less amazing once you know that there is a flaw in C qsort for most of 
Linux distros around.  Neither Windows, MacOSX or certain versions of 
Linux (namely, SuSe 10.3) does reflect such a large difference in 
performance.

I'd say that newqsort that is to be included in next release of NumPy 
would be just a 10% better than using a sane implementation of C qsort.  
And, while a 10% is not as amazing than a 380%, the great news is that 
newqsort will provide first-class performance even to people having the 
flawed C qsort on their machines (which apparently are much more that I 
initially realized).

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Francesc Altet
A Tuesday 12 February 2008, Bruce Southey escrigué:
 Hi,

 I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that
 gives C qsort with C style compare: 0.65
 C qsort with Python style compare: 0.64
 NumPy newqsort: 0.36

That's very intersting.  In a similar configuration, but using SuSe 10.3 
(Enterprise) instead of 10.1, I don't see this factor of almost 2 of 
difference in performance (in fact, both performances, C qsort and 
NumPy newqsort, are very similar).

This seems to confirm that the GNU glibc crew has fixed the qsort 
performance very recently (i.e. I hope it is not only a fix in SuSe 
10.3 Enterprise), and this is why most of current distros are seeing 
the poor performance in qsort.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Bruce Southey
Hi,
I added gcc 4.2 from the openSUSE 10.1 repository so I now have both
the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1
installed. I see your result with 4.2.1 but not with 4.1.2 so I think
that there could be a difference in the compiler flags. I don't know
enough about those to help but I can test any suggestions.

$ gcc --version
gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
$ gcc -O3 sort-string-bench.c -o sort412
$ ./sort412
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.63
C qsort with Python style compare: 0.64
NumPy newqsort: 0.36

$ gcc-4.2 --version
gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
$ gcc-4.2 -O3 sort-string-bench.c -o sort421
$ ./sort421
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.62
C qsort with Python style compare: 0.61
NumPy newqsort: 0.55

This is  the same as:
$ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
$ ./sort421
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.71
C qsort with Python style compare: 0.70
NumPy newqsort: 0.55

(NumPy newqsort with -O2 alone is 0.6)

For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is
NumPy newqsort: 0.62 vs NumPy newqsort: 0.50

Regards
Bruce


On Feb 13, 2008 4:35 AM, Francesc Altet [EMAIL PROTECTED] wrote:
 A Tuesday 12 February 2008, Bruce Southey escrigué:
  Hi,
 
  I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that
  gives C qsort with C style compare: 0.65
  C qsort with Python style compare: 0.64
  NumPy newqsort: 0.36

 That's very intersting.  In a similar configuration, but using SuSe 10.3
 (Enterprise) instead of 10.1, I don't see this factor of almost 2 of
 difference in performance (in fact, both performances, C qsort and
 NumPy newqsort, are very similar).

 This seems to confirm that the GNU glibc crew has fixed the qsort
 performance very recently (i.e. I hope it is not only a fix in SuSe
 10.3 Enterprise), and this is why most of current distros are seeing
 the poor performance in qsort.


 Cheers,

 --
 0,0   Francesc Altet http://www.carabos.com/
 V   V   Cárabos Coop. V.   Enjoy Data
  -
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Charles R Harris
On Feb 13, 2008 9:19 AM, Bruce Southey [EMAIL PROTECTED] wrote:

 Hi,
 I added gcc 4.2 from the openSUSE 10.1 repository so I now have both
 the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1
 installed. I see your result with 4.2.1 but not with 4.1.2 so I think
 that there could be a difference in the compiler flags. I don't know
 enough about those to help but I can test any suggestions.

 $ gcc --version
 gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
 $ gcc -O3 sort-string-bench.c -o sort412
 $ ./sort412
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.63
 C qsort with Python style compare: 0.64
 NumPy newqsort: 0.36

 $ gcc-4.2 --version
 gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
 $ gcc-4.2 -O3 sort-string-bench.c -o sort421
 $ ./sort421
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.62
 C qsort with Python style compare: 0.61
 NumPy newqsort: 0.55

 This is  the same as:
 $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
 $ ./sort421
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.71
 C qsort with Python style compare: 0.70
 NumPy newqsort: 0.55

 (NumPy newqsort with -O2 alone is 0.6)

 For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is
 NumPy newqsort: 0.62 vs NumPy newqsort: 0.50


It's certainly interesting how much difference the compiler/flags make. I
suppose one more thing to add to the mix is the  -march and -mtune flags.
They didn't make much difference here, but they might for someone else. It
would also be interesting to see how a 64 bit system handles things.

/lib/libgcc_s-4.1.2
gcc 4.1.2,  -O3 -march=prescott -mtune=generic

Benchmark with 100 strings of size 15
C qsort with C style compare: 0.94
C qsort with Python style compare: 0.94
NumPy newqsort: 0.31

I suppose much also depends on the flags with which libgcc is compiled,
which in turn probably depends on the distro.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Charles R Harris
On Feb 13, 2008 10:56 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Wednesday 13 February 2008, Charles R Harris escrigué:
  OK,
 
  The new quicksorts are in svn. Francesc, can you check them out?
 

 Looks good here.  However, you seem to keep using your own copy_string()
 instead of plain memcpy().  In previous benchmarks, I've seen that
 copy_string() is faster than memcpy only for small values of the length
 of the block to be copied.


Yes, I noticed that your benchmark program crossed over to using memcpy at
16 chars, and I will probably add that feature. I was being conservative to
start with.

snip


 Finally, you also will have noticed the indirect sort line in the plot.
 This is because I was curious about when this method would win a direct
 sort.  And, by looking at the plot, it seems that the crosspoint is
 around strings of 128 bytes (much more in fact that I initially
 thought), and starts to be very significant (around 40% faster) at 256
 bytes.  So perhaps it would make sense to add the possibility to choose
 the indirect method when sorting those large strings.  This, of course,
 would require more memory for the indices, but using 4 or 8 additional
 bytes (depending if we on 32-bit or 64-bit), when each string takes 200
 bytes, doesn't seem too crazy.  In any case, it would be nice to
 document this in docstrings.


It would be easy to add this feature, but for the moment I think the best
thing is to document it.

Another fairly easy change that could be made is to support strided arrays.
That might speed sorting of non-contiguous arrays and sorts on axis other
than -1. The only reason it isn't there now is that I originally wrote the
sorting routines for numarray and numarray's upper level interface passed
contiguous arrays to the sort functions.


 Be warned, I'd like to stress out that these are my figures for my _own
 laptop_.  It would be nice if you can verify all of this with other
 achitectures (your Core2 machine seems different enough).  I can run
 the benchmarks on Windows (installed in the same laptop) too.  Tell me
 if you are interested on me doing this.


Its easy enough to test if you compile from svn, just add your new copy
function and change the name in this line:

   #copy=copy_string, copy_ucs4#

to use your function instead of copy_string.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Francesc Altet
A Wednesday 13 February 2008, Charles R Harris escrigué:
 OK,

 The new quicksorts are in svn. Francesc, can you check them out?


Looks good here.  However, you seem to keep using your own copy_string() 
instead of plain memcpy().  In previous benchmarks, I've seen that 
copy_string() is faster than memcpy only for small values of the length 
of the block to be copied.

In order to do a better assessment of this affirmation, I've created a 
small benchmark (attached) in order to compare your copy_string against 
system memcpy when sorting array strings.  I've also come up with a new 
function (copy_string2, attached), that tries to combine the best of 
copy_string and memcpy.  Look at the attached plot in order to see how 
they behave.

In the plot, you can see how memcpy is generally faster than 
copy_string, specially for relatively large string lengths.  However, 
copy_string can be faster for small lengths (the maximum difference, a 
20%, happens around 8/10 bytes).  It may happen that you were doing 
time mesaurements whith strings of size 8, and you were driven to the 
wrong conclusion that copy_string was faster than memcpy.

In case you think that performance for small string lengths is 
important, you may want to include copy_string2, that uses one method 
or another depending on the size block to be copied.  There is of 
course a small impact in performance (one more condition test was 
introduced), but it is rather negligible.

Finally, you also will have noticed the indirect sort line in the plot.  
This is because I was curious about when this method would win a direct 
sort.  And, by looking at the plot, it seems that the crosspoint is 
around strings of 128 bytes (much more in fact that I initially 
thought), and starts to be very significant (around 40% faster) at 256 
bytes.  So perhaps it would make sense to add the possibility to choose 
the indirect method when sorting those large strings.  This, of course, 
would require more memory for the indices, but using 4 or 8 additional 
bytes (depending if we on 32-bit or 64-bit), when each string takes 200 
bytes, doesn't seem too crazy.  In any case, it would be nice to 
document this in docstrings.

Be warned, I'd like to stress out that these are my figures for my _own 
laptop_.  It would be nice if you can verify all of this with other 
achitectures (your Core2 machine seems different enough).  I can run 
the benchmarks on Windows (installed in the same laptop) too.  Tell me 
if you are interested on me doing this.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -


sort-string-bench.py
Description: application/python
static void
copy_string2(char *s1, char *s2, size_t len)
{
if (len  16) {
while(len--) {
	*s1++ = *s2++;
	}
}
else {
memcpy(s1, s2, len);
}
}


ubuntu-pentium4-newqsort.pdf
Description: Adobe PDF document
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Charles R Harris
On Feb 13, 2008 12:37 PM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Wednesday 13 February 2008, Francesc Altet escrigué:
  A Wednesday 13 February 2008, Bruce Southey escrigué:
   Hi,
   I added gcc 4.2 from the openSUSE 10.1 repository so I now have
   both the 4.1.2 and 4.2.1 compilers installed. But still have
   glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with
   4.1.2 so I think that there could be a difference in the compiler
   flags. I don't know enough about those to help but I can test any
   suggestions.
  
   $ gcc --version
   gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
   $ gcc -O3 sort-string-bench.c -o sort412
   $ ./sort412
   Benchmark with 100 strings of size 15
   C qsort with C style compare: 0.63
   C qsort with Python style compare: 0.64
   NumPy newqsort: 0.36
  
   $ gcc-4.2 --version
   gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
   $ gcc-4.2 -O3 sort-string-bench.c -o sort421
   $ ./sort421
   Benchmark with 100 strings of size 15
   C qsort with C style compare: 0.62
   C qsort with Python style compare: 0.61
   NumPy newqsort: 0.55
  
   This is  the same as:
   $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
   $ ./sort421
   Benchmark with 100 strings of size 15
   C qsort with C style compare: 0.71
   C qsort with Python style compare: 0.70
   NumPy newqsort: 0.55
  
   (NumPy newqsort with -O2 alone is 0.6)
  
   For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions'
   is NumPy newqsort: 0.62 vs NumPy newqsort: 0.50
 
  That's really interesting.  Let me remember my figures for our
  Opteron:
 
  3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
  C qsort with C style compare: 0.64
  C qsort with Python style compare: 0.60
  NumPy newqsort: 0.59
 

 Just an addedum.  I've compiled the benchmark using gcc 4.1.2 using our
 Opteron machine.  Here are the results:

 SuSe LE 10.3 (gcc 4.1.2, -O3, AMD Opteron @ 2 GHz)
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.62
 C qsort with Python style compare: 0.61
 NumPy newqsort: 0.38

 So, I'm getting a 55% more of performance than by using gcc 4.2.1 (!).
 Also, I've installed gcc 4.2.1 on my laptop and here are the results:

 Ubuntu 7.10 (gcc 4.2.1, -O3, Intel Pentium 4 @ 2 GHz)
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 2.45
 C qsort with Python style compare: 2.42
 NumPy newqsort: 0.63

 While using gcc 4.1.2, I get:

 Ubuntu 7.10 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz)
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 2.45
 C qsort with Python style compare: 2.44
 NumPy newqsort: 0.65

 So, in this case (32-bit platform) gcc 4.2.1 seems to perform similarly
 to 4.1.2.

 So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very least,
 AMD Opteron architecture) and that newqsort performs really well in
 general (provided that the compiler can find the best path for
 optimizing its code).  Anyone using a 64-bit platform and having both
 gcc 4.1.2 and 4.2.1 installed can confirm this?

 Also, MSVC 7.1 32-bit (with opt level /Ox) doesn't seem to find such a
 best path (the benchmark for newqsort takes 0.92s using MSVC 7.1, while
 gcc 4.1.2 takes 0.65s using the same machine, a 40% faster).  I don't
 know whether newer versions of MSVC will do better or not, though.


Now we need someone to try ICC ;)

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-13 Thread Scott Ransom
On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote:
 So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very
 least, AMD Opteron architecture) and that newqsort performs really
 well in general (provided that the compiler can find the best path
 for optimizing its code).  Anyone using a 64-bit platform and having
 both gcc 4.1.2 and 4.2.1 installed can confirm this?

Here are results from a 64-bit Debian system using a Core2 Duo 2.66 GHz 
processor.

I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental) 
with -O2 and -O3.

Summary:  There is a big difference between -02 and -O3.  gcc-4.2 seems 
slightly better than the other gccs.  And the newqsort is a lot faster 
(always) than the libc version.

Scott

eiger:/data1$ ./sort346_O2
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.55
C qsort with Python style compare: 0.53
NumPy newqsort: 0.45

eiger:/data1$ ./sort346_O3
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.55
C qsort with Python style compare: 0.52
NumPy newqsort: 0.35

eiger:/data1$ ./sort413_O2
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.56
C qsort with Python style compare: 0.53
NumPy newqsort: 0.42

eiger:/data1$ ./sort413_O3
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.54
C qsort with Python style compare: 0.50
NumPy newqsort: 0.28

eiger:/data1$ ./sort423_O2
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.56
C qsort with Python style compare: 0.53
NumPy newqsort: 0.39

eiger:/data1$ ./sort423_O3
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.53
C qsort with Python style compare: 0.50
NumPy newqsort: 0.27

eiger:/data1$ ./sort43_O2
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.55
C qsort with Python style compare: 0.53
NumPy newqsort: 0.34

eiger:/data1$ ./sort43_O3
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.53
C qsort with Python style compare: 0.51
NumPy newqsort: 0.33

-- 
Scott M. RansomAddress:  NRAO
Phone:  (434) 296-0320   520 Edgemont Rd.
email:  [EMAIL PROTECTED] Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-12 Thread Francesc Altet
A Monday 11 February 2008, Charles R Harris escrigué:
 On Feb 11, 2008 1:15 PM, Francesc Altet [EMAIL PROTECTED] wrote:
  Here are the results of running it in several platforms:
 
  1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz)
  Benchmark with 100 strings of size 15
  C qsort with C style compare: 2.45
  C qsort with Python style compare: 2.44
  NumPy newqsort: 0.65

 Wow, what a difference.

Yeah.  This is why I got so excited initially.  It's unfortunate that 
most of the speed-up in newqsort in this case is probably due to a 
possible flaw in qsort implementation for the combination 
Ubuntu/Pentium4.  On the positive side, it is nice to see that other 
distros/processors have a decent performance on system qsort ;-)

  * I'd definitely keep memcpy by default.  From my timings, it looks
  like the best option for all platforms.

 OK. Was that just for the copies, or was it for the swaps also? I ran
 a version of swap using memcpy on my machine and the sort was about
 half as fast for 8 character strings.

No, only for the copies.  For the swaps, this memcpy-based version:

#define swap_string(s1, s2, len) { \
memcpy((vp), (s2), (len));  \
memcpy((s2), (s1), (len));  \
memcpy((s1), (vp), (len));  \
  }

performs extremely bad on my systems:

Pentium4/Ubuntu 7.10:
* newqsort with the loop version of swap_string:  0.65 s
* newqsort with the memcpy version of swap_string: 9.14 s

Opteron/SuSe LE 10.3:
* newqsort with the loop version of swap_string:  0.59 s
* newqsort with the memcpy version of swap_string: 8.71 s

So, it seems that the nice newqsort performance is extremely dependent 
on the loop version of swap_string.

  I hope the benchmark will behave well in your platform too (i.e.
  newqsort will perform the best ;)

 I'll check it out when I get home. As I say, it was running about 10%
 slower on my machine, but if it does better on most platforms it is
 probably the way to go. We can always change it in the future when
 everyone is running on quantum computers.

Quantum computers?  Oh, I can't wait for mine ;-)

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-12 Thread Bruce Southey
Hi,

I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that gives
C qsort with C style compare: 0.65
C qsort with Python style compare: 0.64
NumPy newqsort: 0.36

I did notice that -O3 was essential to get the performance gain as -O2 gave:
C qsort with C style compare: 0.69
C qsort with Python style compare: 0.70
NumPy newqsort: 0.61

Bruce


On Feb 12, 2008 10:07 AM, Francesc Altet [EMAIL PROTECTED] wrote:
 A Monday 11 February 2008, Charles R Harris escrigué:
  I'll check it out when I get home. As I say, it was running about 10%
  slower on my machine, but if it does better on most platforms it is
  probably the way to go. We can always change it in the future when
  everyone is running on quantum computers.

 We've done some testing on newqsort in several computers in our company.
 Here are the results for ordering a list with 1 million of strings of
 length 15 filled with random information (using C rand()):

 1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz)
 C qsort with C style compare: 2.45
 C qsort with Python style compare: 2.44
 NumPy newqsort: 0.65

 2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz)
 C qsort with C style compare: 0.971000
 C qsort with Python style compare: 0.962000
 NumPy newqsort: 0.921000

 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
 C qsort with C style compare: 0.64
 C qsort with Python style compare: 0.60
 NumPy newqsort: 0.59

 4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz)
 C qsort with C style compare: 1.77
 C qsort with Python style compare: 1.75
 NumPy newqsort: 0.44

 5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz)
 C qsort with C style compare: 1.59
 C qsort with Python style compare: 1.55
 NumPy newqsort: 0.51

 6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz)
 C qsort with C style compare: 1.89
 C qsort with Python style compare: 1.90
 NumPy newqsort: 0.50

 7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz)
 C qsort with C style compare: 3.03
 C qsort with Python style compare: 2.97
 NumPy newqsort: 1.04

 8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz)
 C qsort with C style compare: 1.56
 C qsort with Python style compare: 1.51
 NumPy newqsort: 1.22

 All benchmarks have been run using the attached benchmark (if anybody
 else wants to join the fiesta, please report back your feedback).

 Summarizing, one can say a couple of things:

 * Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are
 suffering from a innefficient system qsort (at least the implementation
 for strings).  SuSe Linux Enterprise 10.3 seems to have solved this.
 And Windows XP (SP2) and MacOSX (Tiger) looks like they have a
 relatively efficient implementation of qsort.

 * The newqsort performs the best on all the platforms we have checked
 (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some
 Pentium4/Ubuntu systems).

 All in all, I'd also say that newqsort would be a good candidate to be
 put into NumPy.

 Cheers,

 --
 0,0   Francesc Altet http://www.carabos.com/
 V   V   Cárabos Coop. V.   Enjoy Data
  -

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-12 Thread Charles R Harris
On Feb 12, 2008 9:07 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Monday 11 February 2008, Charles R Harris escrigué:
  I'll check it out when I get home. As I say, it was running about 10%
  slower on my machine, but if it does better on most platforms it is
  probably the way to go. We can always change it in the future when
  everyone is running on quantum computers.

 We've done some testing on newqsort in several computers in our company.
 Here are the results for ordering a list with 1 million of strings of
 length 15 filled with random information (using C rand()):

 1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz)
 C qsort with C style compare: 2.45
 C qsort with Python style compare: 2.44
 NumPy newqsort: 0.65

 2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz)
 C qsort with C style compare: 0.971000
 C qsort with Python style compare: 0.962000
 NumPy newqsort: 0.921000

 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
 C qsort with C style compare: 0.64
 C qsort with Python style compare: 0.60
 NumPy newqsort: 0.59

 4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz)
 C qsort with C style compare: 1.77
 C qsort with Python style compare: 1.75
 NumPy newqsort: 0.44

 5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz)
 C qsort with C style compare: 1.59
 C qsort with Python style compare: 1.55
 NumPy newqsort: 0.51

 6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz)
 C qsort with C style compare: 1.89
 C qsort with Python style compare: 1.90
 NumPy newqsort: 0.50

 7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz)
 C qsort with C style compare: 3.03
 C qsort with Python style compare: 2.97
 NumPy newqsort: 1.04

 8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz)
 C qsort with C style compare: 1.56
 C qsort with Python style compare: 1.51
 NumPy newqsort: 1.22

 All benchmarks have been run using the attached benchmark (if anybody
 else wants to join the fiesta, please report back your feedback).

 Summarizing, one can say a couple of things:

 * Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are
 suffering from a innefficient system qsort (at least the implementation
 for strings).  SuSe Linux Enterprise 10.3 seems to have solved this.
 And Windows XP (SP2) and MacOSX (Tiger) looks like they have a
 relatively efficient implementation of qsort.

 * The newqsort performs the best on all the platforms we have checked
 (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some
 Pentium4/Ubuntu systems).


The 3.8 is amazing, isn't it? I've found that the performance also depends
on whether I initialize the strings with random or empty. With the random
initialization the new sort is ~2x faster. That's fedora 8, core duo, 32 bit
OS, gcc 4.1.2.


 All in all, I'd also say that newqsort would be a good candidate to be
 put into NumPy.


I've merged some sorting tests preparatory to merging the new sorts. There
is a release coming up this weekend, I don't know if it is tagged yet, but
in any case I plan to merge the new sorts soon. Please help with the testing
when I do. Now it's off to paying work.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-12 Thread Charles R Harris
OK,

The new quicksorts are in svn. Francesc, can you check them out?

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-11 Thread Francesc Altet
A Monday 11 February 2008, Charles R Harris escrigué:
 I've attached my working _sortmodule.c.src file so you can fool with
 these different changes on your machines also. This is on top of
 current svn.

Ok.  In order to compare pears with pears, I've decided to create a 
standalone program in C (attached), based on your version (yes, it is 
almost the same that the one that I came up with).  This also allows to 
run it quickly in as many platforms as possible.  The compiler throws 
some warnings, but they are not important (I think).

Here are the results of running it in several platforms:

1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz)
Benchmark with 100 strings of size 15
C qsort with C style compare: 2.45
C qsort with Python style compare: 2.44
NumPy newqsort: 0.65

2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz)
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.971000
C qsort with Python style compare: 0.962000
NumPy newqsort: 0.921000

3) An Opteron server:  SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz)
Benchmark with 100 strings of size 15
C qsort with C style compare: 0.64
C qsort with Python style compare: 0.60
NumPy newqsort: 0.59

Some of the conclusions that can be drawn:

* C qsort performs pretty badly on my Pentium4 laptop with Ubuntu
* C qsort on Win on my laptop performs very similar to newqsort
* newqsort performs much better on my Ubuntu Linux than in Windows
* On Opteron, C qsort and newqsort do perform very similarly
* and most importantly, newqsort runs faster in *all* platforms

So, provided the last conclusion, I think it is safe to check newqsort 
in NumPy (unless something catastrofic might occur on other platforms).

Finally, a couple of small things:

* MSVC doesn't swallow the inline qualifier.  So we should remove it 
and hope that most of NumPy installations will be compiled -O3 at 
least.

* I'd definitely keep memcpy by default.  From my timings, it looks like 
the best option for all platforms.

I hope the benchmark will behave well in your platform too (i.e. 
newqsort will perform the best ;)

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
#include stdlib.h
#include stdio.h
#include time.h


#define NUM 100
#define LEN 15

/* This cannot be inlined by MSVC compiler */
/* static int */
/* compare_stringC(const void *s1, const void *s2) */
/* { */
/*   return strncmp(s1, s2, LEN); */
/* } */

static int
compare_stringC(const void *s1, const void *s2)
{
const unsigned char *c1 = (unsigned char *)s1;
const unsigned char *c2 = (unsigned char *)s2;
size_t i;

for(i = 0; i  LEN; ++i) {
if (c1[i] != c2[i]) {
return (c1[i]  c2[i]) ? 1 : -1;
}
	if (c1[i] == 0) return 0;
}
return 0;
}


static int
compare_stringP(const void *s1, const void *s2)
{
const unsigned char *c1 = (unsigned char *)s1;
const unsigned char *c2 = (unsigned char *)s2;
size_t i;

for(i = 0; i  LEN; ++i) {
if (c1[i] != c2[i]) {
return (c1[i]  c2[i]) ? 1 : -1;
}
}
return 0;
}



#define PYA_QS_STACK 100
#define SMALL_QUICKSORT 15

static int
compare_string(char *s1, char *s2, size_t len)
{
const unsigned char *c1 = (unsigned char *)s1;
const unsigned char *c2 = (unsigned char *)s2;
size_t i;

for(i = 0; i  len; ++i) {
if (c1[i] != c2[i]) {
return (c1[i]  c2[i]) ? 1 : -1;
}
}
return 0;
}

#define STRING_LT(pa, pb, len) (compare_string(pa, pb, len)  0)

static void
swap_string(char *s1, char *s2, size_t len)
{
while(len--) {
const char t = *s1;
*s1++ = *s2;
*s2++ = t;
}
}


#define opt_memcpy(a,b,n) memcpy((a),(b),(n))

static void
copy_string(char *s1, char *s2, size_t len)
{
while(len--) {
*s1++ = *s2++;
}
}

/* This seems to work faster on my laptop, but YMMV */
static void
opt_memcpy2(void *a, const void *b, size_t n) {
  size_t i;
  if (n16)
	for (i=0; in; i++)
	  ((char *)a)[i] = ((char *)b)[i];
  else
	memcpy(a, b, n);
}


static int
newqsort(char *start, size_t num, size_t len)
{
char *vp = malloc(len);
char *pl = start;
char *pr = start + (num - 1)*len;
char *stack[PYA_QS_STACK], **sptr = stack, *pm, *pi, *pj, *pk;

for(;;) {
while ((pr - pl)  5*len) {
/* quicksort partition */
pm = pl + (((pr - pl)/len)  1)*len;
if (STRING_LT(pm, pl, len)) swap_string(pm, pl, len);
if (STRING_LT(pr, pm, len)) swap_string(pr, pm, len);
if (STRING_LT(pm, pl, len)) swap_string(pm, pl, len);
opt_memcpy(vp, pm, len);
pi = pl;
pj = pr - len;
swap_string(pm, pj, len);
for(;;) {
do pi += len; while (STRING_LT(pi, vp, len));
do pj -= len; while (STRING_LT(vp, pj, len));
if (pi = pj)  break;

Re: [Numpy-discussion] String sort

2008-02-11 Thread Charles R Harris
On Feb 11, 2008 1:15 PM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Monday 11 February 2008, Charles R Harris escrigué:
  I've attached my working _sortmodule.c.src file so you can fool with
  these different changes on your machines also. This is on top of
  current svn.

 Ok.  In order to compare pears with pears, I've decided to create a
 standalone program in C (attached), based on your version (yes, it is
 almost the same that the one that I came up with).  This also allows to
 run it quickly in as many platforms as possible.  The compiler throws
 some warnings, but they are not important (I think).

 Here are the results of running it in several platforms:

 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz)
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 2.45
 C qsort with Python style compare: 2.44
 NumPy newqsort: 0.65


Wow, what a difference.



 2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz)
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.971000
 C qsort with Python style compare: 0.962000
 NumPy newqsort: 0.921000

 3) An Opteron server:  SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz)
 Benchmark with 100 strings of size 15
 C qsort with C style compare: 0.64
 C qsort with Python style compare: 0.60
 NumPy newqsort: 0.59

 Some of the conclusions that can be drawn:

 * C qsort performs pretty badly on my Pentium4 laptop with Ubuntu
 * C qsort on Win on my laptop performs very similar to newqsort
 * newqsort performs much better on my Ubuntu Linux than in Windows
 * On Opteron, C qsort and newqsort do perform very similarly
 * and most importantly, newqsort runs faster in *all* platforms

 So, provided the last conclusion, I think it is safe to check newqsort
 in NumPy (unless something catastrofic might occur on other platforms).

 Finally, a couple of small things:

 * MSVC doesn't swallow the inline qualifier.  So we should remove it
 and hope that most of NumPy installations will be compiled -O3 at
 least.


I was afraid of that. The inline keyword is a fairly new standard; gcc has
had it for a while but the older versions of MSVC didn't. I don't know if
the newer MSVC versions do. IIRC, there was another way to get MSVC to
inline. Of course, we could go to C++ :0)


 * I'd definitely keep memcpy by default.  From my timings, it looks like
 the best option for all platforms.


OK. Was that just for the copies, or was it for the swaps also? I ran a
version of swap using memcpy on my machine and the sort was about half as
fast for 8 character strings.



 I hope the benchmark will behave well in your platform too (i.e.
 newqsort will perform the best ;)


I'll check it out when I get home. As I say, it was running about 10% slower
on my machine, but if it does better on most platforms it is probably the
way to go. We can always change it in the future when everyone is running on
quantum computers.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-11 Thread Charles R Harris
On Feb 11, 2008 4:06 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Monday 11 February 2008, Francesc Altet escrigué:
  A Monday 11 February 2008, Charles R Harris escrigué:

 Mmm, comparing my new strncmp and the one that you have implemented in
 SVN, I've found a difference that can account for part of the
 difference in performances.  With your version of strncmp in SVN
 (compare_string), these are my timings with the Opteron server:

snip


 In [17]: np.random.seed(1)

 In [18]: a = np.random.rand(1).astype('S8')

 In [19]: %timeit a.copy().sort()
 100 loops, best of 3: 3.86 ms per loop

 In [20]: %timeit newqsort(a.copy())
 100 loops, best of 3: 3.44 ms per loop

 which gives times a 5% worse.  Try to use my version and tell me if it
 does better:

 static int inline
 opt_strncmp(char *a, char *b, size_t n)
 {
size_t i;
unsigned char c, d;
for (i = 0; i  n; i++) {
c = a[i]; d = b[i];
if (c != d) return c - d;
}
return 0;
 }


I didn't notice any speed difference. And while returning the difference of
two unsigned numbers should work with modular arithmetic when it is cast to
integer, I thought the explicit return based on a compare was clearer and
safer. Comparisons always work.

I've attached my working _sortmodule.c.src file so you can fool with these
different changes on your machines also. This is on top of current svn.

Chuck


_sortmodule.c.src
Description: WAIS Source
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-11 Thread Charles R Harris
On Feb 11, 2008 2:58 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Monday 11 February 2008, Charles R Harris escrigué:
  On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote:
   Hi,
  
   I'm a bit confused that the sort method of a string character
   doesn't
  
   allow a mergesort:
s = numpy.empty(10, S10)
s.sort(kind=merge)
  
   TypeError: desired sort not supported for this type
  
   However, by looking at the numpy sources, it seems that the only
   implemented method for sorting array strings is merge (I presume
   because it is stable).  So, perhaps the message above should be
   fixed.
 
  The only available method is the default quicksort. The mergesort is
  for argsort and was put in for lexsort to use.

 That's good to know.  However, I'm curious about where it is the
 specific quicksort implementation for strings/unicode.  I've had a look
 at _sortmodule.c.src, but I can only find a quicksort implementation
 for:


The default is the C qsort, it is called from PyArray_Sort in
multiarraymodule.c. The type specific sorts, when they exist, are also
called from there through _new_sort. To see what type specific sorts are
registered, look at the end of _sortmodule.c.src. You can write a new sort,
and if it isn't registered it won't be used. So commenting out the
registration is a good way to get back to the default.




 The version you are testing is your own or the one that I provided?
 Here are the timings for my laptop:


They turned out to be essentially identical, except I used len instead of ss
;) I also used inlined functions for the copy and swap as they are safer
with the arguments. Comparison with the macro versions showed no difference
there.



 In [55]: a = np.random.rand(1).astype('S8')

 In [56]: %timeit a.copy().sort()
 100 loops, best of 3: 3.82 ms per loop

 In [57]: %timeit newqsort(a.copy())
 100 loops, best of 3: 3.29 ms per loop

 Here, the difference in performance has been reduced to a mere 15%
 (still favouring newqsort).  So, with this, it seems like the
 performance of the original sorting in NumPy only suffers a lot when
 running in old processors (eg. Pentium 4), while the performance is
 reasonable with newer ones (Opteron).


It could also depend on the C library that comes with the compiler. I'm
running on a E6600, but numpy compiles everything for the i386, which might
make a difference also.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-11 Thread Francesc Altet
A Monday 11 February 2008, Francesc Altet escrigué:
 A Monday 11 February 2008, Charles R Harris escrigué:
  That's with the current sort(kind='q') in svn, which uses the new
  string compare function but is otherwise the old default quicksort.
  The new string specific version of quicksort I'm testing is
  actually a bit slower than that. Both versions correctly sort the
  array. So I'm going to continue to experiment a bit until I see
  what is going on.

 The version you are testing is your own or the one that I provided?
 Here are the timings for my laptop:

 In [32]: a = np.random.rand(1).astype('S8')

 In [33]: %timeit a.copy().sort()   # original sort in NumPy
 10 loops, best of 3: 16.8 ms per loop

 In [34]: %timeit newqsort(a.copy())  # My own qsort implementation
 100 loops, best of 3: 4.29 ms per loop

 (I'm using a random string array here mainly because I use the sort
 in my system NumPy, with the old string compare.  However, as all the
 contents in strings are not NULL chars, the performance should be
 comparable, bar a few percent of improvement).

 So, my newqsort still seems to run almost 4x faster than the one in
 NumPy (you know, using the old string compare).

 However, when using a server with an Opteron processor, the view is
 much different:

 In [55]: a = np.random.rand(1).astype('S8')

 In [56]: %timeit a.copy().sort()
 100 loops, best of 3: 3.82 ms per loop

 In [57]: %timeit newqsort(a.copy())
 100 loops, best of 3: 3.29 ms per loop

 Here, the difference in performance has been reduced to a mere 15%
 (still favouring newqsort).  So, with this, it seems like the
 performance of the original sorting in NumPy only suffers a lot when
 running in old processors (eg. Pentium 4), while the performance is
 reasonable with newer ones (Opteron).  On its hand, newqsort seems to
 perform reasonably well in both.  I don't know what exactly is the
 reason for this (I don't know where it is the code for the original
 quicksort for strings, so I can't do a visual comparison), but it
 would be great if we can discover it!

Mmm, comparing my new strncmp and the one that you have implemented in 
SVN, I've found a difference that can account for part of the 
difference in performances.  With your version of strncmp in SVN 
(compare_string), these are my timings with the Opteron server:

In [17]: np.random.seed(1)

In [18]: a = np.random.rand(1).astype('S8')

In [19]: %timeit a.copy().sort()
100 loops, best of 3: 3.86 ms per loop

In [20]: %timeit newqsort(a.copy())
100 loops, best of 3: 3.44 ms per loop

which gives times a 5% worse.  Try to use my version and tell me if it 
does better:

static int inline
opt_strncmp(char *a, char *b, size_t n)
{
size_t i;
unsigned char c, d;
for (i = 0; i  n; i++) {
c = a[i]; d = b[i];
if (c != d) return c - d;
}
return 0;
}

Although a 5% is maybe too little improvement :-/

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-11 Thread Francesc Altet
A Monday 11 February 2008, Charles R Harris escrigué:
 On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote:
  Hi,
 
  I'm a bit confused that the sort method of a string character
  doesn't
 
  allow a mergesort:
   s = numpy.empty(10, S10)
   s.sort(kind=merge)
 
  TypeError: desired sort not supported for this type
 
  However, by looking at the numpy sources, it seems that the only
  implemented method for sorting array strings is merge (I presume
  because it is stable).  So, perhaps the message above should be
  fixed.

 The only available method is the default quicksort. The mergesort is
 for argsort and was put in for lexsort to use.

That's good to know.  However, I'm curious about where it is the 
specific quicksort implementation for strings/unicode.  I've had a look 
at _sortmodule.c.src, but I can only find a quicksort implementation 
for:

/**begin repeat
#TYPE=BOOL,BYTE,UBYTE,SHORT,USHORT,INT,UINT,LONG,ULONG,LONGLONG,ULONGLONG,FLOAT,DOUBLE,LONGDOUBLE,CFLOAT,CDOUBLE,CLONGDOUBLE#
**/

Where are the STRING/UNICODE versions?

  Also, in the context of my work in indexing, and because of the
  slowness of the current implementation in NumPy, I've ended with an
  implementation of the quicksort method for 1-D array strings.  For
  moderately large arrays, it is about 2.5x-3x faster than the
  (supposedly) mergesort version in NumPy, not only due to the
  quicksort, but also because I've implemented a couple of macros for
  efficient string swapping and copy.  If this is of interest for
  NumPy developers, tell me and I will provide the code.

 I've now got a string/ucs4 specific argsort(kind='q'), the string
 version of which is about 40% faster than the old default and about
 10% faster than the mergesort version, but the string/ucs4 specific
 versions of sort aren't yet fairing as well. I'm timing things with

 In [1]: import timeit

 In [2]: t =
 timeit.Timer(np.fromstring(np.empty(1).tostring(),dtype='|S8').s
ort(kind='q'),import numpy as np)

 In [3]: t.repeat(3,100)
 Out[3]: [0.22127485275268555, 0.21282196044921875,
 0.21273088455200195]

 That's with the current sort(kind='q') in svn, which uses the new
 string compare function but is otherwise the old default quicksort.
 The new string specific version of quicksort I'm testing is actually
 a bit slower than that. Both versions correctly sort the array. So
 I'm going to continue to experiment a bit until I see what is going
 on.

The version you are testing is your own or the one that I provided?  
Here are the timings for my laptop:

In [32]: a = np.random.rand(1).astype('S8')

In [33]: %timeit a.copy().sort()   # original sort in NumPy
10 loops, best of 3: 16.8 ms per loop

In [34]: %timeit newqsort(a.copy())  # My own qsort implementation
100 loops, best of 3: 4.29 ms per loop

(I'm using a random string array here mainly because I use the sort in 
my system NumPy, with the old string compare.  However, as all the 
contents in strings are not NULL chars, the performance should be 
comparable, bar a few percent of improvement).

So, my newqsort still seems to run almost 4x faster than the one in 
NumPy (you know, using the old string compare).

However, when using a server with an Opteron processor, the view is much 
different:

In [55]: a = np.random.rand(1).astype('S8')

In [56]: %timeit a.copy().sort()
100 loops, best of 3: 3.82 ms per loop

In [57]: %timeit newqsort(a.copy())
100 loops, best of 3: 3.29 ms per loop

Here, the difference in performance has been reduced to a mere 15% 
(still favouring newqsort).  So, with this, it seems like the 
performance of the original sorting in NumPy only suffers a lot when 
running in old processors (eg. Pentium 4), while the performance is 
reasonable with newer ones (Opteron).  On its hand, newqsort seems to 
perform reasonably well in both.  I don't know what exactly is the 
reason for this (I don't know where it is the code for the original 
quicksort for strings, so I can't do a visual comparison), but it would 
be great if we can discover it!

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-10 Thread Charles R Harris
On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 Hi,

 I'm a bit confused that the sort method of a string character doesn't
 allow a mergesort:

  s = numpy.empty(10, S10)
  s.sort(kind=merge)
 TypeError: desired sort not supported for this type

 However, by looking at the numpy sources, it seems that the only
 implemented method for sorting array strings is merge (I presume
 because it is stable).  So, perhaps the message above should be fixed.


The only available method is the default quicksort. The mergesort is for
argsort and was put in for lexsort to use.



 Also, in the context of my work in indexing, and because of the slowness
 of the current implementation in NumPy, I've ended with an
 implementation of the quicksort method for 1-D array strings.  For
 moderately large arrays, it is about 2.5x-3x faster than the
 (supposedly) mergesort version in NumPy, not only due to the quicksort,
 but also because I've implemented a couple of macros for efficient
 string swapping and copy.  If this is of interest for NumPy developers,
 tell me and I will provide the code.


I've now got a string/ucs4 specific argsort(kind='q'), the string version of
which is about 40% faster than the old default and about 10% faster than the
mergesort version, but the string/ucs4 specific versions of sort aren't yet
fairing as well. I'm timing things with

In [1]: import timeit

In [2]: t = 
timeit.Timer(np.fromstring(np.empty(1).tostring(),dtype='|S8').sort(kind='q'),import
numpy as np)

In [3]: t.repeat(3,100)
Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195]

That's with the current sort(kind='q') in svn, which uses the new string
compare function but is otherwise the old default quicksort. The new string
specific version of quicksort I'm testing is actually a bit slower than
that. Both versions correctly sort the array. So I'm going to continue to
experiment a bit until I see what is going on.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Francesc Altet
A Friday 08 February 2008, Charles R Harris escrigué:
 On Feb 8, 2008 8:58 AM, Francesc Altet [EMAIL PROTECTED] wrote:
  A Friday 08 February 2008, Charles R Harris escrigué:
Also, in the context of my work in indexing, and because of the
slowness of the current implementation in NumPy, I've ended
with an implementation of the quicksort method for 1-D array
strings.  For moderately large arrays, it is about 2.5x-3x
faster than the (supposedly) mergesort version in NumPy, not
only due to the quicksort, but also because I've implemented a
couple of macros for efficient string swapping and copy.  If
this is of interest for NumPy developers, tell me and I will
provide the code.
  
   I have some code for this too and was going to merge it. Send
   yours along and I'll get to it this weekend.
 
  Ok, great.  I'm attaching it.  Tell me if you need some
  clarification on the code.

 I ran a few timing tests. On my machine strncmp is about 100x faster
 than opt_strncmp, but  sSWAP (with some fixes), is about 10x faster
 then using the memcpy in a recent compiler. Does this match with your
 experience.

Well, I've run some more exhaustive tests on my laptop (Pentium 4 @ 2 
GHz, Ubuntu 7.10, gcc 4.1.3, using -O3 optlevel) with the next sa1 
array:

numpy.random.seed(1)
nelem = 1
a = numpy.random.rand(nelem)
sa1 = a.astype('S16')

And I've chosen the next benchmark:

/* start:  the start of data for sa1 array
   ss:  the length of the string type (16)
   num: the number of elements in sa1 (1)
 */
int sort_S(char *start, int ss, npy_intp num)
{
  char *pl = start;
  int a = 0;
  npy_intp i, j;

  for (i=0; inum; i++)
for (j=0; jnum; j++)
  a += opt_strncmp(pl+i*ss, pl+j*ss, ss);

  return a;
}

So, when I choose the next implementation of opt_strncmp:

static int inline opt_strncmp1(char *a, char *b, intp n) {
  intp i;
  for (i=0; in; i++) {
 if (a[i]  b[i]) return i+1;
 if (a[i]  b[i]) return -(i+1);
  }
  return 0;
}

I get a time of 1.70 s.  When using the next implementation:

static int inline opt_strncmp2(char *a, char *b, intp n) {
  intp i;
  for (i=0; in; i++) {
if (a[i] != b[i])
  return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]);
  }
  return 0;
}

I get a time of 1.24 s.  The original strncmp gives me 2.05 seconds with 
this setup.  So, in short and using my laptop:

original strncmp: 2.05 s
opt_stncmp1:  1.70 s ; speed-up: 20%
opt_stncmp2:  1.24 s ; speed-up: 65%

Just in case, I've also run the benchmark in an Opteron server (2 GHz, 
SuSe 10.1, gcc 4.2.1, using -O3 optlevel).  Here are the results:

original strncmp: 1.25 s
opt_stncmp1:  1.61 s ; speed-up: -30%
opt_stncmp2:  1.03 s ; speed-up:  20%

So, I was wrong: opt_stncmp2 is quite more efficient than opt_stncmp1, 
and definitely better than the original strncmp.  So, I would say that 
using opt_stncmp2 is always the best bet.  I'm not certain why are you 
seeing that original strncmp is 100x faster than opt_strncmpX :-/

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Francesc Altet
A Friday 08 February 2008, Charles R Harris escrigué:
 On Feb 8, 2008 10:31 AM, Francesc Altet [EMAIL PROTECTED] wrote:
  A Friday 08 February 2008, Francesc Altet escrigué:
   A Friday 08 February 2008, Charles R Harris escrigué:
 Also, in the context of my work in indexing, and because of
 the slowness of the current implementation in NumPy, I've
 ended with an implementation of the quicksort method for 1-D
 array strings. For moderately large arrays, it is about
 2.5x-3x faster than the (supposedly) mergesort version in
 NumPy, not only due to the quicksort, but also because I've
 implemented a couple of macros for efficient string swapping
 and copy.  If this is of interest for NumPy developers, tell
 me and I will provide the code.
   
I have some code for this too and was going to merge it. Send
yours along and I'll get to it this weekend.
  
   Ok, great.  I'm attaching it.  Tell me if you need some
   clarification on the code.
 
  Ops.  I've introduced a last-minute problem in my code.  To fix
  this, just replace the flawed opt_strncmp() that I sent before by:
 
  static int inline opt_strncmp(char *a, char *b, int n) {
   int i;
   for (i=0; in; i++) {
 if (a[i]  b[i]) return i+1;
 if (a[i]  b[i]) return -(i+1);
 /* Another way, but seems equivalent in speed, at least here */
  /* if (a[i] != b[i]) */
  /*   return (((unsigned char *)a)[i] - ((unsigned char
  *)b)[i]); */ }
   return 0;
  }
 
  Apparently, this version works just fine.

 Did you find this significantly faster than strncmp? There is also a
 unicode compare, do you have thoughts about that?

Well, for the unicode case it wouldn't be enough by replacing 'char' 
by 'Py_ArrayUCS4'?  Maybe this afternoon I can do some benchmarking too 
in this regard.

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Charles R Harris
On Feb 9, 2008 6:47 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Friday 08 February 2008, Charles R Harris escrigué:
  On Feb 8, 2008 10:31 AM, Francesc Altet [EMAIL PROTECTED] wrote:
   A Friday 08 February 2008, Francesc Altet escrigué:
A Friday 08 February 2008, Charles R Harris escrigué:
  Also, in the context of my work in indexing, and because of
  the slowness of the current implementation in NumPy, I've
  ended with an implementation of the quicksort method for 1-D
  array strings. For moderately large arrays, it is about
  2.5x-3x faster than the (supposedly) mergesort version in
  NumPy, not only due to the quicksort, but also because I've
  implemented a couple of macros for efficient string swapping
  and copy.  If this is of interest for NumPy developers, tell
  me and I will provide the code.

 I have some code for this too and was going to merge it. Send
 yours along and I'll get to it this weekend.
   
Ok, great.  I'm attaching it.  Tell me if you need some
clarification on the code.
  
   Ops.  I've introduced a last-minute problem in my code.  To fix
   this, just replace the flawed opt_strncmp() that I sent before by:
  
   static int inline opt_strncmp(char *a, char *b, int n) {
int i;
for (i=0; in; i++) {
  if (a[i]  b[i]) return i+1;
  if (a[i]  b[i]) return -(i+1);
  /* Another way, but seems equivalent in speed, at least here */
   /* if (a[i] != b[i]) */
   /*   return (((unsigned char *)a)[i] - ((unsigned char
   *)b)[i]); */ }
return 0;
   }
  
   Apparently, this version works just fine.
 
  Did you find this significantly faster than strncmp? There is also a
  unicode compare, do you have thoughts about that?

 Well, for the unicode case it wouldn't be enough by replacing 'char'
 by 'Py_ArrayUCS4'?  Maybe this afternoon I can do some benchmarking too
 in this regard.


Looks like that for Numpy. The problem I was thinking about is that for wide
characters Windows C defaults to UTF16 while the Unixes default to UTF32.
The C99 standard didn't specify the exact length, but Numpy seems to use (or
assume) UTF32.

Anyway, after doing some work to fool the optimizer and subtracting loop
overhead, strncmp still comes out a bit faster for me, 11e-9 vs 16e-9
seconds to compare strings of length 10. I've attached the program. Note
that on my machine malloc appears to return zeroed memory, so the string
compares always go to the end.

Chuck
#include stdlib.h
#include stdio.h
#include string.h

static int inline
strncmp1(char *a, char *b, int n)
{
int i;
for (i = 0; i  n; i++) {
if (a[i] != b[i])
return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]);
}
return 0;
}

#define COMPARE(a, b, n) strncmp1(a,b,n)
#define REP 100
#define LEN 10

int main(int argc, char** argv)
{
char (*a)[LEN] = malloc(REP*LEN);
char (*b)[LEN] = malloc(REP*LEN);
int sum = 0;
int i;

for(i = 0; i  REP; ++i)
sum += COMPARE(a[i], b[i], LEN);

printf(%d\n, sum);
free(a);
free(b);
return 1;
}
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Francesc Altet
A Saturday 09 February 2008, Charles R Harris escrigué:
  Well, for the unicode case it wouldn't be enough by replacing
  'char' by 'Py_ArrayUCS4'?  Maybe this afternoon I can do some
  benchmarking too in this regard.

 Looks like that for Numpy. The problem I was thinking about is that
 for wide characters Windows C defaults to UTF16 while the Unixes
 default to UTF32.

If it were so simple ;-)  The fact is that the Python crew is delivering 
the tarballs ready to compile with the UCS2 as default, and this 
applies to both UNIX and Windows.  However, some Linux distributions 
(most in particular, Debian and derivatives), has chosen to make UCS4 
the default in their Python packages.

This is not a (big) problem in itself, but when it comes to writing 
arrays on disk and hope for portability (not only with different 
platforms, but also with different UCS python interpreter in the same 
machine!), we realized that this was a real problem (see discussion in 
[1]).  So, NumPy had to make a decision in that regard, and Travis 
finally opted to only give support for the UCS4 charset in NumPy [2].  
Also, he opened the door to possible UCS2 implementations in NumPy in 
the future, but that would be a real pain, IMHO.

[1]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006081.html
[2]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006130.html

So, at least for the time being, you only have to worry about UCS4.

 The C99 standard didn't specify the exact length, 
 but Numpy seems to use (or assume) UTF32.

Well, I should say that UTF32 and UCS4 are names referring to the same 
thing, but most literature (and specially package configuration 
procedures) talks about UCS4.

 Anyway, after doing some work to fool the optimizer and subtracting
 loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs
 16e-9 seconds to compare strings of length 10. I've attached the
 program. Note that on my machine malloc appears to return zeroed
 memory, so the string compares always go to the end.

I've seen the benchmark, and the problem is that C strncmp stops 
checking when it finds a \0 in the first string, while strncmp1 have to 
check the complete set of chars in strings.  However, you won't really 
want to do C string comparisons with NumPy strings:

In [35]: ns1 = numpy.array(as\0as)

In [36]: ns2 = numpy.array(as\0bs)

In [37]: ns1 == ns2
Out[37]: array(False, dtype=bool)

In [38]: ns1  ns2
Out[38]: array(True, dtype=bool)

or, with Python strings, in general:

In [39]: ns1 = as\0as

In [40]: ns2 = as\0bs

In [41]: ns1 == ns2
Out[41]: False

In [42]: ns1  ns2
Out[42]: True

As you see, Python/NumPy strings are different beasts than C strings in 
that regard.  The strings in the latter always end with a \0 (NULL) 
character, while in Python/NumPy the end is defined by a length 
property (btw, the same than in Pascal, if you know it).

So, strncmp1 is not only faster than its C counterpart, but also the one 
doing the correct job with NumPy (unicode) strings.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Charles R Harris
On Feb 9, 2008 11:50 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Saturday 09 February 2008, Charles R Harris escrigué:
   Well, for the unicode case it wouldn't be enough by replacing
   'char' by 'Py_ArrayUCS4'?  Maybe this afternoon I can do some
   benchmarking too in this regard.
 
  Looks like that for Numpy. The problem I was thinking about is that
  for wide characters Windows C defaults to UTF16 while the Unixes
  default to UTF32.

 If it were so simple ;-)  The fact is that the Python crew is delivering
 the tarballs ready to compile with the UCS2 as default, and this
 applies to both UNIX and Windows.  However, some Linux distributions
 (most in particular, Debian and derivatives), has chosen to make UCS4
 the default in their Python packages.

 This is not a (big) problem in itself, but when it comes to writing
 arrays on disk and hope for portability (not only with different
 platforms, but also with different UCS python interpreter in the same
 machine!), we realized that this was a real problem (see discussion in
 [1]).  So, NumPy had to make a decision in that regard, and Travis
 finally opted to only give support for the UCS4 charset in NumPy [2].
 Also, he opened the door to possible UCS2 implementations in NumPy in
 the future, but that would be a real pain, IMHO.


 [1]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006081.html

 [2]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006130.html

 So, at least for the time being, you only have to worry about UCS4.

  The C99 standard didn't specify the exact length,
  but Numpy seems to use (or assume) UTF32.

 Well, I should say that UTF32 and UCS4 are names referring to the same
 thing, but most literature (and specially package configuration
 procedures) talks about UCS4.

  Anyway, after doing some work to fool the optimizer and subtracting
  loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs
  16e-9 seconds to compare strings of length 10. I've attached the
  program. Note that on my machine malloc appears to return zeroed
  memory, so the string compares always go to the end.

 I've seen the benchmark, and the problem is that C strncmp stops
 checking when it finds a \0 in the first string, while strncmp1 have to
 check the complete set of chars in strings.  However, you won't really
 want to do C string comparisons with NumPy strings:

 In [35]: ns1 = numpy.array(as\0as)

 In [36]: ns2 = numpy.array(as\0bs)

 In [37]: ns1 == ns2
 Out[37]: array(False, dtype=bool)

 In [38]: ns1  ns2
 Out[38]: array(True, dtype=bool)

 or, with Python strings, in general:

 In [39]: ns1 = as\0as

 In [40]: ns2 = as\0bs

 In [41]: ns1 == ns2
 Out[41]: False

 In [42]: ns1  ns2
 Out[42]: True

 As you see, Python/NumPy strings are different beasts than C strings in
 that regard.  The strings in the latter always end with a \0 (NULL)
 character, while in Python/NumPy the end is defined by a length
 property (btw, the same than in Pascal, if you know it).

 So, strncmp1 is not only faster than its C counterpart, but also the one
 doing the correct job with NumPy (unicode) strings.


Ah, in that case the current indirect sort for NumPy strings, which uses
strncmp, is incorrect and needs to be fixed. It seems that strings with
zeros are not part of the current test series ;)

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Charles R Harris
On Feb 9, 2008 2:07 PM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Saturday 09 February 2008, Charles R Harris escrigué:
   So, strncmp1 is not only faster than its C counterpart, but also
   the one doing the correct job with NumPy (unicode) strings.
 
  Ah, in that case the current indirect sort for NumPy strings, which
  uses strncmp, is incorrect and needs to be fixed. It seems that
  strings with zeros are not part of the current test series ;)

 Yeah, that's right.  And yes, it would be advisable to have at least a
 couple of tests having zeros interspersed throughout the string.


Like this should do:

In [5]: argsort(fromstring(\0\2\0\1, dtype=|S2))
Out[5]: array([0, 1])

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Charles R Harris
On Feb 9, 2008 2:29 PM, Francesc Altet [EMAIL PROTECTED] wrote:

 Chuck,

 One more thing on this.  I've been doing some benchmarking with my
 opt_memcpy() macro in the quicksort_string function, and I should say
 that while it is definitely more efficient than my system memcpy for
 small values of n (the number of bytes to copy), this doesn't keep true
 for all values of n.  For example, for n16, opt_memcpy() can be more
 than 4x faster than system memcpy (and this is why I naively thought
 that it would be faster in general).  However, for n80, memcpy beats
 opt_memcpy between a 25% and 100% (depending on whether n is divisible
 by 2, 4 or 8).  This is on my Linux system (Ubuntu 7.10), but perhaps
 with Windows the behaviour can be different.

 I think I would be able to come up with a routine that can offer a
 balance between opt_memcpy and system memcpy, but that should take some
 time.  So, until I (or anybody else) do more research on this, I think
 it would be safer if you use system memcpy for string sorting in NumPy.


The memcpy in newer compilers is actually pretty good. For integers and such
it sometime compiles inline using integer assignments, but I was loath to
make it the default implementation until = 4.1.x gcc became more common.
However, strings might be a good place to use it.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Charles R Harris
On Feb 9, 2008 2:42 PM, Charles R Harris [EMAIL PROTECTED] wrote:



 On Feb 9, 2008 2:29 PM, Francesc Altet [EMAIL PROTECTED] wrote:

  Chuck,
 
  One more thing on this.  I've been doing some benchmarking with my
  opt_memcpy() macro in the quicksort_string function, and I should say
  that while it is definitely more efficient than my system memcpy for
  small values of n (the number of bytes to copy), this doesn't keep true
  for all values of n.  For example, for n16, opt_memcpy() can be more
  than 4x faster than system memcpy (and this is why I naively thought
  that it would be faster in general).  However, for n80, memcpy beats
  opt_memcpy between a 25% and 100% (depending on whether n is divisible
  by 2, 4 or 8).  This is on my Linux system (Ubuntu 7.10), but perhaps
  with Windows the behaviour can be different.
 
  I think I would be able to come up with a routine that can offer a
  balance between opt_memcpy and system memcpy, but that should take some
  time.  So, until I (or anybody else) do more research on this, I think
  it would be safer if you use system memcpy for string sorting in NumPy.
 

 The memcpy in newer compilers is actually pretty good. For integers and
 such it sometime compiles inline using integer assignments, but I was loath
 to make it the default implementation until = 4.1.x gcc became more
 common. However, strings might be a good place to use it.


I'm also thinking that at some point it becomes more efficient to do a
indirect sort followed by take than to move all those big strings around.
But I guess we won't know where that point is until we have both versions
available.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Francesc Altet
A Saturday 09 February 2008, Charles R Harris escrigué:
 On Feb 9, 2008 2:07 PM, Francesc Altet [EMAIL PROTECTED] wrote:
  A Saturday 09 February 2008, Charles R Harris escrigué:
So, strncmp1 is not only faster than its C counterpart, but
also the one doing the correct job with NumPy (unicode)
strings.
  
   Ah, in that case the current indirect sort for NumPy strings,
   which uses strncmp, is incorrect and needs to be fixed. It seems
   that strings with zeros are not part of the current test series
   ;)
 
  Yeah, that's right.  And yes, it would be advisable to have at
  least a couple of tests having zeros interspersed throughout the
  string.

 Like this should do:

 In [5]: argsort(fromstring(\0\2\0\1, dtype=|S2))
 Out[5]: array([0, 1])

Exactly, but I understand that the correct result should be:

array([1, 0])

;-)

Something a bit more complex, like:

In [5]: argsort(fromstring(a\0b\0\0\2a\0b\0\0\1, dtype=|S6))
Out[5]: array([1, 0])

wouldn't hurt neither.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Francesc Altet
Chuck,

One more thing on this.  I've been doing some benchmarking with my 
opt_memcpy() macro in the quicksort_string function, and I should say 
that while it is definitely more efficient than my system memcpy for 
small values of n (the number of bytes to copy), this doesn't keep true 
for all values of n.  For example, for n16, opt_memcpy() can be more 
than 4x faster than system memcpy (and this is why I naively thought 
that it would be faster in general).  However, for n80, memcpy beats 
opt_memcpy between a 25% and 100% (depending on whether n is divisible 
by 2, 4 or 8).  This is on my Linux system (Ubuntu 7.10), but perhaps 
with Windows the behaviour can be different.

I think I would be able to come up with a routine that can offer a 
balance between opt_memcpy and system memcpy, but that should take some 
time.  So, until I (or anybody else) do more research on this, I think 
it would be safer if you use system memcpy for string sorting in NumPy.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Francesc Altet
A Saturday 09 February 2008, Charles R Harris escrigué:
 I'm also thinking that at some point it becomes more efficient to do
 a indirect sort followed by take than to move all those big strings
 around. But I guess we won't know where that point is until we have
 both versions available.

I've done some experiments in that matter too.  They are saying that, 
with the current mergesort in NumPy, an indirect sort followed by take 
performs similarly to direct sort for small string lengths (=16), but 
indirect sort starts to win afterwards.

The version with quicksort and optimized sSWAP should be between 2x and 
3x times faster than current mergesort implementation, so the advantage 
for direct sort could grow up to somewhere between 50 and 100.  A nice 
idea could be doing some more toughful experiments in order to find the 
point where an indirect sort followed by a take would be more efficient 
and automatically select this method beyond that point.

However, this has the drawback that you have to use additional memory 
for keeping the indices in the indirect method.  Of course, when 
strings are large, those indices should take a rather negligible space 
compared with strings itself.  In any case, in some situations where 
space is critical, this can still be important.  I don't know, but my 
opinion is that we shouldn't take too agressive optimizations for that 
matter.  My vote is to document this possibility in the docstrings, so 
that the user wanting for extreme performance can use this approach if 
he wants to.  Still, for string sizes greater than, say, 1000, well, an 
automatic selection of the indirect method is very tempting indeed.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-09 Thread Charles R Harris
On Feb 9, 2008 3:19 PM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Saturday 09 February 2008, Charles R Harris escrigué:
  I'm also thinking that at some point it becomes more efficient to do
  a indirect sort followed by take than to move all those big strings
  around. But I guess we won't know where that point is until we have
  both versions available.

 I've done some experiments in that matter too.  They are saying that,
 with the current mergesort in NumPy, an indirect sort followed by take
 performs similarly to direct sort for small string lengths (=16), but
 indirect sort starts to win afterwards.

 The version with quicksort and optimized sSWAP should be between 2x and
 3x times faster than current mergesort implementation, so the advantage
 for direct sort could grow up to somewhere between 50 and 100.  A nice
 idea could be doing some more toughful experiments in order to find the
 point where an indirect sort followed by a take would be more efficient
 and automatically select this method beyond that point.

 However, this has the drawback that you have to use additional memory
 for keeping the indices in the indirect method.  Of course, when
 strings are large, those indices should take a rather negligible space
 compared with strings itself.  In any case, in some situations where
 space is critical, this can still be important.  I don't know, but my
 opinion is that we shouldn't take too agressive optimizations for that
 matter.  My vote is to document this possibility in the docstrings, so
 that the user wanting for extreme performance can use this approach if
 he wants to.  Still, for string sizes greater than, say, 1000, well, an
 automatic selection of the indirect method is very tempting indeed.


The strings with zeros problem runs deeper than it looked at first glance.
Normal sorts don't work either, which means the type has a bad comparison
function. And argsort still doesn't work even with the correct comparison
function. Python, however, works as it should sorting lists of strings with
zeros. So I'm going to have to track down and fix this oddity, but it is
going to delay putting in the type specific quicksort for strings.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-08 Thread Charles R Harris
On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 Hi,

 I'm a bit confused that the sort method of a string character doesn't
 allow a mergesort:

  s = numpy.empty(10, S10)
  s.sort(kind=merge)
 TypeError: desired sort not supported for this type


I think it's an error parsing the keyword. In fact, I thought I fixed that,
but maybe I was waiting till I added the other methods.


 However, by looking at the numpy sources, it seems that the only
 implemented method for sorting array strings is merge (I presume
 because it is stable).  So, perhaps the message above should be fixed.

 Also, in the context of my work in indexing, and because of the slowness
 of the current implementation in NumPy, I've ended with an
 implementation of the quicksort method for 1-D array strings.  For
 moderately large arrays, it is about 2.5x-3x faster than the
 (supposedly) mergesort version in NumPy, not only due to the quicksort,
 but also because I've implemented a couple of macros for efficient
 string swapping and copy.  If this is of interest for NumPy developers,
 tell me and I will provide the code.


I have some code for this too and was going to merge it. Send yours along
and I'll get to it this weekend.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-08 Thread Francesc Altet
A Friday 08 February 2008, Charles R Harris escrigué:
  Also, in the context of my work in indexing, and because of the
  slowness of the current implementation in NumPy, I've ended with an
  implementation of the quicksort method for 1-D array strings.  For
  moderately large arrays, it is about 2.5x-3x faster than the
  (supposedly) mergesort version in NumPy, not only due to the
  quicksort, but also because I've implemented a couple of macros for
  efficient string swapping and copy.  If this is of interest for
  NumPy developers, tell me and I will provide the code.

 I have some code for this too and was going to merge it. Send yours
 along and I'll get to it this weekend.

Ok, great.  I'm attaching it.  Tell me if you need some clarification on 
the code.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
/* This is a quick sort implementation for numpy strings, mainly for
   testing purposes, but could be useful in the future.  It happens to
   be between 2.5x and 3x (for long enough arrays) than the string
   sort in NumPy (based in merge sort).

   Francesc Altet 2008-02-07
 */

#define sSWAP(a,b) {		\
for (i=0; iss; i++) {	\
  ((char *)SWAP_temp)[i] = ((char *)(b))[i];		\
  ((char *)b)[i] = ((char *)(a))[i];			\
  ((char *)a)[i] = ((char *)(SWAP_temp))[i];		\
}\
  }


#define opt_memcpy(a,b,n) {\
for (i=0; i(n); i++) {\
  ((char *)a)[i] = ((char *)(b))[i];		\
}			\
  }


/* opt_strncmp is an optimized version of strncmp, mainly for easy the
   inlining by the compiler.  I'm not sure whether the inline would
   create problems with other compilers than GCC, but it can safely
   removed and let the compiler decide if it should do the inlining or
   not.  In any case, the inlining can improve the global performance
   by a 20% or more.

   This is only valid for regular strings, but adding support for UCS4
   should be a matter of replacing 'char' by 'PyArray_UCS4', I think.
 */
static inline int opt_strncmp(char *a, char *b, int n) {
  int i;
  for (i=0; in; i++) {
if (a[i]  b[i]) return i;
if (a[i]  b[i]) return -i;
  }
  return 0;
}


/* start: the address where the data for 1-D string array starts
   ss: the size of the string elements.
   num: the number of elements
  */
int quicksort_string(char *start, int ss, intp num)
{
  char *pl = start;
  char *pr = start + (num - 1) * ss;
  char *vp;
  char *SWAP_temp;
  char *stack[PYA_QS_STACK], **sptr = stack;
  char *pm, *pi, *pj, *pt;
  int i;

  vp = malloc(ss);
  SWAP_temp = malloc(ss);
  for(;;) {
while (((pr - pl)/ss)  SMALL_QUICKSORT) {
  /* quicksort partition */
  pm = pl + (((pr-pl)/ss)  1)*ss;
  if (opt_strncmp(pm, pl, ss)  0) { sSWAP(pm, pl); }
  if (opt_strncmp(pr, pm, ss)  0) { sSWAP(pr, pm); }
  if (opt_strncmp(pm, pl, ss)  0) { sSWAP(pm, pl); }
  opt_memcpy(vp, pm, ss);
  pi = pl;
  pj = pr - ss;
  sSWAP(pm, pj);
  for(;;) {
	do { pi += ss; } while (opt_strncmp(pi, vp, ss)  0);
	do { pj -= ss; } while (opt_strncmp(vp, pj, ss)  0);
	if (pi = pj)  break;
	sSWAP(pi, pj);
  }
  sSWAP(pi, pr-ss);
  /* push largest partition on stack */
  if (pi - pl  pr - pi) {
	*sptr++ = pi + ss;
	*sptr++ = pr;
	pr = pi - ss;
  }else{
	*sptr++ = pl;
	*sptr++ = pi - ss;
	pl = pi + ss;
  }
}
/* insertion sort */
for(pi = pl + ss; pi = pr;	pi += ss) {
  opt_memcpy(vp, pi, ss);
  for(pj = pi, pt = pi - ss; pj  pl  opt_strncmp(vp, pt, ss)  0;) {
	opt_memcpy(pj, pt, ss);
	pj -= ss; pt -= ss;
  }
  opt_memcpy(pj, vp, ss);
}
if (sptr == stack) break;
pr = *(--sptr);
pl = *(--sptr);
  }

  free(vp);
  free(SWAP_temp);

  return 0;
}


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-08 Thread Francesc Altet
A Friday 08 February 2008, Francesc Altet escrigué:
 A Friday 08 February 2008, Charles R Harris escrigué:
   Also, in the context of my work in indexing, and because of the
   slowness of the current implementation in NumPy, I've ended with
   an implementation of the quicksort method for 1-D array strings. 
   For moderately large arrays, it is about 2.5x-3x faster than the
   (supposedly) mergesort version in NumPy, not only due to the
   quicksort, but also because I've implemented a couple of macros
   for efficient string swapping and copy.  If this is of interest
   for NumPy developers, tell me and I will provide the code.
 
  I have some code for this too and was going to merge it. Send yours
  along and I'll get to it this weekend.

 Ok, great.  I'm attaching it.  Tell me if you need some clarification
 on the code.

Ops.  I've introduced a last-minute problem in my code.  To fix this, 
just replace the flawed opt_strncmp() that I sent before by:

static int inline opt_strncmp(char *a, char *b, int n) {
  int i;
  for (i=0; in; i++) {
if (a[i]  b[i]) return i+1;
if (a[i]  b[i]) return -(i+1);
/* Another way, but seems equivalent in speed, at least here */
/* if (a[i] != b[i]) */
/*   return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */
  }
  return 0;
}

Apparently, this version works just fine.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-08 Thread Charles R Harris
On Feb 8, 2008 8:58 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Friday 08 February 2008, Charles R Harris escrigué:
   Also, in the context of my work in indexing, and because of the
   slowness of the current implementation in NumPy, I've ended with an
   implementation of the quicksort method for 1-D array strings.  For
   moderately large arrays, it is about 2.5x-3x faster than the
   (supposedly) mergesort version in NumPy, not only due to the
   quicksort, but also because I've implemented a couple of macros for
   efficient string swapping and copy.  If this is of interest for
   NumPy developers, tell me and I will provide the code.
 
  I have some code for this too and was going to merge it. Send yours
  along and I'll get to it this weekend.

 Ok, great.  I'm attaching it.  Tell me if you need some clarification on
 the code.


I ran a few timing tests. On my machine strncmp is about 100x faster than
opt_strncmp, but  sSWAP (with some fixes), is about 10x faster then using
the memcpy in a recent compiler. Does this match with your experience.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String sort

2008-02-08 Thread Charles R Harris
On Feb 8, 2008 10:31 AM, Francesc Altet [EMAIL PROTECTED] wrote:

 A Friday 08 February 2008, Francesc Altet escrigué:
  A Friday 08 February 2008, Charles R Harris escrigué:
Also, in the context of my work in indexing, and because of the
slowness of the current implementation in NumPy, I've ended with
an implementation of the quicksort method for 1-D array strings.
For moderately large arrays, it is about 2.5x-3x faster than the
(supposedly) mergesort version in NumPy, not only due to the
quicksort, but also because I've implemented a couple of macros
for efficient string swapping and copy.  If this is of interest
for NumPy developers, tell me and I will provide the code.
  
   I have some code for this too and was going to merge it. Send yours
   along and I'll get to it this weekend.
 
  Ok, great.  I'm attaching it.  Tell me if you need some clarification
  on the code.

 Ops.  I've introduced a last-minute problem in my code.  To fix this,
 just replace the flawed opt_strncmp() that I sent before by:

 static int inline opt_strncmp(char *a, char *b, int n) {
  int i;
  for (i=0; in; i++) {
if (a[i]  b[i]) return i+1;
if (a[i]  b[i]) return -(i+1);
/* Another way, but seems equivalent in speed, at least here */
 /* if (a[i] != b[i]) */
 /*   return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */
  }
  return 0;
 }

 Apparently, this version works just fine.


Did you find this significantly faster than strncmp? There is also a unicode
compare, do you have thoughts about that?

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] String sort

2008-02-08 Thread Francesc Altet
Hi,

I'm a bit confused that the sort method of a string character doesn't 
allow a mergesort:

 s = numpy.empty(10, S10)
 s.sort(kind=merge)
TypeError: desired sort not supported for this type

However, by looking at the numpy sources, it seems that the only 
implemented method for sorting array strings is merge (I presume 
because it is stable).  So, perhaps the message above should be fixed.

Also, in the context of my work in indexing, and because of the slowness 
of the current implementation in NumPy, I've ended with an 
implementation of the quicksort method for 1-D array strings.  For 
moderately large arrays, it is about 2.5x-3x faster than the 
(supposedly) mergesort version in NumPy, not only due to the quicksort, 
but also because I've implemented a couple of macros for efficient 
string swapping and copy.  If this is of interest for NumPy developers, 
tell me and I will provide the code.

Cheers,

-- 
0,0   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 -
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion