Re: [Numpy-discussion] String sort
Hi Chuck, I've given more testing to the new quicksort routines for strings in the forthcoming NumPy. I've run the indexing test units in PyTables Pro (they stress the sorting routines a lot) against the current version of NumPy in the repository, for the complete set of quicksort, mergesort and heapsort of the new implementation, and I'm happy to say to everything went very smoothly, i.e. more than 1000 tests with different input arrays has passed flawlessly. Good job! Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 1:03 PM, Francesc Altet [EMAIL PROTECTED] wrote: Maybe I'd also be interested in trying insertion sort out. During the optimization process of an OPSI index, there is a need to sort out a slice of data that is already made of smaller chunks that are already sorted, so chances are that insertion sort could be significantly faster than the merge sort (or even the quick sort) in this scenario. But this is becoming an OT. However, I'd be glad to further dicuss this privately, if you like to. Well, I don't have much more to say. If you do decide that insertion sort will be useful you won't have to twist my arm much to get it, but I think it is most useful when the data never has to move far. In the case of quicksort and mergesort it is called to deal with small unsorted chunks, but the chunks themselves are already in the right place. Some kind of multi merge sort might be more appropriate to the OPSI index. OK, thanks, I'll have you in mind ;) And yes, multi-way merge seems very interesting for OPSI indeed. Eventually, it might even be a good candidate for taking advantage of the several cores in modern CPU's; so it would be really interesting to check out this path. Thanks for very insightful hints! -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Fri, Feb 15, 2008 at 5:09 AM, Francesc Altet [EMAIL PROTECTED] wrote: Hi Chuck, I've given more testing to the new quicksort routines for strings in the forthcoming NumPy. I've run the indexing test units in PyTables Pro (they stress the sorting routines a lot) against the current version of NumPy in the repository, for the complete set of quicksort, mergesort and heapsort of the new implementation, and I'm happy to say to everything went very smoothly, i.e. more than 1000 tests with different input arrays has passed flawlessly. Good job! Hi Francesc, Thanks for the thorough testing. It makes me feel much more comfortable this close to the release of 1.0.5. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Friday 15 February 2008, Charles R Harris escrigué: On Fri, Feb 15, 2008 at 5:09 AM, Francesc Altet [EMAIL PROTECTED] wrote: Hi Chuck, I've given more testing to the new quicksort routines for strings in the forthcoming NumPy. I've run the indexing test units in PyTables Pro (they stress the sorting routines a lot) against the current version of NumPy in the repository, for the complete set of quicksort, mergesort and heapsort of the new implementation, and I'm happy to say to everything went very smoothly, i.e. more than 1000 tests with different input arrays has passed flawlessly. Good job! Hi Francesc, Thanks for the thorough testing. It makes me feel much more comfortable this close to the release of 1.0.5. You are welcome. I forgot to mention that I tested out both direct (.sort()) and indirect (.argsort()) methods. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Wednesday 13 February 2008, Scott Ransom escrigué: On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote: So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very least, AMD Opteron architecture) and that newqsort performs really well in general (provided that the compiler can find the best path for optimizing its code). Anyone using a 64-bit platform and having both gcc 4.1.2 and 4.2.1 installed can confirm this? Here are results from a 64-bit Debian system using a Core2 Duo 2.66 GHz processor. I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental) with -O2 and -O3. Summary: There is a big difference between -02 and -O3. gcc-4.2 seems slightly better than the other gccs. And the newqsort is a lot faster (always) than the libc version. Scott eiger:/data1$ ./sort346_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.55 C qsort with Python style compare: 0.53 NumPy newqsort: 0.45 eiger:/data1$ ./sort346_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.55 C qsort with Python style compare: 0.52 NumPy newqsort: 0.35 eiger:/data1$ ./sort413_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.56 C qsort with Python style compare: 0.53 NumPy newqsort: 0.42 eiger:/data1$ ./sort413_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.54 C qsort with Python style compare: 0.50 NumPy newqsort: 0.28 eiger:/data1$ ./sort423_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.56 C qsort with Python style compare: 0.53 NumPy newqsort: 0.39 eiger:/data1$ ./sort423_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.53 C qsort with Python style compare: 0.50 NumPy newqsort: 0.27 eiger:/data1$ ./sort43_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.55 C qsort with Python style compare: 0.53 NumPy newqsort: 0.34 eiger:/data1$ ./sort43_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.53 C qsort with Python style compare: 0.51 NumPy newqsort: 0.33 Thanks Scott. Your input is very valuable, as it seems to confirm that the problem must be on gcc 4.2.1 on 64-bit (or Opteron architecture at very least) because apparently your gcc 4.2.3 is doing very well. It's a pity that I don't have a 4.2.3 available in our SuSe/Opteron machine so as to check if the optimization flaw disappears. But it seems to me that the problem could be specific of 4.2.1, and apparently the GCC crew has fixed the problem in 4.2.3, which is a relief. In any case, if anybody have access to an Opteron machine and gcc 4.2.3, it would be great if he can run the benchmark and contribute his feedback. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Thursday 14 February 2008, escriguéreu: In any case, if anybody have access to an Opteron machine and gcc 4.2.3, it would be great if he can run the benchmark and contribute his feedback. Here it is with gcc 4.2.3 on an Opteron 246 (2.0 GHz): uller:~$ ./sort423_O2# with -O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.77 C qsort with Python style compare: 0.74 NumPy newqsort: 0.63 uller:~$ ./sort423_O3# with -O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.64 C qsort with Python style compare: 0.66 NumPy newqsort: 0.40 And here are my timings with gcc 4.1.3 and using a similar Opteron than yours (270 @ 2.0 GHz): With -O2: Benchmark with 100 strings of size 15 C qsort with C style compare: 0.75 C qsort with Python style compare: 0.70 NumPy newqsort: 0.69 With -O3: Benchmark with 100 strings of size 15 C qsort with C style compare: 0.67 C qsort with Python style compare: 0.62 NumPy newqsort: 0.38 So, it seems clear that the GCC people has fixed in 4.2.3 the problem with the optimizer introduced in 4.2.1. Very good! By the way, it's nice to see the wide range of platforms that this list allows to test out :-) Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Wednesday 13 February 2008, Charles R Harris escrigué: On Feb 13, 2008 10:56 AM, Francesc Altet [EMAIL PROTECTED] wrote: Be warned, I'd like to stress out that these are my figures for my _own laptop_. It would be nice if you can verify all of this with other achitectures (your Core2 machine seems different enough). I can run the benchmarks on Windows (installed in the same laptop) too. Tell me if you are interested on me doing this. Its easy enough to test if you compile from svn, just add your new copy function and change the name in this line: #copy=copy_string, copy_ucs4# to use your function instead of copy_string. I've spoken too fast. I've never compiled NumPy on Windows before, and had problems when trying to compile it using MSVC 7.1 and a recent copy of the repository. Well, in any case, I've exercised the Opteron platform, using gcc 4.1.3 (i.e. the one that can optimize newqsort correctly), and this brings new light to our study. From the plot (attached), it can be drawn the next conclusions: 1) copy_string2 (the combination of manual copy and memcpy) is not better than memcpy for *any* value of the string length in our Opteron platform. Also, the improvements with Pentium4 was not that big (20%). In consequence, I'd choose to always use memcpy and discard copy_string2. Your copy_string2 is now the version in numpy. I'm hesitant to make memcpy the default for all string lengths because I've read that memcpy was much improved in later gcc (= 4.1 ?), but known slow in older versions. So perhaps in a year or two when the newer compilers are more widespread would be a better time to make the change. The switch at the 16 char length shouldn't make that much difference in practice. I'll put a comment in the source so that the thought won't get lost. BTW, using copy_string2 much improved the performance of the string mergesort where a lot of data needs to be copied to the work array. It's now half as fast as quicksort instead of 1/3 ;) Heap sort continues in it's traditional slot as the slowest of all. Slow but safe. 2) Curiously enough, the indirect sort in Opterons is *always* faster than newqsort+memcpy. For large values of string lengths ( 256), the speed-up can be up to 3x, which is a lot. And I'd say that this keeps true also for most modern processors (read Core2, Barcelona). For older processors (Pentium4), the indirect method can be slower than direct plot for small lengths, but by a very few extent (10%). The new indirect quicksort for strings is faster than the old qsort based default, so perhaps that is also making a difference. Conclusion 2 makes me wonder if it wouldn't be useful the introduction of a new flag in sort, like: `indirect` - Use the indirect method for sorting. This requires more memory (4/8 bytes per array element), but for sorting arrays of strings it is almost always faster than the direct approach (default). Beware: this is not the case when using numerical values, where the use of this method for sorting is not recommendable. I'm more inclined to leave this to the user. I have a todo to add a function to numpy that makes it easier to use the argsort output to sort multidimensional arrays, I'll name it argtake or some such and it will use the argsort output along with an axis argument. It won't be quite as memory efficient for multidimensional arrays, but it should work about the same in the 1D case. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet [EMAIL PROTECTED] wrote: From the plot (attached), it can be drawn the next conclusions: 1) copy_string2 (the combination of manual copy and memcpy) is not better than memcpy for *any* value of the string length in our Opteron platform. Also, the improvements with Pentium4 was not that big (20%). In consequence, I'd choose to always use memcpy and discard copy_string2. Your copy_string2 is now the version in numpy. I'm hesitant to make memcpy the default for all string lengths because I've read that memcpy was much improved in later gcc (= 4.1 ?), but known slow in older versions. So perhaps in a year or two when the newer compilers are more widespread would be a better time to make the change. The switch at the 16 char length shouldn't make that much difference in practice. I'll put a comment in the source so that the thought won't get lost. Well, copy_string2 is only marginally slower than memcpy on modern gcc compilers and processors, so I presume that this is fine. BTW, using copy_string2 much improved the performance of the string mergesort where a lot of data needs to be copied to the work array. It's now half as fast as quicksort instead of 1/3 ;) Heap sort continues in it's traditional slot as the slowest of all. Slow but safe. I've seen that you have added specific code for merge and heap sorting algorithms for strings. Looks good. 2) Curiously enough, the indirect sort in Opterons is *always* faster than newqsort+memcpy. For large values of string lengths ( 256), the speed-up can be up to 3x, which is a lot. And I'd say that this keeps true also for most modern processors (read Core2, Barcelona). For older processors (Pentium4), the indirect method can be slower than direct plot for small lengths, but by a very few extent (10%). The new indirect quicksort for strings is faster than the old qsort based default, so perhaps that is also making a difference. Yes, indeed it does! Conclusion 2 makes me wonder if it wouldn't be useful the introduction of a new flag in sort, like: `indirect` - Use the indirect method for sorting. This requires more memory (4/8 bytes per array element), but for sorting arrays of strings it is almost always faster than the direct approach (default). Beware: this is not the case when using numerical values, where the use of this method for sorting is not recommendable. I'm more inclined to leave this to the user. I have a todo to add a function to numpy that makes it easier to use the argsort output to sort multidimensional arrays, I'll name it argtake or some such and it will use the argsort output along with an axis argument. It won't be quite as memory efficient for multidimensional arrays, but it should work about the same in the 1D case. OK. I don't completely grasp what are you trying to do here, but seems like a conservative enough path (in the sense that it won't touch the current parameters of existing sorting methods). Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet [EMAIL PROTECTED] wrote: From the plot (attached), it can be drawn the next conclusions: 1) copy_string2 (the combination of manual copy and memcpy) is not better than memcpy for *any* value of the string length in our Opteron platform. Also, the improvements with Pentium4 was not that big (20%). In consequence, I'd choose to always use memcpy and discard copy_string2. Your copy_string2 is now the version in numpy. I'm hesitant to make memcpy the default for all string lengths because I've read that memcpy was much improved in later gcc (= 4.1 ?), but known slow in older versions. So perhaps in a year or two when the newer compilers are more widespread would be a better time to make the change. The switch at the 16 char length shouldn't make that much difference in practice. I'll put a comment in the source so that the thought won't get lost. Well, copy_string2 is only marginally slower than memcpy on modern gcc compilers and processors, so I presume that this is fine. BTW, using copy_string2 much improved the performance of the string mergesort where a lot of data needs to be copied to the work array. It's now half as fast as quicksort instead of 1/3 ;) Heap sort continues in it's traditional slot as the slowest of all. Slow but safe. I've seen that you have added specific code for merge and heap sorting algorithms for strings. Looks good. 2) Curiously enough, the indirect sort in Opterons is *always* faster than newqsort+memcpy. For large values of string lengths ( 256), the speed-up can be up to 3x, which is a lot. And I'd say that this keeps true also for most modern processors (read Core2, Barcelona). For older processors (Pentium4), the indirect method can be slower than direct plot for small lengths, but by a very few extent (10%). The new indirect quicksort for strings is faster than the old qsort based default, so perhaps that is also making a difference. Yes, indeed it does! Conclusion 2 makes me wonder if it wouldn't be useful the introduction of a new flag in sort, like: `indirect` - Use the indirect method for sorting. This requires more memory (4/8 bytes per array element), but for sorting arrays of strings it is almost always faster than the direct approach (default). Beware: this is not the case when using numerical values, where the use of this method for sorting is not recommendable. I'm more inclined to leave this to the user. I have a todo to add a function to numpy that makes it easier to use the argsort output to sort multidimensional arrays, I'll name it argtake or some such and it will use the argsort output along with an axis argument. It won't be quite as memory efficient for multidimensional arrays, but it should work about the same in the 1D case. OK. I don't completely grasp what are you trying to do here, but seems like a conservative enough path (in the sense that it won't touch the current parameters of existing sorting methods). Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). I could never figure out what the merge sort was good for. I did the indirect version in numarray because I needed a stable sort to implement lexsort, which was my original aim. I just added the direct version for completeness. If you have a use for it, I would love to hear it. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Thursday 14 February 2008, Bruce Southey escrigué: Hi, I confirmed the gcc 4.2.3 performance for the Opteron: Benchmark with 100 strings of size 15 C qsort with C style compare: 0.63 C qsort with Python style compare: 0.63 NumPy newqsort: 0.36 I also installed the Intel icc 10.1 compiler on my Opteron system but I did not use any flags: $ /opt/intel/cc/10.1.008/bin/icc sort-string-bench.c -o icc_sort $ ./icc_sort Benchmark with 100 strings of size 15 C qsort with C style compare: 1.03 C qsort with Python style compare: 0.96 NumPy newqsort: 0.53 That's excellent Bruce. Definitely it looks like the problem with the optimizer in 4.2.1 has been fixed in 4.2.3. And why you haven't used optimization flags with ICC? just curious... Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] wrote: Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). I could never figure out what the merge sort was good for. I did the indirect version in numarray because I needed a stable sort to implement lexsort, which was my original aim. I just added the direct version for completeness. If you have a use for it, I would love to hear it. Well, I must to confess that I've not used merge sorts yet, but I'd like to test them in the context of my PSI (Partially Sorted Indexes, see [1] for a white paper on a concrete implementation) work. My hope is that, as a merge sort keeps the order of indices of elements that are equal (this is what 'stable' means), this would allow better compression rates for indices (and hence, less I/O effort to bring the indices from disk into memory and ultimately allowing for faster lookup speed). This will probably be only important when one have data distributions with rather low cardinality, but these scenarios can be more frequent/important than one may think. [1] http://www.carabos.com/docs/OPSI-indexes.pdf -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] wrote: Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). I could never figure out what the merge sort was good for. I did the indirect version in numarray because I needed a stable sort to implement lexsort, which was my original aim. I just added the direct version for completeness. If you have a use for it, I would love to hear it. Well, I must to confess that I've not used merge sorts yet, but I'd like to test them in the context of my PSI (Partially Sorted Indexes, see [1] for a white paper on a concrete implementation) work. My hope is that, as a merge sort keeps the order of indices of elements that are equal (this is what 'stable' means), this would allow better compression rates for indices (and hence, less I/O effort to bring the indices from disk into memory and ultimately allowing for faster lookup speed). This will probably be only important when one have data distributions with rather low cardinality, but these scenarios can be more frequent/important than one may think. Well, I take that back a bit. I think mergesort might work best for large memory mapped arrays because it does sequential accesses, which might be more disk efficient than random accesses. Then again, a divide and conquer approach like quicksort eventually becomes localized too. I've never experimented with really large sorts, they might perform differently than the sorts that fit in memory. Insertion sort is supposed to work well for almost sorted sequences, but that application has always seemed a bit specialized to me. Although I'll admit to being occasionally tempted to pull the insertion sorts out of quicksort and mergesort and make them their own (type specific) routines. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] wrote: Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). I could never figure out what the merge sort was good for. I did the indirect version in numarray because I needed a stable sort to implement lexsort, which was my original aim. I just added the direct version for completeness. If you have a use for it, I would love to hear it. Well, I must to confess that I've not used merge sorts yet, but I'd like to test them in the context of my PSI (Partially Sorted Indexes, see [1] for a white paper on a concrete implementation) work. My hope is that, as a merge sort keeps the order of indices of elements that are equal (this is what 'stable' means), this would allow better compression rates for indices (and hence, less I/O effort to bring the indices from disk into memory and ultimately allowing for faster lookup speed). This will probably be only important when one have data distributions with rather low cardinality, but these scenarios can be more frequent/important than one may think. Well, I take that back a bit. I think mergesort might work best for large memory mapped arrays because it does sequential accesses, which might be more disk efficient than random accesses. Then again, a divide and conquer approach like quicksort eventually becomes localized too. I've never experimented with really large sorts, they might perform differently than the sorts that fit in memory. Yeah, but I don't really want to use merge sort for out-of-core sorting, but just because it is stable. The main point of a PSI indexing schema is that you don't need to completely sort your dataset (hence the name: Partially Sorted) in order to get an usable index, and this normally leads to much faster index creation times. Insertion sort is supposed to work well for almost sorted sequences, but that application has always seemed a bit specialized to me. Although I'll admit to being occasionally tempted to pull the insertion sorts out of quicksort and mergesort and make them their own (type specific) routines. Maybe I'd also be interested in trying insertion sort out. During the optimization process of an OPSI index, there is a need to sort out a slice of data that is already made of smaller chunks that are already sorted, so chances are that insertion sort could be significantly faster than the merge sort (or even the quick sort) in this scenario. But this is becoming an OT. However, I'd be glad to further dicuss this privately, if you like to. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Thu, Feb 14, 2008 at 1:03 PM, Francesc Altet [EMAIL PROTECTED] wrote: A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 11:44 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Thursday 14 February 2008, Charles R Harris escrigué: On Thu, Feb 14, 2008 at 10:46 AM, Francesc Altet [EMAIL PROTECTED] wrote: Looking forward to see the new qsort for strings in NumPy (the specific version for merge sort is very welcome too!). I could never figure out what the merge sort was good for. I did the indirect version in numarray because I needed a stable sort to implement lexsort, which was my original aim. I just added the direct version for completeness. If you have a use for it, I would love to hear it. Well, I must to confess that I've not used merge sorts yet, but I'd like to test them in the context of my PSI (Partially Sorted Indexes, see [1] for a white paper on a concrete implementation) work. My hope is that, as a merge sort keeps the order of indices of elements that are equal (this is what 'stable' means), this would allow better compression rates for indices (and hence, less I/O effort to bring the indices from disk into memory and ultimately allowing for faster lookup speed). This will probably be only important when one have data distributions with rather low cardinality, but these scenarios can be more frequent/important than one may think. Well, I take that back a bit. I think mergesort might work best for large memory mapped arrays because it does sequential accesses, which might be more disk efficient than random accesses. Then again, a divide and conquer approach like quicksort eventually becomes localized too. I've never experimented with really large sorts, they might perform differently than the sorts that fit in memory. Yeah, but I don't really want to use merge sort for out-of-core sorting, but just because it is stable. The main point of a PSI indexing schema is that you don't need to completely sort your dataset (hence the name: Partially Sorted) in order to get an usable index, and this normally leads to much faster index creation times. Insertion sort is supposed to work well for almost sorted sequences, but that application has always seemed a bit specialized to me. Although I'll admit to being occasionally tempted to pull the insertion sorts out of quicksort and mergesort and make them their own (type specific) routines. Maybe I'd also be interested in trying insertion sort out. During the optimization process of an OPSI index, there is a need to sort out a slice of data that is already made of smaller chunks that are already sorted, so chances are that insertion sort could be significantly faster than the merge sort (or even the quick sort) in this scenario. But this is becoming an OT. However, I'd be glad to further dicuss this privately, if you like to. Well, I don't have much more to say. If you do decide that insertion sort will be useful you won't have to twist my arm much to get it, but I think it is most useful when the data never has to move far. In the case of quicksort and mergesort it is called to deal with small unsorted chunks, but the chunks themselves are already in the right place. Some kind of multi merge sort might be more appropriate to the OPSI index. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Tuesday 12 February 2008, Charles R Harris escrigué: On Feb 12, 2008 9:07 AM, Francesc Altet [EMAIL PROTECTED] wrote: * The newqsort performs the best on all the platforms we have checked (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some Pentium4/Ubuntu systems). The 3.8 is amazing, isn't it? I've found that the performance also depends on whether I initialize the strings with random or empty. With the random initialization the new sort is ~2x faster. That's fedora 8, core duo, 32 bit OS, gcc 4.1.2. Well, for me, a 3.8x (or even a 2x for that matter) of improvement is less amazing once you know that there is a flaw in C qsort for most of Linux distros around. Neither Windows, MacOSX or certain versions of Linux (namely, SuSe 10.3) does reflect such a large difference in performance. I'd say that newqsort that is to be included in next release of NumPy would be just a 10% better than using a sane implementation of C qsort. And, while a 10% is not as amazing than a 380%, the great news is that newqsort will provide first-class performance even to people having the flawed C qsort on their machines (which apparently are much more that I initially realized). Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Tuesday 12 February 2008, Bruce Southey escrigué: Hi, I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that gives C qsort with C style compare: 0.65 C qsort with Python style compare: 0.64 NumPy newqsort: 0.36 That's very intersting. In a similar configuration, but using SuSe 10.3 (Enterprise) instead of 10.1, I don't see this factor of almost 2 of difference in performance (in fact, both performances, C qsort and NumPy newqsort, are very similar). This seems to confirm that the GNU glibc crew has fixed the qsort performance very recently (i.e. I hope it is not only a fix in SuSe 10.3 Enterprise), and this is why most of current distros are seeing the poor performance in qsort. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
Hi, I added gcc 4.2 from the openSUSE 10.1 repository so I now have both the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with 4.1.2 so I think that there could be a difference in the compiler flags. I don't know enough about those to help but I can test any suggestions. $ gcc --version gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) $ gcc -O3 sort-string-bench.c -o sort412 $ ./sort412 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.63 C qsort with Python style compare: 0.64 NumPy newqsort: 0.36 $ gcc-4.2 --version gcc-4.2 (GCC) 4.2.1 (SUSE Linux) $ gcc-4.2 -O3 sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.62 C qsort with Python style compare: 0.61 NumPy newqsort: 0.55 This is the same as: $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.71 C qsort with Python style compare: 0.70 NumPy newqsort: 0.55 (NumPy newqsort with -O2 alone is 0.6) For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is NumPy newqsort: 0.62 vs NumPy newqsort: 0.50 Regards Bruce On Feb 13, 2008 4:35 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Tuesday 12 February 2008, Bruce Southey escrigué: Hi, I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that gives C qsort with C style compare: 0.65 C qsort with Python style compare: 0.64 NumPy newqsort: 0.36 That's very intersting. In a similar configuration, but using SuSe 10.3 (Enterprise) instead of 10.1, I don't see this factor of almost 2 of difference in performance (in fact, both performances, C qsort and NumPy newqsort, are very similar). This seems to confirm that the GNU glibc crew has fixed the qsort performance very recently (i.e. I hope it is not only a fix in SuSe 10.3 Enterprise), and this is why most of current distros are seeing the poor performance in qsort. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 13, 2008 9:19 AM, Bruce Southey [EMAIL PROTECTED] wrote: Hi, I added gcc 4.2 from the openSUSE 10.1 repository so I now have both the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with 4.1.2 so I think that there could be a difference in the compiler flags. I don't know enough about those to help but I can test any suggestions. $ gcc --version gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) $ gcc -O3 sort-string-bench.c -o sort412 $ ./sort412 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.63 C qsort with Python style compare: 0.64 NumPy newqsort: 0.36 $ gcc-4.2 --version gcc-4.2 (GCC) 4.2.1 (SUSE Linux) $ gcc-4.2 -O3 sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.62 C qsort with Python style compare: 0.61 NumPy newqsort: 0.55 This is the same as: $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.71 C qsort with Python style compare: 0.70 NumPy newqsort: 0.55 (NumPy newqsort with -O2 alone is 0.6) For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is NumPy newqsort: 0.62 vs NumPy newqsort: 0.50 It's certainly interesting how much difference the compiler/flags make. I suppose one more thing to add to the mix is the -march and -mtune flags. They didn't make much difference here, but they might for someone else. It would also be interesting to see how a 64 bit system handles things. /lib/libgcc_s-4.1.2 gcc 4.1.2, -O3 -march=prescott -mtune=generic Benchmark with 100 strings of size 15 C qsort with C style compare: 0.94 C qsort with Python style compare: 0.94 NumPy newqsort: 0.31 I suppose much also depends on the flags with which libgcc is compiled, which in turn probably depends on the distro. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 13, 2008 10:56 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Wednesday 13 February 2008, Charles R Harris escrigué: OK, The new quicksorts are in svn. Francesc, can you check them out? Looks good here. However, you seem to keep using your own copy_string() instead of plain memcpy(). In previous benchmarks, I've seen that copy_string() is faster than memcpy only for small values of the length of the block to be copied. Yes, I noticed that your benchmark program crossed over to using memcpy at 16 chars, and I will probably add that feature. I was being conservative to start with. snip Finally, you also will have noticed the indirect sort line in the plot. This is because I was curious about when this method would win a direct sort. And, by looking at the plot, it seems that the crosspoint is around strings of 128 bytes (much more in fact that I initially thought), and starts to be very significant (around 40% faster) at 256 bytes. So perhaps it would make sense to add the possibility to choose the indirect method when sorting those large strings. This, of course, would require more memory for the indices, but using 4 or 8 additional bytes (depending if we on 32-bit or 64-bit), when each string takes 200 bytes, doesn't seem too crazy. In any case, it would be nice to document this in docstrings. It would be easy to add this feature, but for the moment I think the best thing is to document it. Another fairly easy change that could be made is to support strided arrays. That might speed sorting of non-contiguous arrays and sorts on axis other than -1. The only reason it isn't there now is that I originally wrote the sorting routines for numarray and numarray's upper level interface passed contiguous arrays to the sort functions. Be warned, I'd like to stress out that these are my figures for my _own laptop_. It would be nice if you can verify all of this with other achitectures (your Core2 machine seems different enough). I can run the benchmarks on Windows (installed in the same laptop) too. Tell me if you are interested on me doing this. Its easy enough to test if you compile from svn, just add your new copy function and change the name in this line: #copy=copy_string, copy_ucs4# to use your function instead of copy_string. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Wednesday 13 February 2008, Charles R Harris escrigué: OK, The new quicksorts are in svn. Francesc, can you check them out? Looks good here. However, you seem to keep using your own copy_string() instead of plain memcpy(). In previous benchmarks, I've seen that copy_string() is faster than memcpy only for small values of the length of the block to be copied. In order to do a better assessment of this affirmation, I've created a small benchmark (attached) in order to compare your copy_string against system memcpy when sorting array strings. I've also come up with a new function (copy_string2, attached), that tries to combine the best of copy_string and memcpy. Look at the attached plot in order to see how they behave. In the plot, you can see how memcpy is generally faster than copy_string, specially for relatively large string lengths. However, copy_string can be faster for small lengths (the maximum difference, a 20%, happens around 8/10 bytes). It may happen that you were doing time mesaurements whith strings of size 8, and you were driven to the wrong conclusion that copy_string was faster than memcpy. In case you think that performance for small string lengths is important, you may want to include copy_string2, that uses one method or another depending on the size block to be copied. There is of course a small impact in performance (one more condition test was introduced), but it is rather negligible. Finally, you also will have noticed the indirect sort line in the plot. This is because I was curious about when this method would win a direct sort. And, by looking at the plot, it seems that the crosspoint is around strings of 128 bytes (much more in fact that I initially thought), and starts to be very significant (around 40% faster) at 256 bytes. So perhaps it would make sense to add the possibility to choose the indirect method when sorting those large strings. This, of course, would require more memory for the indices, but using 4 or 8 additional bytes (depending if we on 32-bit or 64-bit), when each string takes 200 bytes, doesn't seem too crazy. In any case, it would be nice to document this in docstrings. Be warned, I'd like to stress out that these are my figures for my _own laptop_. It would be nice if you can verify all of this with other achitectures (your Core2 machine seems different enough). I can run the benchmarks on Windows (installed in the same laptop) too. Tell me if you are interested on me doing this. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - sort-string-bench.py Description: application/python static void copy_string2(char *s1, char *s2, size_t len) { if (len 16) { while(len--) { *s1++ = *s2++; } } else { memcpy(s1, s2, len); } } ubuntu-pentium4-newqsort.pdf Description: Adobe PDF document ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 13, 2008 12:37 PM, Francesc Altet [EMAIL PROTECTED] wrote: A Wednesday 13 February 2008, Francesc Altet escrigué: A Wednesday 13 February 2008, Bruce Southey escrigué: Hi, I added gcc 4.2 from the openSUSE 10.1 repository so I now have both the 4.1.2 and 4.2.1 compilers installed. But still have glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with 4.1.2 so I think that there could be a difference in the compiler flags. I don't know enough about those to help but I can test any suggestions. $ gcc --version gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) $ gcc -O3 sort-string-bench.c -o sort412 $ ./sort412 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.63 C qsort with Python style compare: 0.64 NumPy newqsort: 0.36 $ gcc-4.2 --version gcc-4.2 (GCC) 4.2.1 (SUSE Linux) $ gcc-4.2 -O3 sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.62 C qsort with Python style compare: 0.61 NumPy newqsort: 0.55 This is the same as: $ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421 $ ./sort421 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.71 C qsort with Python style compare: 0.70 NumPy newqsort: 0.55 (NumPy newqsort with -O2 alone is 0.6) For completeness, 4.1.2 using '-O2' versus '-O2 -finline-functions' is NumPy newqsort: 0.62 vs NumPy newqsort: 0.50 That's really interesting. Let me remember my figures for our Opteron: 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) C qsort with C style compare: 0.64 C qsort with Python style compare: 0.60 NumPy newqsort: 0.59 Just an addedum. I've compiled the benchmark using gcc 4.1.2 using our Opteron machine. Here are the results: SuSe LE 10.3 (gcc 4.1.2, -O3, AMD Opteron @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 0.62 C qsort with Python style compare: 0.61 NumPy newqsort: 0.38 So, I'm getting a 55% more of performance than by using gcc 4.2.1 (!). Also, I've installed gcc 4.2.1 on my laptop and here are the results: Ubuntu 7.10 (gcc 4.2.1, -O3, Intel Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 2.45 C qsort with Python style compare: 2.42 NumPy newqsort: 0.63 While using gcc 4.1.2, I get: Ubuntu 7.10 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 2.45 C qsort with Python style compare: 2.44 NumPy newqsort: 0.65 So, in this case (32-bit platform) gcc 4.2.1 seems to perform similarly to 4.1.2. So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very least, AMD Opteron architecture) and that newqsort performs really well in general (provided that the compiler can find the best path for optimizing its code). Anyone using a 64-bit platform and having both gcc 4.1.2 and 4.2.1 installed can confirm this? Also, MSVC 7.1 32-bit (with opt level /Ox) doesn't seem to find such a best path (the benchmark for newqsort takes 0.92s using MSVC 7.1, while gcc 4.1.2 takes 0.65s using the same machine, a 40% faster). I don't know whether newer versions of MSVC will do better or not, though. Now we need someone to try ICC ;) Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote: So, I'd say that the guilty is the gcc 4.2.1, 64-bit (or at very least, AMD Opteron architecture) and that newqsort performs really well in general (provided that the compiler can find the best path for optimizing its code). Anyone using a 64-bit platform and having both gcc 4.1.2 and 4.2.1 installed can confirm this? Here are results from a 64-bit Debian system using a Core2 Duo 2.66 GHz processor. I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental) with -O2 and -O3. Summary: There is a big difference between -02 and -O3. gcc-4.2 seems slightly better than the other gccs. And the newqsort is a lot faster (always) than the libc version. Scott eiger:/data1$ ./sort346_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.55 C qsort with Python style compare: 0.53 NumPy newqsort: 0.45 eiger:/data1$ ./sort346_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.55 C qsort with Python style compare: 0.52 NumPy newqsort: 0.35 eiger:/data1$ ./sort413_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.56 C qsort with Python style compare: 0.53 NumPy newqsort: 0.42 eiger:/data1$ ./sort413_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.54 C qsort with Python style compare: 0.50 NumPy newqsort: 0.28 eiger:/data1$ ./sort423_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.56 C qsort with Python style compare: 0.53 NumPy newqsort: 0.39 eiger:/data1$ ./sort423_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.53 C qsort with Python style compare: 0.50 NumPy newqsort: 0.27 eiger:/data1$ ./sort43_O2 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.55 C qsort with Python style compare: 0.53 NumPy newqsort: 0.34 eiger:/data1$ ./sort43_O3 Benchmark with 100 strings of size 15 C qsort with C style compare: 0.53 C qsort with Python style compare: 0.51 NumPy newqsort: 0.33 -- Scott M. RansomAddress: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: [EMAIL PROTECTED] Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Monday 11 February 2008, Charles R Harris escrigué: On Feb 11, 2008 1:15 PM, Francesc Altet [EMAIL PROTECTED] wrote: Here are the results of running it in several platforms: 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 2.45 C qsort with Python style compare: 2.44 NumPy newqsort: 0.65 Wow, what a difference. Yeah. This is why I got so excited initially. It's unfortunate that most of the speed-up in newqsort in this case is probably due to a possible flaw in qsort implementation for the combination Ubuntu/Pentium4. On the positive side, it is nice to see that other distros/processors have a decent performance on system qsort ;-) * I'd definitely keep memcpy by default. From my timings, it looks like the best option for all platforms. OK. Was that just for the copies, or was it for the swaps also? I ran a version of swap using memcpy on my machine and the sort was about half as fast for 8 character strings. No, only for the copies. For the swaps, this memcpy-based version: #define swap_string(s1, s2, len) { \ memcpy((vp), (s2), (len)); \ memcpy((s2), (s1), (len)); \ memcpy((s1), (vp), (len)); \ } performs extremely bad on my systems: Pentium4/Ubuntu 7.10: * newqsort with the loop version of swap_string: 0.65 s * newqsort with the memcpy version of swap_string: 9.14 s Opteron/SuSe LE 10.3: * newqsort with the loop version of swap_string: 0.59 s * newqsort with the memcpy version of swap_string: 8.71 s So, it seems that the nice newqsort performance is extremely dependent on the loop version of swap_string. I hope the benchmark will behave well in your platform too (i.e. newqsort will perform the best ;) I'll check it out when I get home. As I say, it was running about 10% slower on my machine, but if it does better on most platforms it is probably the way to go. We can always change it in the future when everyone is running on quantum computers. Quantum computers? Oh, I can't wait for mine ;-) -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
Hi, I have a Opteron 248 (2.66GHz) that with gcc 4.1.0 (SUSE10.1?) that gives C qsort with C style compare: 0.65 C qsort with Python style compare: 0.64 NumPy newqsort: 0.36 I did notice that -O3 was essential to get the performance gain as -O2 gave: C qsort with C style compare: 0.69 C qsort with Python style compare: 0.70 NumPy newqsort: 0.61 Bruce On Feb 12, 2008 10:07 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Monday 11 February 2008, Charles R Harris escrigué: I'll check it out when I get home. As I say, it was running about 10% slower on my machine, but if it does better on most platforms it is probably the way to go. We can always change it in the future when everyone is running on quantum computers. We've done some testing on newqsort in several computers in our company. Here are the results for ordering a list with 1 million of strings of length 15 filled with random information (using C rand()): 1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) C qsort with C style compare: 2.45 C qsort with Python style compare: 2.44 NumPy newqsort: 0.65 2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz) C qsort with C style compare: 0.971000 C qsort with Python style compare: 0.962000 NumPy newqsort: 0.921000 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) C qsort with C style compare: 0.64 C qsort with Python style compare: 0.60 NumPy newqsort: 0.59 4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz) C qsort with C style compare: 1.77 C qsort with Python style compare: 1.75 NumPy newqsort: 0.44 5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz) C qsort with C style compare: 1.59 C qsort with Python style compare: 1.55 NumPy newqsort: 0.51 6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz) C qsort with C style compare: 1.89 C qsort with Python style compare: 1.90 NumPy newqsort: 0.50 7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz) C qsort with C style compare: 3.03 C qsort with Python style compare: 2.97 NumPy newqsort: 1.04 8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz) C qsort with C style compare: 1.56 C qsort with Python style compare: 1.51 NumPy newqsort: 1.22 All benchmarks have been run using the attached benchmark (if anybody else wants to join the fiesta, please report back your feedback). Summarizing, one can say a couple of things: * Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are suffering from a innefficient system qsort (at least the implementation for strings). SuSe Linux Enterprise 10.3 seems to have solved this. And Windows XP (SP2) and MacOSX (Tiger) looks like they have a relatively efficient implementation of qsort. * The newqsort performs the best on all the platforms we have checked (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some Pentium4/Ubuntu systems). All in all, I'd also say that newqsort would be a good candidate to be put into NumPy. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 12, 2008 9:07 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Monday 11 February 2008, Charles R Harris escrigué: I'll check it out when I get home. As I say, it was running about 10% slower on my machine, but if it does better on most platforms it is probably the way to go. We can always change it in the future when everyone is running on quantum computers. We've done some testing on newqsort in several computers in our company. Here are the results for ordering a list with 1 million of strings of length 15 filled with random information (using C rand()): 1) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz) C qsort with C style compare: 2.45 C qsort with Python style compare: 2.44 NumPy newqsort: 0.65 2) Windows XP (SP2) (MSVC 7.1, /Ox, Intel Pentium 4 @ 2 GHz) C qsort with C style compare: 0.971000 C qsort with Python style compare: 0.962000 NumPy newqsort: 0.921000 3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz) C qsort with C style compare: 0.64 C qsort with Python style compare: 0.60 NumPy newqsort: 0.59 4) Debian 4.2.2 (lenny) (gcc 4.2.3, -O3, Intel Pentium 4 @ 3.2 GHz) C qsort with C style compare: 1.77 C qsort with Python style compare: 1.75 NumPy newqsort: 0.44 5) Mandriva 2008.0 (gcc 4.2.2, -O3, Intel Core2 Duo @ 1.5 GHz) C qsort with C style compare: 1.59 C qsort with Python style compare: 1.55 NumPy newqsort: 0.51 6) Ubuntu 7.1 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2.5 GHz) C qsort with C style compare: 1.89 C qsort with Python style compare: 1.90 NumPy newqsort: 0.50 7) Ubuntu 7.1 (gcc 4.1.2, -O3, PowerPC 3 @ 1.3 GHz) C qsort with C style compare: 3.03 C qsort with Python style compare: 2.97 NumPy newqsort: 1.04 8) MacOSX 10.4 (Tiger) (gcc 4.0.1, -O3, PowerPC 3 @ 1.3 GHz) C qsort with C style compare: 1.56 C qsort with Python style compare: 1.51 NumPy newqsort: 1.22 All benchmarks have been run using the attached benchmark (if anybody else wants to join the fiesta, please report back your feedback). Summarizing, one can say a couple of things: * Recent Debian distros and derivatives (Ubuntu) as well as Mandriva are suffering from a innefficient system qsort (at least the implementation for strings). SuSe Linux Enterprise 10.3 seems to have solved this. And Windows XP (SP2) and MacOSX (Tiger) looks like they have a relatively efficient implementation of qsort. * The newqsort performs the best on all the platforms we have checked (ranging from a 5% of improvement on Opteron/SuSe, up to 3.8x with some Pentium4/Ubuntu systems). The 3.8 is amazing, isn't it? I've found that the performance also depends on whether I initialize the strings with random or empty. With the random initialization the new sort is ~2x faster. That's fedora 8, core duo, 32 bit OS, gcc 4.1.2. All in all, I'd also say that newqsort would be a good candidate to be put into NumPy. I've merged some sorting tests preparatory to merging the new sorts. There is a release coming up this weekend, I don't know if it is tagged yet, but in any case I plan to merge the new sorts soon. Please help with the testing when I do. Now it's off to paying work. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
OK, The new quicksorts are in svn. Francesc, can you check them out? Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Monday 11 February 2008, Charles R Harris escrigué: I've attached my working _sortmodule.c.src file so you can fool with these different changes on your machines also. This is on top of current svn. Ok. In order to compare pears with pears, I've decided to create a standalone program in C (attached), based on your version (yes, it is almost the same that the one that I came up with). This also allows to run it quickly in as many platforms as possible. The compiler throws some warnings, but they are not important (I think). Here are the results of running it in several platforms: 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 2.45 C qsort with Python style compare: 2.44 NumPy newqsort: 0.65 2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 0.971000 C qsort with Python style compare: 0.962000 NumPy newqsort: 0.921000 3) An Opteron server: SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 0.64 C qsort with Python style compare: 0.60 NumPy newqsort: 0.59 Some of the conclusions that can be drawn: * C qsort performs pretty badly on my Pentium4 laptop with Ubuntu * C qsort on Win on my laptop performs very similar to newqsort * newqsort performs much better on my Ubuntu Linux than in Windows * On Opteron, C qsort and newqsort do perform very similarly * and most importantly, newqsort runs faster in *all* platforms So, provided the last conclusion, I think it is safe to check newqsort in NumPy (unless something catastrofic might occur on other platforms). Finally, a couple of small things: * MSVC doesn't swallow the inline qualifier. So we should remove it and hope that most of NumPy installations will be compiled -O3 at least. * I'd definitely keep memcpy by default. From my timings, it looks like the best option for all platforms. I hope the benchmark will behave well in your platform too (i.e. newqsort will perform the best ;) Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - #include stdlib.h #include stdio.h #include time.h #define NUM 100 #define LEN 15 /* This cannot be inlined by MSVC compiler */ /* static int */ /* compare_stringC(const void *s1, const void *s2) */ /* { */ /* return strncmp(s1, s2, LEN); */ /* } */ static int compare_stringC(const void *s1, const void *s2) { const unsigned char *c1 = (unsigned char *)s1; const unsigned char *c2 = (unsigned char *)s2; size_t i; for(i = 0; i LEN; ++i) { if (c1[i] != c2[i]) { return (c1[i] c2[i]) ? 1 : -1; } if (c1[i] == 0) return 0; } return 0; } static int compare_stringP(const void *s1, const void *s2) { const unsigned char *c1 = (unsigned char *)s1; const unsigned char *c2 = (unsigned char *)s2; size_t i; for(i = 0; i LEN; ++i) { if (c1[i] != c2[i]) { return (c1[i] c2[i]) ? 1 : -1; } } return 0; } #define PYA_QS_STACK 100 #define SMALL_QUICKSORT 15 static int compare_string(char *s1, char *s2, size_t len) { const unsigned char *c1 = (unsigned char *)s1; const unsigned char *c2 = (unsigned char *)s2; size_t i; for(i = 0; i len; ++i) { if (c1[i] != c2[i]) { return (c1[i] c2[i]) ? 1 : -1; } } return 0; } #define STRING_LT(pa, pb, len) (compare_string(pa, pb, len) 0) static void swap_string(char *s1, char *s2, size_t len) { while(len--) { const char t = *s1; *s1++ = *s2; *s2++ = t; } } #define opt_memcpy(a,b,n) memcpy((a),(b),(n)) static void copy_string(char *s1, char *s2, size_t len) { while(len--) { *s1++ = *s2++; } } /* This seems to work faster on my laptop, but YMMV */ static void opt_memcpy2(void *a, const void *b, size_t n) { size_t i; if (n16) for (i=0; in; i++) ((char *)a)[i] = ((char *)b)[i]; else memcpy(a, b, n); } static int newqsort(char *start, size_t num, size_t len) { char *vp = malloc(len); char *pl = start; char *pr = start + (num - 1)*len; char *stack[PYA_QS_STACK], **sptr = stack, *pm, *pi, *pj, *pk; for(;;) { while ((pr - pl) 5*len) { /* quicksort partition */ pm = pl + (((pr - pl)/len) 1)*len; if (STRING_LT(pm, pl, len)) swap_string(pm, pl, len); if (STRING_LT(pr, pm, len)) swap_string(pr, pm, len); if (STRING_LT(pm, pl, len)) swap_string(pm, pl, len); opt_memcpy(vp, pm, len); pi = pl; pj = pr - len; swap_string(pm, pj, len); for(;;) { do pi += len; while (STRING_LT(pi, vp, len)); do pj -= len; while (STRING_LT(vp, pj, len)); if (pi = pj) break;
Re: [Numpy-discussion] String sort
On Feb 11, 2008 1:15 PM, Francesc Altet [EMAIL PROTECTED] wrote: A Monday 11 February 2008, Charles R Harris escrigué: I've attached my working _sortmodule.c.src file so you can fool with these different changes on your machines also. This is on top of current svn. Ok. In order to compare pears with pears, I've decided to create a standalone program in C (attached), based on your version (yes, it is almost the same that the one that I came up with). This also allows to run it quickly in as many platforms as possible. The compiler throws some warnings, but they are not important (I think). Here are the results of running it in several platforms: 1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 2.45 C qsort with Python style compare: 2.44 NumPy newqsort: 0.65 Wow, what a difference. 2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 0.971000 C qsort with Python style compare: 0.962000 NumPy newqsort: 0.921000 3) An Opteron server: SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz) Benchmark with 100 strings of size 15 C qsort with C style compare: 0.64 C qsort with Python style compare: 0.60 NumPy newqsort: 0.59 Some of the conclusions that can be drawn: * C qsort performs pretty badly on my Pentium4 laptop with Ubuntu * C qsort on Win on my laptop performs very similar to newqsort * newqsort performs much better on my Ubuntu Linux than in Windows * On Opteron, C qsort and newqsort do perform very similarly * and most importantly, newqsort runs faster in *all* platforms So, provided the last conclusion, I think it is safe to check newqsort in NumPy (unless something catastrofic might occur on other platforms). Finally, a couple of small things: * MSVC doesn't swallow the inline qualifier. So we should remove it and hope that most of NumPy installations will be compiled -O3 at least. I was afraid of that. The inline keyword is a fairly new standard; gcc has had it for a while but the older versions of MSVC didn't. I don't know if the newer MSVC versions do. IIRC, there was another way to get MSVC to inline. Of course, we could go to C++ :0) * I'd definitely keep memcpy by default. From my timings, it looks like the best option for all platforms. OK. Was that just for the copies, or was it for the swaps also? I ran a version of swap using memcpy on my machine and the sort was about half as fast for 8 character strings. I hope the benchmark will behave well in your platform too (i.e. newqsort will perform the best ;) I'll check it out when I get home. As I say, it was running about 10% slower on my machine, but if it does better on most platforms it is probably the way to go. We can always change it in the future when everyone is running on quantum computers. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 11, 2008 4:06 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Monday 11 February 2008, Francesc Altet escrigué: A Monday 11 February 2008, Charles R Harris escrigué: Mmm, comparing my new strncmp and the one that you have implemented in SVN, I've found a difference that can account for part of the difference in performances. With your version of strncmp in SVN (compare_string), these are my timings with the Opteron server: snip In [17]: np.random.seed(1) In [18]: a = np.random.rand(1).astype('S8') In [19]: %timeit a.copy().sort() 100 loops, best of 3: 3.86 ms per loop In [20]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.44 ms per loop which gives times a 5% worse. Try to use my version and tell me if it does better: static int inline opt_strncmp(char *a, char *b, size_t n) { size_t i; unsigned char c, d; for (i = 0; i n; i++) { c = a[i]; d = b[i]; if (c != d) return c - d; } return 0; } I didn't notice any speed difference. And while returning the difference of two unsigned numbers should work with modular arithmetic when it is cast to integer, I thought the explicit return based on a compare was clearer and safer. Comparisons always work. I've attached my working _sortmodule.c.src file so you can fool with these different changes on your machines also. This is on top of current svn. Chuck _sortmodule.c.src Description: WAIS Source ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 11, 2008 2:58 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Monday 11 February 2008, Charles R Harris escrigué: On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote: Hi, I'm a bit confused that the sort method of a string character doesn't allow a mergesort: s = numpy.empty(10, S10) s.sort(kind=merge) TypeError: desired sort not supported for this type However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is merge (I presume because it is stable). So, perhaps the message above should be fixed. The only available method is the default quicksort. The mergesort is for argsort and was put in for lexsort to use. That's good to know. However, I'm curious about where it is the specific quicksort implementation for strings/unicode. I've had a look at _sortmodule.c.src, but I can only find a quicksort implementation for: The default is the C qsort, it is called from PyArray_Sort in multiarraymodule.c. The type specific sorts, when they exist, are also called from there through _new_sort. To see what type specific sorts are registered, look at the end of _sortmodule.c.src. You can write a new sort, and if it isn't registered it won't be used. So commenting out the registration is a good way to get back to the default. The version you are testing is your own or the one that I provided? Here are the timings for my laptop: They turned out to be essentially identical, except I used len instead of ss ;) I also used inlined functions for the copy and swap as they are safer with the arguments. Comparison with the macro versions showed no difference there. In [55]: a = np.random.rand(1).astype('S8') In [56]: %timeit a.copy().sort() 100 loops, best of 3: 3.82 ms per loop In [57]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.29 ms per loop Here, the difference in performance has been reduced to a mere 15% (still favouring newqsort). So, with this, it seems like the performance of the original sorting in NumPy only suffers a lot when running in old processors (eg. Pentium 4), while the performance is reasonable with newer ones (Opteron). It could also depend on the C library that comes with the compiler. I'm running on a E6600, but numpy compiles everything for the i386, which might make a difference also. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Monday 11 February 2008, Francesc Altet escrigué: A Monday 11 February 2008, Charles R Harris escrigué: That's with the current sort(kind='q') in svn, which uses the new string compare function but is otherwise the old default quicksort. The new string specific version of quicksort I'm testing is actually a bit slower than that. Both versions correctly sort the array. So I'm going to continue to experiment a bit until I see what is going on. The version you are testing is your own or the one that I provided? Here are the timings for my laptop: In [32]: a = np.random.rand(1).astype('S8') In [33]: %timeit a.copy().sort() # original sort in NumPy 10 loops, best of 3: 16.8 ms per loop In [34]: %timeit newqsort(a.copy()) # My own qsort implementation 100 loops, best of 3: 4.29 ms per loop (I'm using a random string array here mainly because I use the sort in my system NumPy, with the old string compare. However, as all the contents in strings are not NULL chars, the performance should be comparable, bar a few percent of improvement). So, my newqsort still seems to run almost 4x faster than the one in NumPy (you know, using the old string compare). However, when using a server with an Opteron processor, the view is much different: In [55]: a = np.random.rand(1).astype('S8') In [56]: %timeit a.copy().sort() 100 loops, best of 3: 3.82 ms per loop In [57]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.29 ms per loop Here, the difference in performance has been reduced to a mere 15% (still favouring newqsort). So, with this, it seems like the performance of the original sorting in NumPy only suffers a lot when running in old processors (eg. Pentium 4), while the performance is reasonable with newer ones (Opteron). On its hand, newqsort seems to perform reasonably well in both. I don't know what exactly is the reason for this (I don't know where it is the code for the original quicksort for strings, so I can't do a visual comparison), but it would be great if we can discover it! Mmm, comparing my new strncmp and the one that you have implemented in SVN, I've found a difference that can account for part of the difference in performances. With your version of strncmp in SVN (compare_string), these are my timings with the Opteron server: In [17]: np.random.seed(1) In [18]: a = np.random.rand(1).astype('S8') In [19]: %timeit a.copy().sort() 100 loops, best of 3: 3.86 ms per loop In [20]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.44 ms per loop which gives times a 5% worse. Try to use my version and tell me if it does better: static int inline opt_strncmp(char *a, char *b, size_t n) { size_t i; unsigned char c, d; for (i = 0; i n; i++) { c = a[i]; d = b[i]; if (c != d) return c - d; } return 0; } Although a 5% is maybe too little improvement :-/ -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Monday 11 February 2008, Charles R Harris escrigué: On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote: Hi, I'm a bit confused that the sort method of a string character doesn't allow a mergesort: s = numpy.empty(10, S10) s.sort(kind=merge) TypeError: desired sort not supported for this type However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is merge (I presume because it is stable). So, perhaps the message above should be fixed. The only available method is the default quicksort. The mergesort is for argsort and was put in for lexsort to use. That's good to know. However, I'm curious about where it is the specific quicksort implementation for strings/unicode. I've had a look at _sortmodule.c.src, but I can only find a quicksort implementation for: /**begin repeat #TYPE=BOOL,BYTE,UBYTE,SHORT,USHORT,INT,UINT,LONG,ULONG,LONGLONG,ULONGLONG,FLOAT,DOUBLE,LONGDOUBLE,CFLOAT,CDOUBLE,CLONGDOUBLE# **/ Where are the STRING/UNICODE versions? Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I've now got a string/ucs4 specific argsort(kind='q'), the string version of which is about 40% faster than the old default and about 10% faster than the mergesort version, but the string/ucs4 specific versions of sort aren't yet fairing as well. I'm timing things with In [1]: import timeit In [2]: t = timeit.Timer(np.fromstring(np.empty(1).tostring(),dtype='|S8').s ort(kind='q'),import numpy as np) In [3]: t.repeat(3,100) Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195] That's with the current sort(kind='q') in svn, which uses the new string compare function but is otherwise the old default quicksort. The new string specific version of quicksort I'm testing is actually a bit slower than that. Both versions correctly sort the array. So I'm going to continue to experiment a bit until I see what is going on. The version you are testing is your own or the one that I provided? Here are the timings for my laptop: In [32]: a = np.random.rand(1).astype('S8') In [33]: %timeit a.copy().sort() # original sort in NumPy 10 loops, best of 3: 16.8 ms per loop In [34]: %timeit newqsort(a.copy()) # My own qsort implementation 100 loops, best of 3: 4.29 ms per loop (I'm using a random string array here mainly because I use the sort in my system NumPy, with the old string compare. However, as all the contents in strings are not NULL chars, the performance should be comparable, bar a few percent of improvement). So, my newqsort still seems to run almost 4x faster than the one in NumPy (you know, using the old string compare). However, when using a server with an Opteron processor, the view is much different: In [55]: a = np.random.rand(1).astype('S8') In [56]: %timeit a.copy().sort() 100 loops, best of 3: 3.82 ms per loop In [57]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.29 ms per loop Here, the difference in performance has been reduced to a mere 15% (still favouring newqsort). So, with this, it seems like the performance of the original sorting in NumPy only suffers a lot when running in old processors (eg. Pentium 4), while the performance is reasonable with newer ones (Opteron). On its hand, newqsort seems to perform reasonably well in both. I don't know what exactly is the reason for this (I don't know where it is the code for the original quicksort for strings, so I can't do a visual comparison), but it would be great if we can discover it! Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote: Hi, I'm a bit confused that the sort method of a string character doesn't allow a mergesort: s = numpy.empty(10, S10) s.sort(kind=merge) TypeError: desired sort not supported for this type However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is merge (I presume because it is stable). So, perhaps the message above should be fixed. The only available method is the default quicksort. The mergesort is for argsort and was put in for lexsort to use. Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I've now got a string/ucs4 specific argsort(kind='q'), the string version of which is about 40% faster than the old default and about 10% faster than the mergesort version, but the string/ucs4 specific versions of sort aren't yet fairing as well. I'm timing things with In [1]: import timeit In [2]: t = timeit.Timer(np.fromstring(np.empty(1).tostring(),dtype='|S8').sort(kind='q'),import numpy as np) In [3]: t.repeat(3,100) Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195] That's with the current sort(kind='q') in svn, which uses the new string compare function but is otherwise the old default quicksort. The new string specific version of quicksort I'm testing is actually a bit slower than that. Both versions correctly sort the array. So I'm going to continue to experiment a bit until I see what is going on. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Friday 08 February 2008, Charles R Harris escrigué: On Feb 8, 2008 8:58 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. I ran a few timing tests. On my machine strncmp is about 100x faster than opt_strncmp, but sSWAP (with some fixes), is about 10x faster then using the memcpy in a recent compiler. Does this match with your experience. Well, I've run some more exhaustive tests on my laptop (Pentium 4 @ 2 GHz, Ubuntu 7.10, gcc 4.1.3, using -O3 optlevel) with the next sa1 array: numpy.random.seed(1) nelem = 1 a = numpy.random.rand(nelem) sa1 = a.astype('S16') And I've chosen the next benchmark: /* start: the start of data for sa1 array ss: the length of the string type (16) num: the number of elements in sa1 (1) */ int sort_S(char *start, int ss, npy_intp num) { char *pl = start; int a = 0; npy_intp i, j; for (i=0; inum; i++) for (j=0; jnum; j++) a += opt_strncmp(pl+i*ss, pl+j*ss, ss); return a; } So, when I choose the next implementation of opt_strncmp: static int inline opt_strncmp1(char *a, char *b, intp n) { intp i; for (i=0; in; i++) { if (a[i] b[i]) return i+1; if (a[i] b[i]) return -(i+1); } return 0; } I get a time of 1.70 s. When using the next implementation: static int inline opt_strncmp2(char *a, char *b, intp n) { intp i; for (i=0; in; i++) { if (a[i] != b[i]) return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); } return 0; } I get a time of 1.24 s. The original strncmp gives me 2.05 seconds with this setup. So, in short and using my laptop: original strncmp: 2.05 s opt_stncmp1: 1.70 s ; speed-up: 20% opt_stncmp2: 1.24 s ; speed-up: 65% Just in case, I've also run the benchmark in an Opteron server (2 GHz, SuSe 10.1, gcc 4.2.1, using -O3 optlevel). Here are the results: original strncmp: 1.25 s opt_stncmp1: 1.61 s ; speed-up: -30% opt_stncmp2: 1.03 s ; speed-up: 20% So, I was wrong: opt_stncmp2 is quite more efficient than opt_stncmp1, and definitely better than the original strncmp. So, I would say that using opt_stncmp2 is always the best bet. I'm not certain why are you seeing that original strncmp is 100x faster than opt_strncmpX :-/ Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Friday 08 February 2008, Charles R Harris escrigué: On Feb 8, 2008 10:31 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Friday 08 February 2008, Francesc Altet escrigué: A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. Ops. I've introduced a last-minute problem in my code. To fix this, just replace the flawed opt_strncmp() that I sent before by: static int inline opt_strncmp(char *a, char *b, int n) { int i; for (i=0; in; i++) { if (a[i] b[i]) return i+1; if (a[i] b[i]) return -(i+1); /* Another way, but seems equivalent in speed, at least here */ /* if (a[i] != b[i]) */ /* return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */ } return 0; } Apparently, this version works just fine. Did you find this significantly faster than strncmp? There is also a unicode compare, do you have thoughts about that? Well, for the unicode case it wouldn't be enough by replacing 'char' by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too in this regard. -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 9, 2008 6:47 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Friday 08 February 2008, Charles R Harris escrigué: On Feb 8, 2008 10:31 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Friday 08 February 2008, Francesc Altet escrigué: A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. Ops. I've introduced a last-minute problem in my code. To fix this, just replace the flawed opt_strncmp() that I sent before by: static int inline opt_strncmp(char *a, char *b, int n) { int i; for (i=0; in; i++) { if (a[i] b[i]) return i+1; if (a[i] b[i]) return -(i+1); /* Another way, but seems equivalent in speed, at least here */ /* if (a[i] != b[i]) */ /* return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */ } return 0; } Apparently, this version works just fine. Did you find this significantly faster than strncmp? There is also a unicode compare, do you have thoughts about that? Well, for the unicode case it wouldn't be enough by replacing 'char' by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too in this regard. Looks like that for Numpy. The problem I was thinking about is that for wide characters Windows C defaults to UTF16 while the Unixes default to UTF32. The C99 standard didn't specify the exact length, but Numpy seems to use (or assume) UTF32. Anyway, after doing some work to fool the optimizer and subtracting loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs 16e-9 seconds to compare strings of length 10. I've attached the program. Note that on my machine malloc appears to return zeroed memory, so the string compares always go to the end. Chuck #include stdlib.h #include stdio.h #include string.h static int inline strncmp1(char *a, char *b, int n) { int i; for (i = 0; i n; i++) { if (a[i] != b[i]) return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); } return 0; } #define COMPARE(a, b, n) strncmp1(a,b,n) #define REP 100 #define LEN 10 int main(int argc, char** argv) { char (*a)[LEN] = malloc(REP*LEN); char (*b)[LEN] = malloc(REP*LEN); int sum = 0; int i; for(i = 0; i REP; ++i) sum += COMPARE(a[i], b[i], LEN); printf(%d\n, sum); free(a); free(b); return 1; } ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Saturday 09 February 2008, Charles R Harris escrigué: Well, for the unicode case it wouldn't be enough by replacing 'char' by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too in this regard. Looks like that for Numpy. The problem I was thinking about is that for wide characters Windows C defaults to UTF16 while the Unixes default to UTF32. If it were so simple ;-) The fact is that the Python crew is delivering the tarballs ready to compile with the UCS2 as default, and this applies to both UNIX and Windows. However, some Linux distributions (most in particular, Debian and derivatives), has chosen to make UCS4 the default in their Python packages. This is not a (big) problem in itself, but when it comes to writing arrays on disk and hope for portability (not only with different platforms, but also with different UCS python interpreter in the same machine!), we realized that this was a real problem (see discussion in [1]). So, NumPy had to make a decision in that regard, and Travis finally opted to only give support for the UCS4 charset in NumPy [2]. Also, he opened the door to possible UCS2 implementations in NumPy in the future, but that would be a real pain, IMHO. [1]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006081.html [2]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006130.html So, at least for the time being, you only have to worry about UCS4. The C99 standard didn't specify the exact length, but Numpy seems to use (or assume) UTF32. Well, I should say that UTF32 and UCS4 are names referring to the same thing, but most literature (and specially package configuration procedures) talks about UCS4. Anyway, after doing some work to fool the optimizer and subtracting loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs 16e-9 seconds to compare strings of length 10. I've attached the program. Note that on my machine malloc appears to return zeroed memory, so the string compares always go to the end. I've seen the benchmark, and the problem is that C strncmp stops checking when it finds a \0 in the first string, while strncmp1 have to check the complete set of chars in strings. However, you won't really want to do C string comparisons with NumPy strings: In [35]: ns1 = numpy.array(as\0as) In [36]: ns2 = numpy.array(as\0bs) In [37]: ns1 == ns2 Out[37]: array(False, dtype=bool) In [38]: ns1 ns2 Out[38]: array(True, dtype=bool) or, with Python strings, in general: In [39]: ns1 = as\0as In [40]: ns2 = as\0bs In [41]: ns1 == ns2 Out[41]: False In [42]: ns1 ns2 Out[42]: True As you see, Python/NumPy strings are different beasts than C strings in that regard. The strings in the latter always end with a \0 (NULL) character, while in Python/NumPy the end is defined by a length property (btw, the same than in Pascal, if you know it). So, strncmp1 is not only faster than its C counterpart, but also the one doing the correct job with NumPy (unicode) strings. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 9, 2008 11:50 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Saturday 09 February 2008, Charles R Harris escrigué: Well, for the unicode case it wouldn't be enough by replacing 'char' by 'Py_ArrayUCS4'? Maybe this afternoon I can do some benchmarking too in this regard. Looks like that for Numpy. The problem I was thinking about is that for wide characters Windows C defaults to UTF16 while the Unixes default to UTF32. If it were so simple ;-) The fact is that the Python crew is delivering the tarballs ready to compile with the UCS2 as default, and this applies to both UNIX and Windows. However, some Linux distributions (most in particular, Debian and derivatives), has chosen to make UCS4 the default in their Python packages. This is not a (big) problem in itself, but when it comes to writing arrays on disk and hope for portability (not only with different platforms, but also with different UCS python interpreter in the same machine!), we realized that this was a real problem (see discussion in [1]). So, NumPy had to make a decision in that regard, and Travis finally opted to only give support for the UCS4 charset in NumPy [2]. Also, he opened the door to possible UCS2 implementations in NumPy in the future, but that would be a real pain, IMHO. [1]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006081.html [2]http://projects.scipy.org/pipermail/numpy-discussion/2006-February/006130.html So, at least for the time being, you only have to worry about UCS4. The C99 standard didn't specify the exact length, but Numpy seems to use (or assume) UTF32. Well, I should say that UTF32 and UCS4 are names referring to the same thing, but most literature (and specially package configuration procedures) talks about UCS4. Anyway, after doing some work to fool the optimizer and subtracting loop overhead, strncmp still comes out a bit faster for me, 11e-9 vs 16e-9 seconds to compare strings of length 10. I've attached the program. Note that on my machine malloc appears to return zeroed memory, so the string compares always go to the end. I've seen the benchmark, and the problem is that C strncmp stops checking when it finds a \0 in the first string, while strncmp1 have to check the complete set of chars in strings. However, you won't really want to do C string comparisons with NumPy strings: In [35]: ns1 = numpy.array(as\0as) In [36]: ns2 = numpy.array(as\0bs) In [37]: ns1 == ns2 Out[37]: array(False, dtype=bool) In [38]: ns1 ns2 Out[38]: array(True, dtype=bool) or, with Python strings, in general: In [39]: ns1 = as\0as In [40]: ns2 = as\0bs In [41]: ns1 == ns2 Out[41]: False In [42]: ns1 ns2 Out[42]: True As you see, Python/NumPy strings are different beasts than C strings in that regard. The strings in the latter always end with a \0 (NULL) character, while in Python/NumPy the end is defined by a length property (btw, the same than in Pascal, if you know it). So, strncmp1 is not only faster than its C counterpart, but also the one doing the correct job with NumPy (unicode) strings. Ah, in that case the current indirect sort for NumPy strings, which uses strncmp, is incorrect and needs to be fixed. It seems that strings with zeros are not part of the current test series ;) Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 9, 2008 2:07 PM, Francesc Altet [EMAIL PROTECTED] wrote: A Saturday 09 February 2008, Charles R Harris escrigué: So, strncmp1 is not only faster than its C counterpart, but also the one doing the correct job with NumPy (unicode) strings. Ah, in that case the current indirect sort for NumPy strings, which uses strncmp, is incorrect and needs to be fixed. It seems that strings with zeros are not part of the current test series ;) Yeah, that's right. And yes, it would be advisable to have at least a couple of tests having zeros interspersed throughout the string. Like this should do: In [5]: argsort(fromstring(\0\2\0\1, dtype=|S2)) Out[5]: array([0, 1]) Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 9, 2008 2:29 PM, Francesc Altet [EMAIL PROTECTED] wrote: Chuck, One more thing on this. I've been doing some benchmarking with my opt_memcpy() macro in the quicksort_string function, and I should say that while it is definitely more efficient than my system memcpy for small values of n (the number of bytes to copy), this doesn't keep true for all values of n. For example, for n16, opt_memcpy() can be more than 4x faster than system memcpy (and this is why I naively thought that it would be faster in general). However, for n80, memcpy beats opt_memcpy between a 25% and 100% (depending on whether n is divisible by 2, 4 or 8). This is on my Linux system (Ubuntu 7.10), but perhaps with Windows the behaviour can be different. I think I would be able to come up with a routine that can offer a balance between opt_memcpy and system memcpy, but that should take some time. So, until I (or anybody else) do more research on this, I think it would be safer if you use system memcpy for string sorting in NumPy. The memcpy in newer compilers is actually pretty good. For integers and such it sometime compiles inline using integer assignments, but I was loath to make it the default implementation until = 4.1.x gcc became more common. However, strings might be a good place to use it. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 9, 2008 2:42 PM, Charles R Harris [EMAIL PROTECTED] wrote: On Feb 9, 2008 2:29 PM, Francesc Altet [EMAIL PROTECTED] wrote: Chuck, One more thing on this. I've been doing some benchmarking with my opt_memcpy() macro in the quicksort_string function, and I should say that while it is definitely more efficient than my system memcpy for small values of n (the number of bytes to copy), this doesn't keep true for all values of n. For example, for n16, opt_memcpy() can be more than 4x faster than system memcpy (and this is why I naively thought that it would be faster in general). However, for n80, memcpy beats opt_memcpy between a 25% and 100% (depending on whether n is divisible by 2, 4 or 8). This is on my Linux system (Ubuntu 7.10), but perhaps with Windows the behaviour can be different. I think I would be able to come up with a routine that can offer a balance between opt_memcpy and system memcpy, but that should take some time. So, until I (or anybody else) do more research on this, I think it would be safer if you use system memcpy for string sorting in NumPy. The memcpy in newer compilers is actually pretty good. For integers and such it sometime compiles inline using integer assignments, but I was loath to make it the default implementation until = 4.1.x gcc became more common. However, strings might be a good place to use it. I'm also thinking that at some point it becomes more efficient to do a indirect sort followed by take than to move all those big strings around. But I guess we won't know where that point is until we have both versions available. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Saturday 09 February 2008, Charles R Harris escrigué: On Feb 9, 2008 2:07 PM, Francesc Altet [EMAIL PROTECTED] wrote: A Saturday 09 February 2008, Charles R Harris escrigué: So, strncmp1 is not only faster than its C counterpart, but also the one doing the correct job with NumPy (unicode) strings. Ah, in that case the current indirect sort for NumPy strings, which uses strncmp, is incorrect and needs to be fixed. It seems that strings with zeros are not part of the current test series ;) Yeah, that's right. And yes, it would be advisable to have at least a couple of tests having zeros interspersed throughout the string. Like this should do: In [5]: argsort(fromstring(\0\2\0\1, dtype=|S2)) Out[5]: array([0, 1]) Exactly, but I understand that the correct result should be: array([1, 0]) ;-) Something a bit more complex, like: In [5]: argsort(fromstring(a\0b\0\0\2a\0b\0\0\1, dtype=|S6)) Out[5]: array([1, 0]) wouldn't hurt neither. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
Chuck, One more thing on this. I've been doing some benchmarking with my opt_memcpy() macro in the quicksort_string function, and I should say that while it is definitely more efficient than my system memcpy for small values of n (the number of bytes to copy), this doesn't keep true for all values of n. For example, for n16, opt_memcpy() can be more than 4x faster than system memcpy (and this is why I naively thought that it would be faster in general). However, for n80, memcpy beats opt_memcpy between a 25% and 100% (depending on whether n is divisible by 2, 4 or 8). This is on my Linux system (Ubuntu 7.10), but perhaps with Windows the behaviour can be different. I think I would be able to come up with a routine that can offer a balance between opt_memcpy and system memcpy, but that should take some time. So, until I (or anybody else) do more research on this, I think it would be safer if you use system memcpy for string sorting in NumPy. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Saturday 09 February 2008, Charles R Harris escrigué: I'm also thinking that at some point it becomes more efficient to do a indirect sort followed by take than to move all those big strings around. But I guess we won't know where that point is until we have both versions available. I've done some experiments in that matter too. They are saying that, with the current mergesort in NumPy, an indirect sort followed by take performs similarly to direct sort for small string lengths (=16), but indirect sort starts to win afterwards. The version with quicksort and optimized sSWAP should be between 2x and 3x times faster than current mergesort implementation, so the advantage for direct sort could grow up to somewhere between 50 and 100. A nice idea could be doing some more toughful experiments in order to find the point where an indirect sort followed by a take would be more efficient and automatically select this method beyond that point. However, this has the drawback that you have to use additional memory for keeping the indices in the indirect method. Of course, when strings are large, those indices should take a rather negligible space compared with strings itself. In any case, in some situations where space is critical, this can still be important. I don't know, but my opinion is that we shouldn't take too agressive optimizations for that matter. My vote is to document this possibility in the docstrings, so that the user wanting for extreme performance can use this approach if he wants to. Still, for string sizes greater than, say, 1000, well, an automatic selection of the indirect method is very tempting indeed. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 9, 2008 3:19 PM, Francesc Altet [EMAIL PROTECTED] wrote: A Saturday 09 February 2008, Charles R Harris escrigué: I'm also thinking that at some point it becomes more efficient to do a indirect sort followed by take than to move all those big strings around. But I guess we won't know where that point is until we have both versions available. I've done some experiments in that matter too. They are saying that, with the current mergesort in NumPy, an indirect sort followed by take performs similarly to direct sort for small string lengths (=16), but indirect sort starts to win afterwards. The version with quicksort and optimized sSWAP should be between 2x and 3x times faster than current mergesort implementation, so the advantage for direct sort could grow up to somewhere between 50 and 100. A nice idea could be doing some more toughful experiments in order to find the point where an indirect sort followed by a take would be more efficient and automatically select this method beyond that point. However, this has the drawback that you have to use additional memory for keeping the indices in the indirect method. Of course, when strings are large, those indices should take a rather negligible space compared with strings itself. In any case, in some situations where space is critical, this can still be important. I don't know, but my opinion is that we shouldn't take too agressive optimizations for that matter. My vote is to document this possibility in the docstrings, so that the user wanting for extreme performance can use this approach if he wants to. Still, for string sizes greater than, say, 1000, well, an automatic selection of the indirect method is very tempting indeed. The strings with zeros problem runs deeper than it looked at first glance. Normal sorts don't work either, which means the type has a bad comparison function. And argsort still doesn't work even with the correct comparison function. Python, however, works as it should sorting lists of strings with zeros. So I'm going to have to track down and fix this oddity, but it is going to delay putting in the type specific quicksort for strings. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 8, 2008 5:29 AM, Francesc Altet [EMAIL PROTECTED] wrote: Hi, I'm a bit confused that the sort method of a string character doesn't allow a mergesort: s = numpy.empty(10, S10) s.sort(kind=merge) TypeError: desired sort not supported for this type I think it's an error parsing the keyword. In fact, I thought I fixed that, but maybe I was waiting till I added the other methods. However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is merge (I presume because it is stable). So, perhaps the message above should be fixed. Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - /* This is a quick sort implementation for numpy strings, mainly for testing purposes, but could be useful in the future. It happens to be between 2.5x and 3x (for long enough arrays) than the string sort in NumPy (based in merge sort). Francesc Altet 2008-02-07 */ #define sSWAP(a,b) { \ for (i=0; iss; i++) { \ ((char *)SWAP_temp)[i] = ((char *)(b))[i]; \ ((char *)b)[i] = ((char *)(a))[i]; \ ((char *)a)[i] = ((char *)(SWAP_temp))[i]; \ }\ } #define opt_memcpy(a,b,n) {\ for (i=0; i(n); i++) {\ ((char *)a)[i] = ((char *)(b))[i]; \ } \ } /* opt_strncmp is an optimized version of strncmp, mainly for easy the inlining by the compiler. I'm not sure whether the inline would create problems with other compilers than GCC, but it can safely removed and let the compiler decide if it should do the inlining or not. In any case, the inlining can improve the global performance by a 20% or more. This is only valid for regular strings, but adding support for UCS4 should be a matter of replacing 'char' by 'PyArray_UCS4', I think. */ static inline int opt_strncmp(char *a, char *b, int n) { int i; for (i=0; in; i++) { if (a[i] b[i]) return i; if (a[i] b[i]) return -i; } return 0; } /* start: the address where the data for 1-D string array starts ss: the size of the string elements. num: the number of elements */ int quicksort_string(char *start, int ss, intp num) { char *pl = start; char *pr = start + (num - 1) * ss; char *vp; char *SWAP_temp; char *stack[PYA_QS_STACK], **sptr = stack; char *pm, *pi, *pj, *pt; int i; vp = malloc(ss); SWAP_temp = malloc(ss); for(;;) { while (((pr - pl)/ss) SMALL_QUICKSORT) { /* quicksort partition */ pm = pl + (((pr-pl)/ss) 1)*ss; if (opt_strncmp(pm, pl, ss) 0) { sSWAP(pm, pl); } if (opt_strncmp(pr, pm, ss) 0) { sSWAP(pr, pm); } if (opt_strncmp(pm, pl, ss) 0) { sSWAP(pm, pl); } opt_memcpy(vp, pm, ss); pi = pl; pj = pr - ss; sSWAP(pm, pj); for(;;) { do { pi += ss; } while (opt_strncmp(pi, vp, ss) 0); do { pj -= ss; } while (opt_strncmp(vp, pj, ss) 0); if (pi = pj) break; sSWAP(pi, pj); } sSWAP(pi, pr-ss); /* push largest partition on stack */ if (pi - pl pr - pi) { *sptr++ = pi + ss; *sptr++ = pr; pr = pi - ss; }else{ *sptr++ = pl; *sptr++ = pi - ss; pl = pi + ss; } } /* insertion sort */ for(pi = pl + ss; pi = pr; pi += ss) { opt_memcpy(vp, pi, ss); for(pj = pi, pt = pi - ss; pj pl opt_strncmp(vp, pt, ss) 0;) { opt_memcpy(pj, pt, ss); pj -= ss; pt -= ss; } opt_memcpy(pj, vp, ss); } if (sptr == stack) break; pr = *(--sptr); pl = *(--sptr); } free(vp); free(SWAP_temp); return 0; } ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
A Friday 08 February 2008, Francesc Altet escrigué: A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. Ops. I've introduced a last-minute problem in my code. To fix this, just replace the flawed opt_strncmp() that I sent before by: static int inline opt_strncmp(char *a, char *b, int n) { int i; for (i=0; in; i++) { if (a[i] b[i]) return i+1; if (a[i] b[i]) return -(i+1); /* Another way, but seems equivalent in speed, at least here */ /* if (a[i] != b[i]) */ /* return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */ } return 0; } Apparently, this version works just fine. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 8, 2008 8:58 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. I ran a few timing tests. On my machine strncmp is about 100x faster than opt_strncmp, but sSWAP (with some fixes), is about 10x faster then using the memcpy in a recent compiler. Does this match with your experience. Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] String sort
On Feb 8, 2008 10:31 AM, Francesc Altet [EMAIL PROTECTED] wrote: A Friday 08 February 2008, Francesc Altet escrigué: A Friday 08 February 2008, Charles R Harris escrigué: Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. I have some code for this too and was going to merge it. Send yours along and I'll get to it this weekend. Ok, great. I'm attaching it. Tell me if you need some clarification on the code. Ops. I've introduced a last-minute problem in my code. To fix this, just replace the flawed opt_strncmp() that I sent before by: static int inline opt_strncmp(char *a, char *b, int n) { int i; for (i=0; in; i++) { if (a[i] b[i]) return i+1; if (a[i] b[i]) return -(i+1); /* Another way, but seems equivalent in speed, at least here */ /* if (a[i] != b[i]) */ /* return (((unsigned char *)a)[i] - ((unsigned char *)b)[i]); */ } return 0; } Apparently, this version works just fine. Did you find this significantly faster than strncmp? There is also a unicode compare, do you have thoughts about that? Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] String sort
Hi, I'm a bit confused that the sort method of a string character doesn't allow a mergesort: s = numpy.empty(10, S10) s.sort(kind=merge) TypeError: desired sort not supported for this type However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is merge (I presume because it is stable). So, perhaps the message above should be fixed. Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code. Cheers, -- 0,0 Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data - ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion