Re: (Improved) Benchmark for Phobos Sort Algorithm
On 18/12/10 4:46 PM, BCS wrote: Hello Craig, It was brought to my attention that the quick sort has a very bad worst case, so I implemented a simple fix for it. Now the worst case (completely ordered) is the best case, and it only slows down the general case by a small percentage. I thought to myself, "it can't be this easy to fix quick sort". Does anyone see a flaw in this simple fix? Performs much better than Phobos in completely random and completely sorted data. Perhaps there is another case where it doesn't do as well? I think I've seen it stated as: If you know the implementation, you can always generate a pathological case for QSort. That's not true for a randomized pivot point (unless you also happen to know the PRNG... and seed).
Re: (Improved) Benchmark for Phobos Sort Algorithm
Hello Craig, It was brought to my attention that the quick sort has a very bad worst case, so I implemented a simple fix for it. Now the worst case (completely ordered) is the best case, and it only slows down the general case by a small percentage. I thought to myself, "it can't be this easy to fix quick sort". Does anyone see a flaw in this simple fix? Performs much better than Phobos in completely random and completely sorted data. Perhaps there is another case where it doesn't do as well? I think I've seen it stated as: If you know the implementation, you can always generate a pathological case for QSort.
Re: (Improved) Benchmark for Phobos Sort Algorithm
On Fri, 17 Dec 2010 03:05:02 + Russel Winder wrote: > On Thu, 2010-12-16 at 20:36 -0600, Craig Black wrote: > > It was brought to my attention that the quick sort has a very bad worst > > case, so I implemented a simple fix for it. Now the worst case (completely > > ordered) is the best case, and it only slows down the general case by a > > small percentage. I thought to myself, "it can't be this easy to fix quick > > sort". Does anyone see a flaw in this simple fix? Performs much better > > than Phobos in completely random and completely sorted data. Perhaps there > > is another case where it doesn't do as well? > > Is there any reason to not just follow Bentley and McIlroy, > ``Engineering a Sort Function,'' SP&E 23(11), p.1249-1265, November > 1993. It is what the Java folk and the Go folk do for sorting arrays > (and slices in Go). The Java folk use a modified Merge Sort for sorting > collections. It's all to do with stability as well as algorithmic > complexity. > > Quicksort and Merge Sort is, however, a research industry so it will > undoubtedly be the case that there is significantly more work done in > the last 17 years. This is especially true for parallel sorting. A > library for D undoubtedly needs both a sequential and a parallel sort > function. The Go folk haven't tackled this yet, and I can#t see the C++ > and Java folk tackling it for the forseeable future even though it is > basically a necessity. > > I have no doubt that people on this list could easily contribute to the > research activity in this area, and perhaps that is what some would like > to do, but to tinker away at algorithms outside the context of all the > research work done on this seems like the fastest way to be treated as > amateur hackers. > What about TimSort? http://en.wikipedia.org/wiki/Timsort (Was also considered to replace QuickSort in Lua.) Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com
Re: (Improved) Benchmark for Phobos Sort Algorithm
On 12/16/10 9:05 PM, Russel Winder wrote: On Thu, 2010-12-16 at 20:36 -0600, Craig Black wrote: It was brought to my attention that the quick sort has a very bad worst case, so I implemented a simple fix for it. Now the worst case (completely ordered) is the best case, and it only slows down the general case by a small percentage. I thought to myself, "it can't be this easy to fix quick sort". Does anyone see a flaw in this simple fix? Performs much better than Phobos in completely random and completely sorted data. Perhaps there is another case where it doesn't do as well? Is there any reason to not just follow Bentley and McIlroy, ``Engineering a Sort Function,'' SP&E 23(11), p.1249-1265, November 1993. It is what the Java folk and the Go folk do for sorting arrays (and slices in Go). The Java folk use a modified Merge Sort for sorting collections. It's all to do with stability as well as algorithmic complexity. Quicksort and Merge Sort is, however, a research industry so it will undoubtedly be the case that there is significantly more work done in the last 17 years. This is especially true for parallel sorting. A library for D undoubtedly needs both a sequential and a parallel sort function. The Go folk haven't tackled this yet, and I can#t see the C++ and Java folk tackling it for the forseeable future even though it is basically a necessity. I have no doubt that people on this list could easily contribute to the research activity in this area, and perhaps that is what some would like to do, but to tinker away at algorithms outside the context of all the research work done on this seems like the fastest way to be treated as amateur hackers. Yeah, when reading this I was like, "the last sentence ain't likely to be as well received as others". :o) All - let's take it easy. I implemented std.algorithm sort and it reuses partition(), another algorithm, and uses Singleton's partition of first, middle, last element. I also eliminated a few obvious risks of quadratic behavior. See comment on line 3831: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/algorithm.d?rev=1279#L3808 I was familiar at the time with Bentley's paper but there is only so much time to spend on implementing one algorithm when I had fifty others on my plate. I think std.algorithm.sort does an adequate job but it can be improved in many ways. Andrei
Re: (Improved) Benchmark for Phobos Sort Algorithm
I've found a Java implementation of introsort: http://ralphunden.net/?q=a-guide-to-introsort http://ralphunden.net/?q=a-guide-to-introsort#42 Hope that helps. :)
Re: (Improved) Benchmark for Phobos Sort Algorithm
Craig Black schrieb: Amateur hacker? Ah, go fuck yourself. Just because I haven't researched sorting algorithms before doesn't give you any right to talk down to me. I haven't been ignoring research... but I do like to tinker. For me it's a good way to learn. In addition to tinkering I have been learning about other sort algorithms. Again, please fuck yourself. -Craig WTF are you drunk or something?
Re: (Improved) Benchmark for Phobos Sort Algorithm
"Matthias Walter" wrote in message news:mailman.1065.1292557052.21107.digitalmar...@puremagic.com... On 12/16/2010 09:36 PM, Craig Black wrote: It was brought to my attention that the quick sort has a very bad worst case, so I implemented a simple fix for it. Now the worst case (completely ordered) is the best case, and it only slows down the general case by a small percentage. I thought to myself, "it can't be this easy to fix quick sort". Does anyone see a flaw in this simple fix? Performs much better than Phobos in completely random and completely sorted data. Perhaps there is another case where it doesn't do as well? Yes, there is a "flaw": There are still instances of arrays where you will end up with a pivot element being one of the largest or one of the smallest elements in *every* call. The means, that you split your array from length n not into two arrays roughly of size n/2 and n/2, but of O(1) and n - O(1). This implies a running time of n^2 (in contrast to n log n), which is obviously bad. I don't know how std.algorithm.sort works, but C++ STL uses an Introspective sort, which is a quick-sort variant like you have, but it has some measurements that observe whether the above worst case occurs (e.g. by looking at the recursion depth) and switches to a heap-sort in this case. [1] Matthias [1] http://en.wikipedia.org/wiki/Introsort Thanks for the advice! I have been looking on the internet and it seems introsort is the best, but I haven't found any free C/C++ code for it. -Craig
Re: (Improved) Benchmark for Phobos Sort Algorithm
Amateur hacker? Ah, go fuck yourself. Just because I haven't researched sorting algorithms before doesn't give you any right to talk down to me. I haven't been ignoring research... but I do like to tinker. For me it's a good way to learn. In addition to tinkering I have been learning about other sort algorithms. Again, please fuck yourself. -Craig
Re: (Improved) Benchmark for Phobos Sort Algorithm
On 12/16/2010 09:36 PM, Craig Black wrote: > It was brought to my attention that the quick sort has a very bad > worst case, so I implemented a simple fix for it. Now the worst case > (completely ordered) is the best case, and it only slows down the > general case by a small percentage. I thought to myself, "it can't be > this easy to fix quick sort". Does anyone see a flaw in this simple > fix? Performs much better than Phobos in completely random and > completely sorted data. Perhaps there is another case where it > doesn't do as well? Yes, there is a "flaw": There are still instances of arrays where you will end up with a pivot element being one of the largest or one of the smallest elements in *every* call. The means, that you split your array from length n not into two arrays roughly of size n/2 and n/2, but of O(1) and n - O(1). This implies a running time of n^2 (in contrast to n log n), which is obviously bad. I don't know how std.algorithm.sort works, but C++ STL uses an Introspective sort, which is a quick-sort variant like you have, but it has some measurements that observe whether the above worst case occurs (e.g. by looking at the recursion depth) and switches to a heap-sort in this case. [1] Matthias [1] http://en.wikipedia.org/wiki/Introsort
Re: (Improved) Benchmark for Phobos Sort Algorithm
On Thu, 2010-12-16 at 20:36 -0600, Craig Black wrote: > It was brought to my attention that the quick sort has a very bad worst > case, so I implemented a simple fix for it. Now the worst case (completely > ordered) is the best case, and it only slows down the general case by a > small percentage. I thought to myself, "it can't be this easy to fix quick > sort". Does anyone see a flaw in this simple fix? Performs much better > than Phobos in completely random and completely sorted data. Perhaps there > is another case where it doesn't do as well? Is there any reason to not just follow Bentley and McIlroy, ``Engineering a Sort Function,'' SP&E 23(11), p.1249-1265, November 1993. It is what the Java folk and the Go folk do for sorting arrays (and slices in Go). The Java folk use a modified Merge Sort for sorting collections. It's all to do with stability as well as algorithmic complexity. Quicksort and Merge Sort is, however, a research industry so it will undoubtedly be the case that there is significantly more work done in the last 17 years. This is especially true for parallel sorting. A library for D undoubtedly needs both a sequential and a parallel sort function. The Go folk haven't tackled this yet, and I can#t see the C++ and Java folk tackling it for the forseeable future even though it is basically a necessity. I have no doubt that people on this list could easily contribute to the research activity in this area, and perhaps that is what some would like to do, but to tinker away at algorithms outside the context of all the research work done on this seems like the fastest way to be treated as amateur hackers. -- Russel. = Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.win...@ekiga.net 41 Buckmaster Roadm: +44 7770 465 077 xmpp: rus...@russel.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder signature.asc Description: This is a digitally signed message part
(Improved) Benchmark for Phobos Sort Algorithm
It was brought to my attention that the quick sort has a very bad worst case, so I implemented a simple fix for it. Now the worst case (completely ordered) is the best case, and it only slows down the general case by a small percentage. I thought to myself, "it can't be this easy to fix quick sort". Does anyone see a flaw in this simple fix? Performs much better than Phobos in completely random and completely sorted data. Perhaps there is another case where it doesn't do as well? -Craig import std.stdio; import std.random; import std.algorithm; static bool less(T)(T a, T b) { return a < b; } bool isOrdered(A, alias L)(A a, int low, int high) { for(int i = low; i < high; i++) { if(L(a[i+1], a[i])) return false; } return true; } void insertionSort(A, alias L)(A a, int low, int high) { for(int i = low; i <= high; i++) { int min = i; for(int j = i + 1; j <= high; j++) if(L(a[j], a[min])) min = j; swap(a[i], a[min]); } } void quickSort(A, alias L)(A a, int p, int r) { if (p >= r) return; if(isOrdered!(A, L)(a, p, r)) return; if(p + 7 > r) return insertionSort!(A, L)(a, p, r); auto x = a[r]; int j = p - 1; for (int i = p; i < r; i++) { if (L(x, a[i])) continue; swap(a[i], a[++j]); } a[r] = a[j + 1]; a[j + 1] = x; quickSort!(A, L)(a, p, j); quickSort!(A, L)(a, j + 2, r); } void customSort(T)(T[] a) { quickSort!(T[], less!T)(a, 0, a.length-1); } ulong getCycle() { asm { rdtsc; } } ulong bench1(double[] vals) { ulong startTime = getCycle(); double[] v; v.length = vals.length; for(int i = 0; i < 100; i++) { for(int j = 0; j < v.length; j++) v[j] = vals[j]; sort(v); } return getCycle() - startTime; } ulong bench2(double[] vals) { ulong startTime = getCycle(); double[] v; v.length = vals.length; for(int i = 0; i < 100; i++) { for(int j = 0; j < v.length; j++) v[j] = vals[j]; customSort(v); } return getCycle() - startTime; } void main() { Mt19937 gen; double[] vals; vals.length = 1000; for(int i = 0; i < vals.length; i++) vals[i] = uniform(0.0,1000.0); sort(vals[]); ulong time1, time2; for(int i = 0; i < 100; i++) { time1 += bench1(vals); time2 += bench2(vals); } writeln("Sorting with phobos sort: ", time1/1e5); writeln("Sorting with custom quickSort: ", time2/1e5); if(time1 > time2) writeln(100.0 * (time1-time2) / time1, " percent faster"); else writeln(100.0 * (time2-time1) / time2, " percent slower"); }