Re: qsort() documentation
On 2016-Apr-20 08:45:00 +0200, Hans Petter Selasky wrote: >There is something which I don't understand. Why is quicksort falling >back to insertion sort which is an O(N**2) algorithm, when there exist a >O(log(N)*log(N)*N) algorithms, which I propose as a solution to the >"bad" characteristics of qsort. O() notation just describes the (normally, worst case) ratio of input size to runtime for a given algorithm: Increasing the input size by (say) 100× means an insertion sort will take about 1× as long to run, whilst the "best" algorithms would take about 2000× as long. It says nothing about how fast sorting (say) 1000 items takes with either sort or how they behave on "typical" inputs. In general, the fancier algorithms might have better worst-case O() numbers but they have higher overheads and may not perform any better on typical inputs - so, for small inputs, insertion sort or bubble sort may be faster. IMO: - If you're only sorting a small number of items and/or doing it infrequently, the sort performance doesn't really matter and you can use any algorithm. - If you're sorting lots of items and sort performance is a real issue, you need to examine the performance of a variety of algorithms on your input data and may need to roll your own implementation. As long as qsort() behaves reasonably and its behaviour is documented sufficiently well that someone can decide whether or not to rule it out for their specific application, that is (IMHO) sufficient. -- Peter Jeremy signature.asc Description: PGP signature
Re: qsort() documentation
Quoting Warren Block : On Tue, 19 Apr 2016, Aleksander Alekseev wrote: Why Wikipedia, specifically? There are a lot of places that describe quicksort. How about just Note: This implementation of qsort() is designed to avoid the worst-case complexity of N**2 that is often seen with standard versions. I would say that this statement is just false. Worst-case complexity is still N**2. How about something like: """ This implementation of qsort() has worst case complexity of N**2. However measures were taken that make it very unlikely that for some random input N**2 swaps will be made. It's still possible to generate such an input on purpose though. See code below for more details. """ Okay: The quicksort algorithm worst-case is O(N**2), but this implementation has been designed to avoid that for most real data. Keep in mind that there is no requirement whatsoever that the qsort() function uses the quicksort algorithm, even if that is a common way to implement qsort() Also, worst-case is worst-case, no matter how unlikely. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On 04/20/16 06:01, Warren Block wrote: On Tue, 19 Apr 2016, Aleksander Alekseev wrote: Why Wikipedia, specifically? There are a lot of places that describe quicksort. How about just Note: This implementation of qsort() is designed to avoid the worst-case complexity of N**2 that is often seen with standard versions. I would say that this statement is just false. Worst-case complexity is still N**2. How about something like: """ This implementation of qsort() has worst case complexity of N**2. However measures were taken that make it very unlikely that for some random input N**2 swaps will be made. It's still possible to generate such an input on purpose though. See code below for more details. """ Okay: The quicksort algorithm worst-case is O(N**2), but this implementation has been designed to avoid that for most real data. Hi, There is something which I don't understand. Why is quicksort falling back to insertion sort which is an O(N**2) algorithm, when there exist a O(log(N)*log(N)*N) algorithms, which I propose as a solution to the "bad" characteristics of qsort. The replacement algorithm I propose does not allocate working memory and it does not recurse and it does not use a significant amount of stack space. Maybe some number theorists want to have a look? I cannot seem to find it anywhere public. See here: https://reviews.freebsd.org/D5241 Local test results w/D5241 running statically in userspace: Data set size log2(N) qsort w/insert sort qsort w/hpsort mergesort data set 10 1.281.121.34random0 11 2.422.432.83random0 12 5.215.2 6.1 random0 10 1.261.141.44random1 11 2.462.463.05random1 12 5.285.296.56random1 10 10.01 5.1 0.2 bad0 11 39.12 12.11 0.33bad0 12 156.54 28.42 0.58bad0 New algorithm is in the middle column. Times are in seconds. Bad0 data set: http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-libc-qsort.html --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On Tue, 19 Apr 2016, Aleksander Alekseev wrote: Why Wikipedia, specifically? There are a lot of places that describe quicksort. How about just Note: This implementation of qsort() is designed to avoid the worst-case complexity of N**2 that is often seen with standard versions. I would say that this statement is just false. Worst-case complexity is still N**2. How about something like: """ This implementation of qsort() has worst case complexity of N**2. However measures were taken that make it very unlikely that for some random input N**2 swaps will be made. It's still possible to generate such an input on purpose though. See code below for more details. """ Okay: The quicksort algorithm worst-case is O(N**2), but this implementation has been designed to avoid that for most real data. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
> Why Wikipedia, specifically? There are a lot of places that describe > quicksort. How about just > >Note: This implementation of qsort() is designed to avoid the >worst-case complexity of N**2 that is often seen with standard >versions. I would say that this statement is just false. Worst-case complexity is still N**2. How about something like: """ This implementation of qsort() has worst case complexity of N**2. However measures were taken that make it very unlikely that for some random input N**2 swaps will be made. It's still possible to generate such an input on purpose though. See code below for more details. """ -- Best regards, Aleksander Alekseev http://eax.me/ ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On Mon, 18 Apr 2016, Hans Petter Selasky wrote: Hi, Are there any objections adding the following as part of documenting our kernel's qsort function? Index: sys/libkern/qsort.c === --- sys/libkern/qsort.c (revision 298202) +++ sys/libkern/qsort.c (working copy) @@ -45,6 +45,10 @@ /* * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function". + * + * NOTE: This implementation of qsort() was designed to not have the "was designed to avoid the" + * worst case complexity of N**2, as seen with the regular quick sort + * functions as described by Wikipedia. */ Why Wikipedia, specifically? There are a lot of places that describe quicksort. How about just Note: This implementation of qsort() is designed to avoid the worst-case complexity of N**2 that is often seen with standard versions. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On Mon, Apr 18, 2016 at 12:13 PM, Hans Petter Selasky wrote: > Did anyone try to generate such a fiendish set of data, and see how > quadratic the FreeBSD's qsort() becomes? > Not me, but it has been done: http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-libc-qsort.html ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On 04/18/16 16:49, Ed Schouten wrote: 2016-04-18 15:09 GMT+02:00 Hans Petter Selasky : On 04/18/16 14:16, Aleksander Alekseev wrote: I suggest also add a short description of how it was achieved (randomization?). I think the algorithm is switching to mergesort. I'll look up the paper and add that correctly before commit. As a Dutch person, I know the answer to this. Instead of picking a fixed pivot or choosing one at random, it uses an approach called linear time median finding to find a pivot that is 'approximately median'. There are a couple of algorithms for this, but I think Bentley's qsort() uses this: https://en.wikipedia.org/wiki/Dutch_national_flag_problem Hi, Ryan: Yes, there is quadratic behaviour still, but I believe the order is limited. For the matter of the topic I added a counter for the swap() code in the insertion fallback algorithm, and for a set of 2048 integers I never saw the swap() count exceed this number. For a pre-sorted array, values around ~2047 and reverse sorted ~2043. For random input far less. Citing the document "bentley93engineering.pdf", a footnote says: Of course, quadratic behavior is still possible. One can generate fiendish inputs by bugging Quicksort: Con- sider key values to be unknown initially. In the code for selecting a partition element, assign values in increas- ing order as unknown keys are encountered. In the partitioning code, make unknown keys compare high. Did anyone try to generate such a fiendish set of data, and see how quadratic the FreeBSD's qsort() becomes? Another thread, possibly related: http://postgresql.nabble.com/Why-do-we-still-perform-a-check-for-pre-sorted-input-within-qsort-variants-td5746526.html --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
2016-04-18 15:09 GMT+02:00 Hans Petter Selasky : > On 04/18/16 14:16, Aleksander Alekseev wrote: >> I suggest also add a short description of how it was achieved >> (randomization?). > > I think the algorithm is switching to mergesort. I'll look up the paper and > add that correctly before commit. As a Dutch person, I know the answer to this. Instead of picking a fixed pivot or choosing one at random, it uses an approach called linear time median finding to find a pivot that is 'approximately median'. There are a couple of algorithms for this, but I think Bentley's qsort() uses this: https://en.wikipedia.org/wiki/Dutch_national_flag_problem -- Ed Schouten Nuxi, 's-Hertogenbosch, the Netherlands KvK-nr.: 62051717 ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On Mon, Apr 18, 2016 at 9:09 AM, Hans Petter Selasky wrote > I think the algorithm is switching to mergesort. I'll look up the paper > and add that correctly before commit. > No, it switches to insertion sort, assuming that it's acting on an already sorted array. If that assumption is wrong you still get O(n**2) complexity. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
On 04/18/16 14:16, Aleksander Alekseev wrote: I suggest also add a short description of how it was achieved (randomization?). I think the algorithm is switching to mergesort. I'll look up the paper and add that correctly before commit. --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: qsort() documentation
Hello. I suggest also add a short description of how it was achieved (randomization?). On Mon, 18 Apr 2016 13:43:38 +0200 Hans Petter Selasky wrote: > Hi, > > Are there any objections adding the following as part of documenting > our kernel's qsort function? > > Index: sys/libkern/qsort.c > === > --- sys/libkern/qsort.c (revision 298202) > +++ sys/libkern/qsort.c (working copy) > @@ -45,6 +45,10 @@ > > /* >* Qsort routine from Bentley & McIlroy's "Engineering a Sort > Function". > + * > + * NOTE: This implementation of qsort() was designed to not have the > + * worst case complexity of N**2, as seen with the regular quick sort > + * functions as described by Wikipedia. >*/ > > > --HPS > ___ > freebsd-current@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to > "freebsd-current-unsubscr...@freebsd.org" > -- Best regards, Aleksander Alekseev http://eax.me/ ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
qsort() documentation
Hi, Are there any objections adding the following as part of documenting our kernel's qsort function? Index: sys/libkern/qsort.c === --- sys/libkern/qsort.c (revision 298202) +++ sys/libkern/qsort.c (working copy) @@ -45,6 +45,10 @@ /* * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function". + * + * NOTE: This implementation of qsort() was designed to not have the + * worst case complexity of N**2, as seen with the regular quick sort + * functions as described by Wikipedia. */ --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"