Re: qsort() documentation

2016-04-20 Thread Peter Jeremy
On 2016-Apr-20 08:45:00 +0200, Hans Petter Selasky  wrote:
>There is something which I don't understand. Why is quicksort falling 
>back to insertion sort which is an O(N**2) algorithm, when there exist a 
>O(log(N)*log(N)*N) algorithms, which I propose as a solution to the 
>"bad" characteristics of qsort.

O() notation just describes the (normally, worst case) ratio of input size
to runtime for a given algorithm: Increasing the input size by (say) 100×
means an insertion sort will take about 1× as long to run, whilst the
"best" algorithms would take about 2000× as long.  It says nothing about how
fast sorting (say) 1000 items takes with either sort or how they behave on
"typical" inputs.  In general, the fancier algorithms might have better
worst-case O() numbers but they have higher overheads and may not perform
any better on typical inputs - so, for small inputs, insertion sort or
bubble sort may be faster.

IMO:
- If you're only sorting a small number of items and/or doing it infrequently,
  the sort performance doesn't really matter and you can use any algorithm.
- If you're sorting lots of items and sort performance is a real issue, you
  need to examine the performance of a variety of algorithms on your input
  data and may need to roll your own implementation.

As long as qsort() behaves reasonably and its behaviour is documented
sufficiently well that someone can decide whether or not to rule it out
for their specific application, that is (IMHO) sufficient.

-- 
Peter Jeremy


signature.asc
Description: PGP signature


Re: qsort() documentation

2016-04-20 Thread Erik Trulsson

Quoting Warren Block :


On Tue, 19 Apr 2016, Aleksander Alekseev wrote:


Why Wikipedia, specifically?  There are a lot of places that describe
quicksort.  How about just

  Note: This implementation of qsort() is designed to avoid the
  worst-case complexity of N**2 that is often seen with standard
  versions.


I would say that this statement is just false. Worst-case complexity is
still N**2. How about something like:

"""
This implementation of qsort() has worst case complexity of N**2.
However measures were taken that make it very unlikely that for some
random input N**2 swaps will be made. It's still possible to generate
such an input on purpose though. See code below for more details.
"""


Okay:

  The quicksort algorithm worst-case is O(N**2), but this implementation
  has been designed to avoid that for most real data.


Keep in mind that there is no requirement whatsoever that the qsort()
function uses the quicksort algorithm, even if that is a common way to  
implement qsort()


Also, worst-case is worst-case, no matter how unlikely.




___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-20 Thread Hans Petter Selasky

On 04/20/16 06:01, Warren Block wrote:

On Tue, 19 Apr 2016, Aleksander Alekseev wrote:


Why Wikipedia, specifically?  There are a lot of places that describe
quicksort.  How about just

   Note: This implementation of qsort() is designed to avoid the
   worst-case complexity of N**2 that is often seen with standard
   versions.


I would say that this statement is just false. Worst-case complexity is
still N**2. How about something like:

"""
This implementation of qsort() has worst case complexity of N**2.
However measures were taken that make it very unlikely that for some
random input N**2 swaps will be made. It's still possible to generate
such an input on purpose though. See code below for more details.
"""


Okay:

   The quicksort algorithm worst-case is O(N**2), but this implementation
   has been designed to avoid that for most real data.


Hi,

There is something which I don't understand. Why is quicksort falling 
back to insertion sort which is an O(N**2) algorithm, when there exist a 
O(log(N)*log(N)*N) algorithms, which I propose as a solution to the 
"bad" characteristics of qsort.


The replacement algorithm I propose does not allocate working memory and 
it does not recurse and it does not use a significant amount of stack 
space. Maybe some number theorists want to have a look? I cannot seem to 
find it anywhere public.


See here:
https://reviews.freebsd.org/D5241

Local test results w/D5241 running statically in userspace:


Data set size log2(N)   qsort w/insert sort qsort w/hpsort  mergesort   
data set
10  1.281.121.34random0
11  2.422.432.83random0
12  5.215.2 6.1 random0

10  1.261.141.44random1
11  2.462.463.05random1
12  5.285.296.56random1

10  10.01   5.1 0.2 bad0
11  39.12   12.11   0.33bad0
12  156.54  28.42   0.58bad0


New algorithm is in the middle column. Times are in seconds.

Bad0 data set:

http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-libc-qsort.html


--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-19 Thread Warren Block

On Tue, 19 Apr 2016, Aleksander Alekseev wrote:


Why Wikipedia, specifically?  There are a lot of places that describe
quicksort.  How about just

   Note: This implementation of qsort() is designed to avoid the
   worst-case complexity of N**2 that is often seen with standard
   versions.


I would say that this statement is just false. Worst-case complexity is
still N**2. How about something like:

"""
This implementation of qsort() has worst case complexity of N**2.
However measures were taken that make it very unlikely that for some
random input N**2 swaps will be made. It's still possible to generate
such an input on purpose though. See code below for more details.
"""


Okay:

  The quicksort algorithm worst-case is O(N**2), but this implementation
  has been designed to avoid that for most real data.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-19 Thread Aleksander Alekseev
> Why Wikipedia, specifically?  There are a lot of places that describe 
> quicksort.  How about just
> 
>Note: This implementation of qsort() is designed to avoid the
>worst-case complexity of N**2 that is often seen with standard
>versions.

I would say that this statement is just false. Worst-case complexity is
still N**2. How about something like:

"""
This implementation of qsort() has worst case complexity of N**2.
However measures were taken that make it very unlikely that for some
random input N**2 swaps will be made. It's still possible to generate
such an input on purpose though. See code below for more details.
"""

-- 
Best regards,
Aleksander Alekseev
http://eax.me/
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Warren Block

On Mon, 18 Apr 2016, Hans Petter Selasky wrote:


Hi,

Are there any objections adding the following as part of documenting our 
kernel's qsort function?


Index: sys/libkern/qsort.c
===
--- sys/libkern/qsort.c (revision 298202)
+++ sys/libkern/qsort.c (working copy)
@@ -45,6 +45,10 @@

/*
 * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function".
+ *
+ * NOTE: This implementation of qsort() was designed to not have the


"was designed to avoid the"


+ * worst case complexity of N**2, as seen with the regular quick sort
+ * functions as described by Wikipedia.
 */


Why Wikipedia, specifically?  There are a lot of places that describe 
quicksort.  How about just


  Note: This implementation of qsort() is designed to avoid the
  worst-case complexity of N**2 that is often seen with standard
  versions.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Ryan Stone
On Mon, Apr 18, 2016 at 12:13 PM, Hans Petter Selasky 
wrote:

> Did anyone try to generate such a fiendish set of data, and see how
> quadratic the FreeBSD's qsort() becomes?
>

Not me, but it has been done:

http://calmerthanyouare.org/2014/06/11/algorithmic-complexity-attacks-and-libc-qsort.html
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Hans Petter Selasky

On 04/18/16 16:49, Ed Schouten wrote:

2016-04-18 15:09 GMT+02:00 Hans Petter Selasky :

On 04/18/16 14:16, Aleksander Alekseev wrote:

I suggest also add a short description of how it was achieved
(randomization?).


I think the algorithm is switching to mergesort. I'll look up the paper and
add that correctly before commit.


As a Dutch person, I know the answer to this.

Instead of picking a fixed pivot or choosing one at random, it uses an
approach called linear time median finding to find a pivot that is
'approximately median'. There are a couple of algorithms for this, but
I think Bentley's qsort() uses this:

https://en.wikipedia.org/wiki/Dutch_national_flag_problem



Hi,

Ryan: Yes, there is quadratic behaviour still, but I believe the order 
is limited. For the matter of the topic I added a counter for the swap() 
code in the insertion fallback algorithm, and for a set of 2048 integers 
I never saw the swap() count exceed this number. For a pre-sorted array, 
values around ~2047 and reverse sorted ~2043. For random input far less.


Citing the document "bentley93engineering.pdf", a footnote says:


Of course, quadratic behavior is still possible. One can generate 
fiendish inputs by bugging Quicksort: Con-
sider key values to be unknown initially. In the code for selecting a 
partition element, assign values in increas-
ing order as unknown keys are encountered. In the partitioning code, 
make unknown keys compare high.



Did anyone try to generate such a fiendish set of data, and see how 
quadratic the FreeBSD's qsort() becomes?


Another thread, possibly related:

http://postgresql.nabble.com/Why-do-we-still-perform-a-check-for-pre-sorted-input-within-qsort-variants-td5746526.html

--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Ed Schouten
2016-04-18 15:09 GMT+02:00 Hans Petter Selasky :
> On 04/18/16 14:16, Aleksander Alekseev wrote:
>> I suggest also add a short description of how it was achieved
>> (randomization?).
>
> I think the algorithm is switching to mergesort. I'll look up the paper and
> add that correctly before commit.

As a Dutch person, I know the answer to this.

Instead of picking a fixed pivot or choosing one at random, it uses an
approach called linear time median finding to find a pivot that is
'approximately median'. There are a couple of algorithms for this, but
I think Bentley's qsort() uses this:

https://en.wikipedia.org/wiki/Dutch_national_flag_problem

-- 
Ed Schouten 
Nuxi, 's-Hertogenbosch, the Netherlands
KvK-nr.: 62051717
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Ryan Stone
On Mon, Apr 18, 2016 at 9:09 AM, Hans Petter Selasky  wrote

> I think the algorithm is switching to mergesort. I'll look up the paper
> and add that correctly before commit.
>

No, it switches to insertion sort, assuming that it's acting on an already
sorted array.  If that assumption is wrong you still get O(n**2) complexity.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Hans Petter Selasky

On 04/18/16 14:16, Aleksander Alekseev wrote:

I suggest also add a short description of how it was achieved
(randomization?).


I think the algorithm is switching to mergesort. I'll look up the paper 
and add that correctly before commit.


--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: qsort() documentation

2016-04-18 Thread Aleksander Alekseev
Hello.

I suggest also add a short description of how it was achieved
(randomization?).

On Mon, 18 Apr 2016 13:43:38 +0200
Hans Petter Selasky  wrote:

> Hi,
> 
> Are there any objections adding the following as part of documenting
> our kernel's qsort function?
> 
> Index: sys/libkern/qsort.c
> ===
> --- sys/libkern/qsort.c (revision 298202)
> +++ sys/libkern/qsort.c (working copy)
> @@ -45,6 +45,10 @@
> 
>   /*
>* Qsort routine from Bentley & McIlroy's "Engineering a Sort
> Function".
> + *
> + * NOTE: This implementation of qsort() was designed to not have the
> + * worst case complexity of N**2, as seen with the regular quick sort
> + * functions as described by Wikipedia.
>*/
> 
> 
> --HPS
> ___
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to
> "freebsd-current-unsubscr...@freebsd.org"
> 



-- 
Best regards,
Aleksander Alekseev
http://eax.me/
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


qsort() documentation

2016-04-18 Thread Hans Petter Selasky

Hi,

Are there any objections adding the following as part of documenting our 
kernel's qsort function?


Index: sys/libkern/qsort.c
===
--- sys/libkern/qsort.c (revision 298202)
+++ sys/libkern/qsort.c (working copy)
@@ -45,6 +45,10 @@

 /*
  * Qsort routine from Bentley & McIlroy's "Engineering a Sort Function".
+ *
+ * NOTE: This implementation of qsort() was designed to not have the
+ * worst case complexity of N**2, as seen with the regular quick sort
+ * functions as described by Wikipedia.
  */


--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"