Re: On the D Blog: Lomuto's Comeback

2020-08-04 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 8/4/20 9:49 AM, Andrei Alexandrescu wrote:

On 8/4/20 4:19 AM, Iain Buclaw wrote:

On 04/08/2020 03:14, Andrei Alexandrescu wrote:

Interesting, thanks!



Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}00...

[snip]

Looks good, so committing patch. :-)


Awesome, thanks! That does solve a puzzler I had while benchmarking.

I'm thinking the story of discovering and fixing this would be a great 
follow-up in the blog. It doesn't quite mesh with Mike's current 
introductory series, but it could be done as an intermezzo a la "Now For 
Something Completely Different (And Much Lower Level)".


Sketch of an intro:

Upon reading "Lomuto's Comeback" in the D blog, I noticed the 
performance were consistently juuust a bit worse for the D version than 
for the C++ version for the same source code. My own measurements 
confirmed the same. That bothered me at two levels. First, people 
unfamiliar with the D language would form the opinion that D cannot 
reach the efficiency of C++. Second, as the gdc creator and maintainer, 
I knew for a fact the produced code must be literally identical. Any 
difference would pin point a bug somewhere in the code generation 
pipeline. So I set out to find it and fix it. This is the story of that 
investigation, which will take us through looking through disassembly, 
finding the culprit, devising a fix, confirming with measurements, and 
patching the open-source gdc compiler.


...


Oh, and there are a few comments to the original blog post I'd be glad 
to respond to in an appendix.


Re: On the D Blog: Lomuto's Comeback

2020-08-04 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 8/4/20 4:19 AM, Iain Buclaw wrote:

On 04/08/2020 03:14, Andrei Alexandrescu wrote:

Interesting, thanks!



Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}00...

[snip]

Looks good, so committing patch. :-)


Awesome, thanks! That does solve a puzzler I had while benchmarking.

I'm thinking the story of discovering and fixing this would be a great 
follow-up in the blog. It doesn't quite mesh with Mike's current 
introductory series, but it could be done as an intermezzo a la "Now For 
Something Completely Different (And Much Lower Level)".


Sketch of an intro:

Upon reading "Lomuto's Comeback" in the D blog, I noticed the 
performance were consistently juuust a bit worse for the D version than 
for the C++ version for the same source code. My own measurements 
confirmed the same. That bothered me at two levels. First, people 
unfamiliar with the D language would form the opinion that D cannot 
reach the efficiency of C++. Second, as the gdc creator and maintainer, 
I knew for a fact the produced code must be literally identical. Any 
difference would pin point a bug somewhere in the code generation 
pipeline. So I set out to find it and fix it. This is the story of that 
investigation, which will take us through looking through disassembly, 
finding the culprit, devising a fix, confirming with measurements, and 
patching the open-source gdc compiler.


...


Re: On the D Blog: Lomuto's Comeback

2020-08-04 Thread Iain Buclaw via Digitalmars-d-announce
On 04/08/2020 03:14, Andrei Alexandrescu wrote:
> Interesting, thanks!
> 

Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}00...

gdc-baseline:
min_milliseconds=53.2922
median_milliseconds=56.8761
min_milliseconds=111.2512
median_milliseconds=115.5812
min_milliseconds=168.8659
median_milliseconds=174.9421
min_milliseconds=228.9230
median_milliseconds=235.9950
min_milliseconds=291.4758
median_milliseconds=302.2652
min_milliseconds=354.6598
median_milliseconds=360.3230
min_milliseconds=420.6221
median_milliseconds=424.0275
min_milliseconds=490.9294
median_milliseconds=505.0082
min_milliseconds=535.9912
median_milliseconds=556.0680
min_milliseconds=619.8160
median_milliseconds=629.6446

gdc-pr96429:
min_milliseconds=44.1008
median_milliseconds=46.2956
min_milliseconds=96.0864
median_milliseconds=99.4665
min_milliseconds=146.2498
median_milliseconds=151.5661
min_milliseconds=201.9479
median_milliseconds=207.0937
min_milliseconds=253.2555
median_milliseconds=265.6178
min_milliseconds=309.5941
median_milliseconds=317.8178
min_milliseconds=364.9312
median_milliseconds=381.9105
min_milliseconds=427.2581
median_milliseconds=437.9506
min_milliseconds=464.2838
median_milliseconds=482.9127
min_milliseconds=537.3167
median_milliseconds=550.9250

g++:
min_milliseconds=44.0164
median_milliseconds=46.5057
min_milliseconds=91.2730
median_milliseconds=97.8507
min_milliseconds=142.8459
median_milliseconds=153.4782
min_milliseconds=196.4765
median_milliseconds=202.1877
min_milliseconds=251.5876
median_milliseconds=261.6350
min_milliseconds=299.8530
median_milliseconds=311.0719
min_milliseconds=351.9959
median_milliseconds=370.0437
min_milliseconds=412.4231
median_milliseconds=420.1336
min_milliseconds=462.4810
median_milliseconds=474.2444
min_milliseconds=539.3359
median_milliseconds=541.5321

Looks good, so committing patch. :-)


Re: On the D Blog: Lomuto's Comeback

2020-08-03 Thread Iain Buclaw via Digitalmars-d-announce
On 03/08/2020 13:08, Iain Buclaw via Digitalmars-d-announce wrote:
> 
> 
> On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:
>> On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
>>> After reading a paper that grabbed his curiosity and wouldn't let go, 
>>> Andrei set out to determine if Lomuto partitioning should still be 
>>> considered inferior to Hoare for quicksort on modern hardware. This blog 
>>> post details his results.
>>>
>>> Blog:
>>> https://dlang.org/blog/2020/05/14/lomutos-comeback/
>>
>> Nice stuff!
>>
>> One curious question -- unless I've misread things horribly, it looks like 
>> the D benchmarks for Lomuto branch-free are consistently slower than for 
>> C++.  Any idea why that is?  I would expect gcc/gdc and clang/ldc to produce 
>> effectively identical results for code like this.
> 
> Sorry for the belated response, as far as I can see, gdc and g++ only differ 
> on one line.
> 
> auto delta = smaller & (read - first);
> 
> This is lowered as:
> 
> delta = smaller & (read - first) / 8;
> 
> However, g++ uses an exact divide operator (semantically that ignores 
> rounding), whereas gdc uses a truncating divide operator (semantically rounds 
> the quotient towards zero).
> 
> I'm willing to bet a beer on tweaking pointer subtraction will get gdc in 
> lockstep with g++.
> 

I doubt Andrei will re-run the benchmarks now, but here's the PR (problem 
reference) with patch attached.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429



Re: On the D Blog: Lomuto's Comeback

2020-08-03 Thread Iain Buclaw via Digitalmars-d-announce



On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote:
> On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
>> After reading a paper that grabbed his curiosity and wouldn't let go, Andrei 
>> set out to determine if Lomuto partitioning should still be considered 
>> inferior to Hoare for quicksort on modern hardware. This blog post details 
>> his results.
>>
>> Blog:
>> https://dlang.org/blog/2020/05/14/lomutos-comeback/
> 
> Nice stuff!
> 
> One curious question -- unless I've misread things horribly, it looks like 
> the D benchmarks for Lomuto branch-free are consistently slower than for C++. 
>  Any idea why that is?  I would expect gcc/gdc and clang/ldc to produce 
> effectively identical results for code like this.

Sorry for the belated response, as far as I can see, gdc and g++ only differ on 
one line.

auto delta = smaller & (read - first);

This is lowered as:

delta = smaller & (read - first) / 8;

However, g++ uses an exact divide operator (semantically that ignores 
rounding), whereas gdc uses a truncating divide operator (semantically rounds 
the quotient towards zero).

I'm willing to bet a beer on tweaking pointer subtraction will get gdc in 
lockstep with g++.


Re: On the D Blog: Lomuto's Comeback

2020-05-31 Thread Arjan via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't 
let go, Andrei set out to determine if Lomuto partitioning 
should still be considered inferior to Hoare for quicksort on 
modern hardware. This blog post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

HN:
https://news.ycombinator.com/item?id=23179160


A follow up article on this:

https://news.ycombinator.com/item?id=23363165
https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html


Re: On the D Blog: Lomuto's Comeback

2020-05-17 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 5/14/20 9:26 AM, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't let go, 
Andrei set out to determine if Lomuto partitioning should still be 
considered inferior to Hoare for quicksort on modern hardware. This blog 
post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ 



HN:
https://news.ycombinator.com/item?id=23179160


Looks like the blog post is enjoying a second wind after being posted by 
soneone else on hackernews. It's in top 10 right now.


Re: On the D Blog: Lomuto's Comeback

2020-05-17 Thread Jon Degenhardt via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't 
let go, Andrei set out to determine if Lomuto partitioning 
should still be considered inferior to Hoare for quicksort on 
modern hardware. This blog post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

HN:
https://news.ycombinator.com/item?id=23179160


Got posted again to Hacker News earlier today. Currently at 
position 5.


Re: On the D Blog: Lomuto's Comeback

2020-05-17 Thread JN via Digitalmars-d-announce

On Thursday, 14 May 2020 at 14:11:57 UTC, SashaGreat wrote:


If possible could you please next time share link with "old" 
instead of "www"? Like:


https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/



There is a Chrome extension that automatically redirects to the 
old version of Reddit




Re: On the D Blog: Lomuto's Comeback

2020-05-17 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 5/14/20 11:57 AM, jmh530 wrote:

On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote:

[snip]


Really interesting. Thanks for sharing.

I have recently been spending some spare time learning more about D's 
topN and pivotPartition implementation, which led me to your paper on 
fast deterministic selection.


Would you consider changing the pivotPartition implementation based on 
this?


Yes, and I encourage you to look into putting together a PR.

Would the insights gleamed from this paper mean that a branchless 
version of topN could be faster?


Yes. topN also uses partitioning.


Re: On the D Blog: Lomuto's Comeback

2020-05-15 Thread Steven Schveighoffer via Digitalmars-d-announce

On 5/14/20 9:26 AM, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't let go, 
Andrei set out to determine if Lomuto partitioning should still be 
considered inferior to Hoare for quicksort on modern hardware. This blog 
post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ 



HN:
https://news.ycombinator.com/item?id=23179160


Fantastic article!

-Steve


Re: On the D Blog: Lomuto's Comeback

2020-05-15 Thread Les De Ridder via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't 
let go, Andrei set out to determine if Lomuto partitioning 
should still be considered inferior to Hoare for quicksort on 
modern hardware. This blog post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/


Great post, and nice to have another example for how bad branches 
can

really be for performance!

One note: The clang/ldc compiler explorer links for
lomuto_partition_branchfree
are wrong.



Re: On the D Blog: Lomuto's Comeback

2020-05-15 Thread kinke via Digitalmars-d-announce
On Friday, 15 May 2020 at 10:28:41 UTC, Joseph Rushton Wakeling 
wrote:
One curious question -- unless I've misread things horribly, it 
looks like the D benchmarks for Lomuto branch-free are 
consistently slower than for C++.  Any idea why that is?  I 
would expect gcc/gdc and clang/ldc to produce effectively 
identical results for code like this.


Wrt. the LDC results, LDC v1.17 was shipped with LLVM 8.0.1, 
while the used clang is v10 based on LLVM 10, so that might 
account for some slight diffs.


Re: On the D Blog: Lomuto's Comeback

2020-05-15 Thread Joseph Rushton Wakeling via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't 
let go, Andrei set out to determine if Lomuto partitioning 
should still be considered inferior to Hoare for quicksort on 
modern hardware. This blog post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/


Nice stuff!

One curious question -- unless I've misread things horribly, it 
looks like the D benchmarks for Lomuto branch-free are 
consistently slower than for C++.  Any idea why that is?  I would 
expect gcc/gdc and clang/ldc to produce effectively identical 
results for code like this.


Re: On the D Blog: Lomuto's Comeback

2020-05-15 Thread Dmitry Olshansky via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't 
let go, Andrei set out to determine if Lomuto partitioning 
should still be considered inferior to Hoare for quicksort on 
modern hardware. This blog post details his results.


Fantastic work and great result. Having privately done a very 
heavy critique of the narrow niche the article had chosen to 
explore I still recognize and love the results.


—
Dmitry Olshansky



Re: On the D Blog: Lomuto's Comeback

2020-05-15 Thread Francesco Mecca via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't 
let go, Andrei set out to determine if Lomuto partitioning 
should still be considered inferior to Hoare for quicksort on 
modern hardware. This blog post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

HN:
https://news.ycombinator.com/item?id=23179160


Wow, very interesting article.
Thanks for sharing.


Re: On the D Blog: Lomuto's Comeback

2020-05-14 Thread jmh530 via Digitalmars-d-announce
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu 
wrote:

[snip]


Really interesting. Thanks for sharing.

I have recently been spending some spare time learning more about 
D's topN and pivotPartition implementation, which led me to your 
paper on fast deterministic selection.


Would you consider changing the pivotPartition implementation 
based on this?


Would the insights gleamed from this paper mean that a branchless 
version of topN could be faster?


Re: On the D Blog: Lomuto's Comeback

2020-05-14 Thread Mike Parker via Digitalmars-d-announce
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu 
wrote:

On 5/14/20 9:26 AM, Mike Parker wrote:


The right way to share something on hackernews is to send 
people to https://news.ycombinator.com/newest and mention the 
time of sharing.


Okay everyone, please use this link or search for "Lomuto's 
Comeback" in the search field.


I've been hearing conflicting accounts of this for a while, with 
more people telling me it doesn't happen anymore. However, it 
seems posts were never flagged as spam for this. Instead, any 
upvotes from people coming through direct links *do not count*. 
Coupled with the fact that the FAQ still says posts are penalized 
for "asking for votes", I'm no longer going to share direct links 
to HN articles.


Found multiple sources about it, but this 2015 post lays it all 
out and I assume it's still mostly relevant:

https://wiredcraft.com/blog/how-to-post-on-hacker-news/

https://news.ycombinator.com/newsfaq.html


Re: On the D Blog: Lomuto's Comeback

2020-05-14 Thread SashaGreat via Digitalmars-d-announce

On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote:

...
Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/
...


If possible could you please next time share link with "old" 
instead of "www"? Like:


https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

Thanks,

SG.


Re: On the D Blog: Lomuto's Comeback

2020-05-14 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 5/14/20 9:26 AM, Mike Parker wrote:
After reading a paper that grabbed his curiosity and wouldn't let go, 
Andrei set out to determine if Lomuto partitioning should still be 
considered inferior to Hoare for quicksort on modern hardware. This blog 
post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ 



HN:
https://news.ycombinator.com/item?id=23179160


Thanks, Mike. HN has possibly categorized it as spam already. One thing 
they do is they detect (by using the "Referrer" header) whether the post 
has been shared via a direct link. They do so to prevent manipulation.


The right way to share something on hackernews is to send people to 
https://news.ycombinator.com/newest and mention the time of sharing.


On the D Blog: Lomuto's Comeback

2020-05-14 Thread Mike Parker via Digitalmars-d-announce
After reading a paper that grabbed his curiosity and wouldn't let 
go, Andrei set out to determine if Lomuto partitioning should 
still be considered inferior to Hoare for quicksort on modern 
hardware. This blog post details his results.


Blog:
https://dlang.org/blog/2020/05/14/lomutos-comeback/

Reddit:
https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/

HN:
https://news.ycombinator.com/item?id=23179160