Re: On the D Blog: Lomuto's Comeback
On 8/4/20 9:49 AM, Andrei Alexandrescu wrote: On 8/4/20 4:19 AM, Iain Buclaw wrote: On 04/08/2020 03:14, Andrei Alexandrescu wrote: Interesting, thanks! Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}00... [snip] Looks good, so committing patch. :-) Awesome, thanks! That does solve a puzzler I had while benchmarking. I'm thinking the story of discovering and fixing this would be a great follow-up in the blog. It doesn't quite mesh with Mike's current introductory series, but it could be done as an intermezzo a la "Now For Something Completely Different (And Much Lower Level)". Sketch of an intro: Upon reading "Lomuto's Comeback" in the D blog, I noticed the performance were consistently juuust a bit worse for the D version than for the C++ version for the same source code. My own measurements confirmed the same. That bothered me at two levels. First, people unfamiliar with the D language would form the opinion that D cannot reach the efficiency of C++. Second, as the gdc creator and maintainer, I knew for a fact the produced code must be literally identical. Any difference would pin point a bug somewhere in the code generation pipeline. So I set out to find it and fix it. This is the story of that investigation, which will take us through looking through disassembly, finding the culprit, devising a fix, confirming with measurements, and patching the open-source gdc compiler. ... Oh, and there are a few comments to the original blog post I'd be glad to respond to in an appendix.
Re: On the D Blog: Lomuto's Comeback
On 8/4/20 4:19 AM, Iain Buclaw wrote: On 04/08/2020 03:14, Andrei Alexandrescu wrote: Interesting, thanks! Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}00... [snip] Looks good, so committing patch. :-) Awesome, thanks! That does solve a puzzler I had while benchmarking. I'm thinking the story of discovering and fixing this would be a great follow-up in the blog. It doesn't quite mesh with Mike's current introductory series, but it could be done as an intermezzo a la "Now For Something Completely Different (And Much Lower Level)". Sketch of an intro: Upon reading "Lomuto's Comeback" in the D blog, I noticed the performance were consistently juuust a bit worse for the D version than for the C++ version for the same source code. My own measurements confirmed the same. That bothered me at two levels. First, people unfamiliar with the D language would form the opinion that D cannot reach the efficiency of C++. Second, as the gdc creator and maintainer, I knew for a fact the produced code must be literally identical. Any difference would pin point a bug somewhere in the code generation pipeline. So I set out to find it and fix it. This is the story of that investigation, which will take us through looking through disassembly, finding the culprit, devising a fix, confirming with measurements, and patching the open-source gdc compiler. ...
Re: On the D Blog: Lomuto's Comeback
On 04/08/2020 03:14, Andrei Alexandrescu wrote: > Interesting, thanks! > Did a quick benchmark for n in `seq 1 10` ./lomuto.exe ${n}00... gdc-baseline: min_milliseconds=53.2922 median_milliseconds=56.8761 min_milliseconds=111.2512 median_milliseconds=115.5812 min_milliseconds=168.8659 median_milliseconds=174.9421 min_milliseconds=228.9230 median_milliseconds=235.9950 min_milliseconds=291.4758 median_milliseconds=302.2652 min_milliseconds=354.6598 median_milliseconds=360.3230 min_milliseconds=420.6221 median_milliseconds=424.0275 min_milliseconds=490.9294 median_milliseconds=505.0082 min_milliseconds=535.9912 median_milliseconds=556.0680 min_milliseconds=619.8160 median_milliseconds=629.6446 gdc-pr96429: min_milliseconds=44.1008 median_milliseconds=46.2956 min_milliseconds=96.0864 median_milliseconds=99.4665 min_milliseconds=146.2498 median_milliseconds=151.5661 min_milliseconds=201.9479 median_milliseconds=207.0937 min_milliseconds=253.2555 median_milliseconds=265.6178 min_milliseconds=309.5941 median_milliseconds=317.8178 min_milliseconds=364.9312 median_milliseconds=381.9105 min_milliseconds=427.2581 median_milliseconds=437.9506 min_milliseconds=464.2838 median_milliseconds=482.9127 min_milliseconds=537.3167 median_milliseconds=550.9250 g++: min_milliseconds=44.0164 median_milliseconds=46.5057 min_milliseconds=91.2730 median_milliseconds=97.8507 min_milliseconds=142.8459 median_milliseconds=153.4782 min_milliseconds=196.4765 median_milliseconds=202.1877 min_milliseconds=251.5876 median_milliseconds=261.6350 min_milliseconds=299.8530 median_milliseconds=311.0719 min_milliseconds=351.9959 median_milliseconds=370.0437 min_milliseconds=412.4231 median_milliseconds=420.1336 min_milliseconds=462.4810 median_milliseconds=474.2444 min_milliseconds=539.3359 median_milliseconds=541.5321 Looks good, so committing patch. :-)
Re: On the D Blog: Lomuto's Comeback
On 03/08/2020 13:08, Iain Buclaw via Digitalmars-d-announce wrote: > > > On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote: >> On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: >>> After reading a paper that grabbed his curiosity and wouldn't let go, >>> Andrei set out to determine if Lomuto partitioning should still be >>> considered inferior to Hoare for quicksort on modern hardware. This blog >>> post details his results. >>> >>> Blog: >>> https://dlang.org/blog/2020/05/14/lomutos-comeback/ >> >> Nice stuff! >> >> One curious question -- unless I've misread things horribly, it looks like >> the D benchmarks for Lomuto branch-free are consistently slower than for >> C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce >> effectively identical results for code like this. > > Sorry for the belated response, as far as I can see, gdc and g++ only differ > on one line. > > auto delta = smaller & (read - first); > > This is lowered as: > > delta = smaller & (read - first) / 8; > > However, g++ uses an exact divide operator (semantically that ignores > rounding), whereas gdc uses a truncating divide operator (semantically rounds > the quotient towards zero). > > I'm willing to bet a beer on tweaking pointer subtraction will get gdc in > lockstep with g++. > I doubt Andrei will re-run the benchmarks now, but here's the PR (problem reference) with patch attached. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96429
Re: On the D Blog: Lomuto's Comeback
On 15/05/2020 12:28, Joseph Rushton Wakeling via Digitalmars-d-announce wrote: > On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: >> After reading a paper that grabbed his curiosity and wouldn't let go, Andrei >> set out to determine if Lomuto partitioning should still be considered >> inferior to Hoare for quicksort on modern hardware. This blog post details >> his results. >> >> Blog: >> https://dlang.org/blog/2020/05/14/lomutos-comeback/ > > Nice stuff! > > One curious question -- unless I've misread things horribly, it looks like > the D benchmarks for Lomuto branch-free are consistently slower than for C++. > Any idea why that is? I would expect gcc/gdc and clang/ldc to produce > effectively identical results for code like this. Sorry for the belated response, as far as I can see, gdc and g++ only differ on one line. auto delta = smaller & (read - first); This is lowered as: delta = smaller & (read - first) / 8; However, g++ uses an exact divide operator (semantically that ignores rounding), whereas gdc uses a truncating divide operator (semantically rounds the quotient towards zero). I'm willing to bet a beer on tweaking pointer subtraction will get gdc in lockstep with g++.
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 A follow up article on this: https://news.ycombinator.com/item?id=23363165 https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html
Re: On the D Blog: Lomuto's Comeback
On 5/14/20 9:26 AM, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Looks like the blog post is enjoying a second wind after being posted by soneone else on hackernews. It's in top 10 right now.
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Got posted again to Hacker News earlier today. Currently at position 5.
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 14:11:57 UTC, SashaGreat wrote: If possible could you please next time share link with "old" instead of "www"? Like: https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ There is a Chrome extension that automatically redirects to the old version of Reddit
Re: On the D Blog: Lomuto's Comeback
On 5/14/20 11:57 AM, jmh530 wrote: On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote: [snip] Really interesting. Thanks for sharing. I have recently been spending some spare time learning more about D's topN and pivotPartition implementation, which led me to your paper on fast deterministic selection. Would you consider changing the pivotPartition implementation based on this? Yes, and I encourage you to look into putting together a PR. Would the insights gleamed from this paper mean that a branchless version of topN could be faster? Yes. topN also uses partitioning.
Re: On the D Blog: Lomuto's Comeback
On 5/14/20 9:26 AM, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Fantastic article! -Steve
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Great post, and nice to have another example for how bad branches can really be for performance! One note: The clang/ldc compiler explorer links for lomuto_partition_branchfree are wrong.
Re: On the D Blog: Lomuto's Comeback
On Friday, 15 May 2020 at 10:28:41 UTC, Joseph Rushton Wakeling wrote: One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this. Wrt. the LDC results, LDC v1.17 was shipped with LLVM 8.0.1, while the used clang is v10 based on LLVM 10, so that might account for some slight diffs.
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Nice stuff! One curious question -- unless I've misread things horribly, it looks like the D benchmarks for Lomuto branch-free are consistently slower than for C++. Any idea why that is? I would expect gcc/gdc and clang/ldc to produce effectively identical results for code like this.
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Fantastic work and great result. Having privately done a very heavy critique of the narrow niche the article had chosen to explore I still recognize and love the results. — Dmitry Olshansky
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Wow, very interesting article. Thanks for sharing.
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote: [snip] Really interesting. Thanks for sharing. I have recently been spending some spare time learning more about D's topN and pivotPartition implementation, which led me to your paper on fast deterministic selection. Would you consider changing the pivotPartition implementation based on this? Would the insights gleamed from this paper mean that a branchless version of topN could be faster?
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:40:24 UTC, Andrei Alexandrescu wrote: On 5/14/20 9:26 AM, Mike Parker wrote: The right way to share something on hackernews is to send people to https://news.ycombinator.com/newest and mention the time of sharing. Okay everyone, please use this link or search for "Lomuto's Comeback" in the search field. I've been hearing conflicting accounts of this for a while, with more people telling me it doesn't happen anymore. However, it seems posts were never flagged as spam for this. Instead, any upvotes from people coming through direct links *do not count*. Coupled with the fact that the FAQ still says posts are penalized for "asking for votes", I'm no longer going to share direct links to HN articles. Found multiple sources about it, but this 2015 post lays it all out and I assume it's still mostly relevant: https://wiredcraft.com/blog/how-to-post-on-hacker-news/ https://news.ycombinator.com/newsfaq.html
Re: On the D Blog: Lomuto's Comeback
On Thursday, 14 May 2020 at 13:26:23 UTC, Mike Parker wrote: ... Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ ... If possible could you please next time share link with "old" instead of "www"? Like: https://old.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ Thanks, SG.
Re: On the D Blog: Lomuto's Comeback
On 5/14/20 9:26 AM, Mike Parker wrote: After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160 Thanks, Mike. HN has possibly categorized it as spam already. One thing they do is they detect (by using the "Referrer" header) whether the post has been shared via a direct link. They do so to prevent manipulation. The right way to share something on hackernews is to send people to https://news.ycombinator.com/newest and mention the time of sharing.
On the D Blog: Lomuto's Comeback
After reading a paper that grabbed his curiosity and wouldn't let go, Andrei set out to determine if Lomuto partitioning should still be considered inferior to Hoare for quicksort on modern hardware. This blog post details his results. Blog: https://dlang.org/blog/2020/05/14/lomutos-comeback/ Reddit: https://www.reddit.com/r/programming/comments/gjm6yp/lomutos_comeback_quicksort_partitioning/ HN: https://news.ycombinator.com/item?id=23179160