On 03/24/2018 01:55 AM, Chris Katko wrote:
On Saturday, 24 March 2018 at 03:04:41 UTC, Andrei Alexandrescu wrote:
On 3/23/18 7:29 PM, H. S. Teoh wrote:
Well, looking at the implementation of std.getopt turned up the
disturbing fact that the program's argument list is actually scanned
*multiple times*, one for each possible option(!).  Besides the bogonity
that whether or not searchPaths will be set prior to finding -l depends
on the order of arguments passed to getopt(), this also represents an
O(n*m) complexity in scanning program arguments, where n = number of
arguments and m = number of possible options.

And this is not to mention the fact that getoptImpl is *recursive
template*.  Why, oh why?

Am I the only one who thinks the current implementation of getopt() is
really stupid??  Can somebody please talk some sense into me, or point
out something really obvious that I'm missing?

Affirmative. The implementation is quadratic (including a removal of the option from the string). This is intentional, i.e. understood and acknowledged while I was working on it. Given that the function is only called once per run and with a number of arguments at most in the dozens, by the time its complexity becomes an issue the function is long beyond its charter.

This isn't the only instance of quadratic algorithms in Phobos. Quicksort uses an insertion sort - a quadratic algorithm - for 25 elements or fewer. That algorithm may do 600 comparisons in the worst case, and it's potentially that many for each group of 25 elements in a large array.

Spending time on improving the speed of getopt is unrecommended. Such work would add no value.


Andrei

Is there a possibility of improving this function?

Most likely it can be improved in terms of features.

 - While quadratic, for low N, quadratic isn't a big deal. So at what point does quadratic for this function become "a problem"?

At a point where a realistic benchmarks shows a need. Without a motivating measurement, making getopt faster would be a waste of time.

I mentioned another function: sort. For that, YES, the are ways of improving it. In fact, right after posting my message, I couldn't sleep thinking of a number of ways to improve the short sort part. We have precise benchmarks measuring the number of comparisons and swaps performed by our implementation of sort. Improving its performance lifts a lot of boats - many high-level algorithms use sort as an essential building block. There's a world of difference in impact of the speed of sort vs. speed of getopt.

Here are a few ideas for improving the small array sort part (the last mile of sort):

* Currently we switch to short sort when the number of elements to sort is smaller than max(32, 1024 / Elem.sizeof). Probably a better choice can be found through experimentation.

* Insertion sort does linear search in the already-sorted portion. Probably galloping search would fare better.

* Insertion sort starts from the end and grows the sorted portion down. Starting from the middle of the array and growing left and right simultaneously would slash the number of comparisons and swaps in half.

* There could be other algorithms better for short arrays, such as specialized versions of heapsort or smoothsort.

  - If it is a problem, what's stopping someone from improving it?

Hopefully very little.

Last question though, is there any kind of list of features, and minor features and fixes that can or need to be done? Perhaps it already exists, but it seems like it'd be great to have a wiki of contribution sites (like this function) that someone could just browse and go "Hey, I know how to do X, maybe I'll take a crack at it." That way, devs who don't have time to improve something "low on the list" could still outsource it in a clear list instead of people who just happen to see it on the forum at the right place right time.

Seb gave a great answer - thanks!


Andrei

Reply via email to