> On 25 Apr 2024, at 19:41, David Lloyd <david.ll...@redhat.com> wrote:
> 
> The JDK contains a few collection- and array-oriented implementations of 
> binary search. For the most part, they all work the same way: search the 
> target "thing" by index using the obvious bisection strategy, returning 
> either /index/ or /-index-1/ depending on whether it was found at the end of 
> the search.
> 
> However, if you're doing a binary search on a thing that is not a list or an 
> array, you have two choices: try to make the thing you're searching on 
> implement List (often infeasible) or write your own binary search.
> 
> I'm really tired of writing my own binary search. I've probably done it 50 
> times by now, each one using a slightly different access and/or compare 
> strategy, and every time is a new chance to typo something or get something 
> backwards, leading to irritating debugging sessions and higher dentist bills.

Can we safely say that it sets your teeth on edge?

> It got me to thinking that it wouldn't be too hard to make a "general" binary 
> search which can search on anything, so that's what I did. I was thinking 
> that it might be interesting to try adding this into the JDK somehow.
> 
> This implementation is more or less what I now copy & paste to different 
> projects at the moment:
> 
>     public static <C, T> int binarySearch(C collection, int start, int end, T 
> key, Comparator<? super T> cmp, IntGetter<C, T> getter) {
>         int low = start;
>         int high = end - 1;
>         while (low <= high) {
>             int mid = low + high >>> 1;
>             int res = cmp.compare(getter.get(collection, mid), key);
>             if (res < 0) {
>                 low = mid + 1;
>             } else if (res > 0) {
>                 high = mid - 1;
>             } else {
>                 return mid;
>             }
>         }
>         return -low - 1;
>     }
>     // (Plus variations for `Comparable` keys and long indices)
> 
> A key problem with this approach is that in the JDK, there is no 
> `ObjIntFunction<T, R>` or `ObjLongFunction<T, R>` which would be necessary to 
> implement the "getter" portion of the algorithm (despite the existence of the 
> analogous `ObjIntConsumer<T>` and `ObjLongConsumer<T>`). So, I also have to 
> copy/paste a `IntGetter<T, R>`/`LongGetter<T, R>` as well, which is annoying.
> 
> A slightly less-good approach is for the general `binarySearch` method to 
> accept a `IntFunction<T>`/`LongFunction<T>`, and drop the `C collection` 
> parameter, like this:
> 
>     public static <T> int binarySearch(int start, int end, T key, 
> Comparator<? super T> cmp, IntFunction<T> getter) { ... }
> 
> In this case, we can use the function types that the JDK already provides, 
> but we would very likely have to also create a capturing lambda (e.g. 
> `myList::get` instead of `List::get`). Maybe this isn't that bad of a 
> compromise.
> 
> It would be possible to replace the existing `binarySearch` implementations 
> with delegating calls to a generalized implementation. For `Collections`, the 
> indexed version uses `List::get` and the iterator version uses a helper 
> lambda to move the iterator and get the result. For arrays, a lambda would be 
> provided which gets the corresponding array element. If the non-capturing 
> variant is used, then (on paper at least) this version should perform 
> similarly to the existing implementations, with less code needed overall. 
> However, if a capturing lambda is required (due to the aforementioned lack of 
> `ObjXFunction`), then this could be slightly worse-performing than the 
> existing implementation due to the construction (and maybe dispatch) overhead 
> of the lambda. Some real-world benchmarks would have to be written with 
> various-sized data sets.
> 
> It would also be possible to produce primitive variations which operate on 
> int, float, long, and double values, using existing functions if capturing is 
> deemed "OK". It is also possible to produce a variation which uses a `long` 
> for the index, for huge data sets (for example, very large mapped files using 
> `MemorySegment`).
> 
> Also unclear is: where would it live? `Collections`? Somewhere else?
> 
> Any thoughts/opinions would be appreciated (even if they are along the lines 
> of "it's not worth the effort"). Particularly, any insight would be 
> appreciated as to whether or not this kind of hypothetical enhancement would 
> warrant a JEP (I wouldn't think so, but I'm no expert at such assessments).
> 
> --
> - DML • he/him

Have a look at this recently filed issue: 
https://bugs.openjdk.org/browse/JDK-8326330

-Pavel


Reply via email to