Hello!

If you are making it fast, why not make it really fast? If you expand the parentheses and regroup the terms, you'll get

    h(e, n) = p^n + e * f(n)

where h(e, n) is the hashCode of n copies of an element whose own hashCode is e, p = 31, and f(n) = Sum(p^k, k = 0..n-1).
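Here is a quick standalone sanity check of that closed form (a throwaway snippet, not part of the proposed patch; naiveHash mirrors what AbstractList.hashCode() effectively does for n equal elements, and overflow wraps identically on both sides, so the two must agree for every n):

    public class ClosedFormCheck {
        // The loop AbstractList.hashCode() performs for n equal elements.
        static int naiveHash(int e, int n) {
            int h = 1;
            for (int i = 0; i < n; i++)
                h = 31 * h + e;
            return h;
        }

        // The closed form: pow accumulates 31^n, f accumulates Sum(31^k, k = 0..n-1).
        static int closedFormHash(int e, int n) {
            int pow = 1;
            int f = 0;
            for (int i = 0; i < n; i++) {
                f = f * 31 + 1; // f(m+1) = f(m) * 31 + 1
                pow *= 31;      // 31^(m+1) = 31^m * 31
            }
            return pow + e * f;
        }

        public static void main(String[] args) {
            int e = "foo".hashCode();
            for (int n = 0; n <= 10_000; n++)
                if (naiveHash(e, n) != closedFormHash(e, n))
                    throw new AssertionError("mismatch at n = " + n);
            System.out.println("closed form matches the naive loop");
        }
    }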
Using simple algebraic rules, you get:

    p^(2n)  = (p^n)^2
    f(2n)   = f(n) * (p^n + 1)
    p^(n+1) = p^n * p
    f(n+1)  = f(n) * p + 1

Thus the algorithm may look as follows:

    public int hashCode() {
        int pow = 1; // -> 31^n
        int sum = 0; // -> Sum(31^k, k = 0..n-1)
        for (int i = Integer.numberOfLeadingZeros(n); i < Integer.SIZE; i++) {
            // Shift in the next bit of n (most significant first):
            // the state for prefix m becomes the state for 2m.
            sum = sum * (pow + 1); // f(2m) = f(m) * (31^m + 1)
            pow *= pow;            // 31^(2m) = (31^m)^2
            if (((n << i) & 0x8000_0000) != 0) {
                // The bit is 1: advance from 2m to 2m + 1.
                pow *= 31;          // 31^(m+1) = 31^m * 31
                sum = sum * 31 + 1; // f(m+1) = f(m) * 31 + 1
            }
        }
        return pow + sum * (element == null ? 0 : element.hashCode());
    }

It seems reasonable to peel off the n = 0 case, which helps to reduce the number of multiplications in the other cases (also, for n = 0 we don't need to calculate the element hashCode at all):

    public int hashCode() {
        if (n == 0) return 1;
        int pow = 31; // -> 31^n
        int sum = 1;  // -> Sum(31^k, k = 0..n-1)
        for (int i = Integer.numberOfLeadingZeros(n) + 1; i < Integer.SIZE; i++) {
            sum = sum * (pow + 1);
            pow *= pow;
            if (((n << i) & 0x8000_0000) != 0) {
                pow *= 31;
                sum = sum * 31 + 1;
            }
        }
        return pow + sum * (element == null ? 0 : element.hashCode());
    }

Assuming that the element's hashCode is cheap (I used the String "foo" as the element), I got the following results for different collection sizes:

    Benchmark      (size)   Score   Error  Units
    hashCodeFast        0   2.299 ± 0.017  ns/op
    hashCodeFast        1   2.731 ± 0.021  ns/op
    hashCodeFast        2   4.073 ± 0.077  ns/op
    hashCodeFast        3   4.315 ± 0.032  ns/op
    hashCodeFast        5   5.470 ± 0.074  ns/op
    hashCodeFast       10   6.904 ± 0.060  ns/op
    hashCodeFast       30   9.102 ± 0.173  ns/op
    hashCodeFast      100  10.093 ± 0.069  ns/op
    hashCodeFast     1000  14.129 ± 0.074  ns/op
    hashCodeFast    10000  17.028 ± 0.249  ns/op
    hashCodeFast   100000  20.795 ± 0.194  ns/op
    hashCodeFast  1000000  23.622 ± 0.264  ns/op

Compared to Zheka's implementation:

    Benchmark       (size)       Score       Error  Units
    hashCodeZheka        0       2.584 ±     0.024  ns/op
    hashCodeZheka        1       2.868 ±     0.022  ns/op
    hashCodeZheka        2       3.730 ±     0.030  ns/op
    hashCodeZheka        3       4.323 ±     0.027  ns/op
    hashCodeZheka        5       5.285 ±     0.037  ns/op
    hashCodeZheka       10       8.254 ±     0.057  ns/op
    hashCodeZheka       30      24.793 ±     0.218  ns/op
    hashCodeZheka      100      89.017 ±     0.764  ns/op
    hashCodeZheka     1000     923.792 ±    28.194  ns/op
    hashCodeZheka    10000    9157.411 ±    98.902  ns/op
    hashCodeZheka   100000   91705.599 ±   689.299  ns/op
    hashCodeZheka  1000000  919723.545 ± 13092.935  ns/op

So the results are quite similar for one-digit sizes, but we start winning from n = 10, and after that the logarithmic algorithm really rocks.
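For reference, the numbers above come from a JMH run; the actual benchmark source isn't attached to this mail, but a harness of roughly the following shape measures the same thing (the class and method names here are illustrative, and comparing the two implementations means running it against correspondingly patched JDK builds):

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class NCopiesHashCodeBenchmark {
        @Param({"0", "1", "2", "3", "5", "10", "30", "100",
                "1000", "10000", "100000", "1000000"})
        int size;

        List<String> list;

        @Setup
        public void setup() {
            // An element with a cheap, cached hashCode, as in the runs above.
            list = Collections.nCopies(size, "foo");
        }

        @Benchmark
        public int nCopiesHashCode() {
            // Measures whichever CopiesList.hashCode() the running JDK provides.
            return list.hashCode();
        }
    }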
I can file an issue and create a webrev, but I still need a sponsor and a review for such a change. Martin, can you help with this?

With best regards,
Tagir Valeev.

On Tue, Nov 27, 2018 at 5:49 PM Martin Buchholz <marti...@google.com> wrote:
>
> I agree!
>
> (but don't have time ...)
>
> On Sun, Nov 25, 2018 at 9:01 PM, Zheka Kozlov <orionllm...@gmail.com> wrote:
> > Currently, CopiesList.hashCode() is inherited from AbstractList, which:
> >
> > - calls hashCode() for each element,
> > - creates a new Iterator every time.
> >
> > However, for Collections.nCopies():
> >
> > - All elements are the same, so hashCode() can be called only once.
> > - An Iterator is unnecessary.
> >
> > So, I propose overriding the hashCode() implementation for CopiesList:
> >
> > @Override
> > public int hashCode() {
> >     int hashCode = 1;
> >     final int elementHashCode = (element == null) ? 0 : element.hashCode();
> >     for (int i = 0; i < n; i++) {
> >         hashCode = 31 * hashCode + elementHashCode;
> >     }
> >     return hashCode;
> > }
> >
> > Benchmark:
> >
> > List<List<String>> list = Collections.nCopies(10_000,
> >         new ArrayList<>(Collections.nCopies(1_000_000, "a")));
> > long nano = System.nanoTime();
> > System.out.println(list.hashCode());
> > System.out.println((System.nanoTime() - nano) / 1_000_000);
> >
> > Result:
> > Old version: ~12 seconds.
> > New version: ~10 milliseconds.