> If n == 1, then it would become `mask = n << 32`, and the loop would run 32 times. Forget my implementation. It is incorrect at all. In only works for odd numbers :(
пт, 30 нояб. 2018 г. в 13:58, Ivan Gerasimov <ivan.gerasi...@oracle.com>: > Hi Zheka and Tagir! > > > On 11/29/18 10:37 PM, Zheka Kozlov wrote: > > Thanks, Tagir! > > > > I was also thinking of how to calculate hashCode quickly but my direction > > was wrong. I thought that we can use the formula of the sum of a > geometric > > progression: Sum(p^k, k = 0..n) = (1-p^n)/(1-p). Unfortunately, this > > involves division which doesn't work with the overflow of integers. I > > didn't know the trick p^(2n) = (p^n)^2. > > > > Your formulas and implementation look correct. > > > > I would probably rewrite the loop to make it a bit simpler: > > for (int mask = n << (Integer.numberOfLeadingZeros(n) + 1); mask != 0; > mask > > <<= 1) { > > If n == 1, then it would become `mask = n << 32`, and the loop would run > 32 times. > > > The check > if (((n << i) & 0x8000_0000) != 0) { > might be written as > if ((n << i) < 0 ) { > to save one bit-wise operation and avoid using extra constant. > > > With kind regards, > Ivan > > > sum = sum * (pow + 1); > > pow *= pow; > > if ((mask & 0x8000_0000) != 0) { > > pow *= 31; > > sum = sum * 31 + 1; > > } > > } > > > > But that's just a matter of style. > > > > Great job! > > > > пт, 30 нояб. 2018 г. в 11:02, Tagir Valeev <amae...@gmail.com>: > > > >> Hello! > >> > >> If you are doing it fast, why not doing it really fast? If you > >> deparenthesize and regroup terms, you'll got > >> h(e, n) = p ^ n + e * f(n) > >> Where h(e, n) is the hashCode of n elements with hashCode of single > >> element = e; p = 31 and > >> f(n) = Sum(p^k, k = 0..n-1) > >> > >> Using simple algebraic rules, you'll get: > >> p ^ (2n) = (p^n)^2 > >> f(2n) = f(n) * (p^n + 1) > >> p ^ (n+1) = (p^n)*p > >> f(n+1) = f(n) * p + 1 > >> > >> Thus the algorithm may look as follows: > >> > >> public int hashCode() { > >> int pow = 1; // -> 31^n > >> int sum = 0; // -> Sum(31^k, k = 0..n-1) > >> for (int i = Integer.numberOfLeadingZeros(n); i < Integer.SIZE; > i++) { > >> sum = sum * (pow + 1); > >> pow *= pow; > >> if (((n << i) & 0x8000_0000) != 0) { > >> pow *= 31; > >> sum = sum * 31 + 1; > >> } > >> } > >> return pow + sum * (element == null ? 0 : element.hashCode()); > >> } > >> > >> It seems reasonable to peel off the n = 0 case, which helps to reduce > >> number of multiplications for other cases (also for n = 0 we don't > >> need to calculate element hashCode at all): > >> > >> public int hashCode() { > >> if (n == 0) return 1; > >> int pow = 31; // -> 31^n > >> int sum = 1; // -> Sum(31^k, k = 0..n-1) > >> for (int i = Integer.numberOfLeadingZeros(n) + 1; i < Integer.SIZE; > >> i++) { > >> sum = sum * (pow + 1); > >> pow *= pow; > >> if (((n << i) & 0x8000_0000) != 0) { > >> pow *= 31; > >> sum = sum * 31 + 1; > >> } > >> } > >> return pow + sum * (element == null ? 0 : element.hashCode()); > >> } > >> > >> Assuming that element hashCode is simple (I used String "foo" as an > >> element), I got the following results for different collection sizes: > >> > >> Benchmark (size) Score Error Units > >> hashCodeFast 0 2,299 ± 0,017 ns/op > >> hashCodeFast 1 2,731 ± 0,021 ns/op > >> hashCodeFast 2 4,073 ± 0,077 ns/op > >> hashCodeFast 3 4,315 ± 0,032 ns/op > >> hashCodeFast 5 5,470 ± 0,074 ns/op > >> hashCodeFast 10 6,904 ± 0,060 ns/op > >> hashCodeFast 30 9,102 ± 0,173 ns/op > >> hashCodeFast 100 10,093 ± 0,069 ns/op > >> hashCodeFast 1000 14,129 ± 0,074 ns/op > >> hashCodeFast 10000 17,028 ± 0,249 ns/op > >> hashCodeFast 100000 20,795 ± 0,194 ns/op > >> hashCodeFast 1000000 23,622 ± 0,264 ns/op > >> > >> Compared to Zheka's implementation: > >> > >> Benchmark (size) Score Error Units > >> hashCodeZheka 0 2,584 ± 0,024 ns/op > >> hashCodeZheka 1 2,868 ± 0,022 ns/op > >> hashCodeZheka 2 3,730 ± 0,030 ns/op > >> hashCodeZheka 3 4,323 ± 0,027 ns/op > >> hashCodeZheka 5 5,285 ± 0,037 ns/op > >> hashCodeZheka 10 8,254 ± 0,057 ns/op > >> hashCodeZheka 30 24,793 ± 0,218 ns/op > >> hashCodeZheka 100 89,017 ± 0,764 ns/op > >> hashCodeZheka 1000 923,792 ± 28,194 ns/op > >> hashCodeZheka 10000 9157,411 ± 98,902 ns/op > >> hashCodeZheka 100000 91705,599 ± 689,299 ns/op > >> hashCodeZheka 1000000 919723,545 ± 13092,935 ns/op > >> > >> So results are quite similar for one-digit counts, but we start > >> winning from n = 10 and after that logarithmic algorithm really rocks. > >> > >> I can file an issue and create a webrev, but I still need a sponsor > >> and review for such change. Martin, can you help with this? > >> > >> With best regards, > >> Tagir Valeev. > >> On Tue, Nov 27, 2018 at 5:49 PM Martin Buchholz <marti...@google.com> > >> wrote: > >>> I agree! > >>> > >>> (but don't have time ...) > >>> > >>> On Sun, Nov 25, 2018 at 9:01 PM, Zheka Kozlov <orionllm...@gmail.com> > >> wrote: > >>>> Currently, CopiesList.hashCode() is inherited from AbstractList which: > >>>> > >>>> - calls hashCode() for each element, > >>>> - creates a new Iterator every time. > >>>> > >>>> However, for Collections.nCopies(): > >>>> > >>>> - All elements are the same. So hashCode() can be called only > once. > >>>> - An Iterator is unnecessary. > >>>> > >>>> So, I propose overridding hashCode() implementation for CopiesList: > >>>> > >>>> @Override > >>>> public int hashCode() { > >>>> int hashCode = 1; > >>>> final int elementHashCode = (element == null) ? 0 : > >> element.hashCode(); > >>>> for (int i = 0; i < n; i++) { > >>>> hashCode = 31*hashCode + elementHashCode; > >>>> } > >>>> return hashCode; > >>>> } > >>>> > >>>> Benchmark: > >>>> List<List<String>> list = Collections.nCopies(10_000, new > >>>> ArrayList<>(Collections.nCopies(1_000_000, "a"))); > >>>> long nano = System.nanoTime(); > >>>> System.out.println(list.hashCode()); > >>>> System.out.println((System.nanoTime() - nano) / 1_000_000); > >>>> > >>>> Result: > >>>> Old version - ~12 seconds. > >>>> New version - ~10 milliseconds. > >>>> > > -- > With kind regards, > Ivan Gerasimov > >