On Tue, Mar 24, 2020 at 9:56 AM Marco Bodrato <bodr...@mail.dm.unipi.it> wrote:
> Il 2020-03-24 01:10 Seth Troisi ha scritto: > > (I'm don't think your ulong patch is needed but I can measure at a > > It's a small patch, but the speed-up it gives is probably too small to > be worth adding. > > > My code doesn't use a sieve (and instead checks each number for > > divisors similar to the old code) because the average gap for small > > numbers is small 300000 / primepi(300000) = 11 > > There is a small error in the code, it does not handle negative numbers > correctly. > I'll write up a patch for add tests for negative numbers tonight > For each divisor your code uses two operations, a product (prime*prime) > and a remainder (t % prime). On many architectures computing t/prime and > t%prime cost just one operation... that's why I'd use the ideas Torbjorn > used in isprime() in dumbmp.c, > https://gmplib.org/repo/gmp/rev/7524222ab7d4 currently in bootstrap.c . > > I propose a variation of your patch, you find it attached, on my > computer it's faster. > Couple of small notes but otherwise this looks good +/* NP_SMALL_LIMIT = prevprime (LAST_PRIME ^ 2) */ In the light of the day this can be bumped slightly to prevprime(nextprime(LAST_PRIME)^2) - 1 = 316960 + /* Technically this should be prev_prime(LAST_PRIME ^ 2) */ I'd remove this, It's covered by the comment above + if (q < prime) + return t; + if (r == 0) + break; Can this be rearranged to + if (r == 0) + break; >+ if (q <= prime) + return t; + ASSERT (i < NUMBER_OF_PRIMES); Should this be placed at the start of the loop or be? + ASSERT (i+1 < NUMBER_OF_PRIMES); > > Because the measured threshold for this was much larger (~80 million), > > it might actually make sense to use a sieve after some threshold > > > If you think that's a 1.5-3x speedup for inputs 16-30bits is important > > I can try to dynamically generate a longer list of primes > > If writing the code is not too complex, it may be interesting to test if > it is worth. > I'll try it out tonight > > > On Mon, Mar 23, 2020 at 2:38 PM Marco Bodrato > > <bodr...@mail.dm.unipi.it> wrote: > > >>> It might also be useful to commit tune_nextprime_small which > >>> dynamically selects a higher reps count for > >>> mpz_nextprime input and helps increase precision. > >> > >> Uhm... > >> j = 200000 / s->size; > > > This is a great point, I modified the code so It only changes the > > outer loop count. > > I included a screenshot showing how much this reduces variance over > > multiple runs. > > Did you try tweaking with the -p parameter? It can be useful when a > measure is "noisy". > Nope I had not, using -p3 seems to work just as well. Thanks :) > > >> PS: a comment in the current code says > >> /* Larger threshold is faster but takes ~O(n/ln(n)) memory. > >> To be honest, the array sieve needs n/24 bytes, and the array > > > You are correct, let's change the comment by which I mean you :) > > Done. > > Ĝis, > m > > -- > http://bodrato.it/papers/ _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel