Re: isqrt tweak and factor(1) questions

David Sparks via busybox Wed, 04 Feb 2026 16:42:53 -0800

Thank you for the reply!

On Wednesday, February 4th, 2026 at 22:48, Denys Vlasenko 
<[email protected]> wrote:
> On Tue, Feb 3, 2026 at 3:52 PM David Sparks [email protected] wrote:


> I'll add a separate version just for "unsigned long".

I was thinking, if you want minimum code size, it's fine to share,
just only compile it for "unsigned long long" if needed, with some
CONFIG_ trickery.  CONFIG_LONGER_ISQRT depends on CONFIG_FACTOR,
and bingo, no code duplication.

> Looks good. I don't think it's worth going to asm() tricks
> to make gcc optimize better.
> (It's better to spend time to improve gcc to do it better for everyone
> ...but yes, it's also much harder).

I was just pleased because it's a *completely portable* asm() trick.

> For any even remotely taxing integer to factorize, unpack_wheel()
> takes insignificant time to execute by comparison.

I was thinking more about code size than time.  I agree it's trivial,
but it can still be improved.
 
>> It could
>> easily be merged with the trial division loop, or the wheel could be
>> generated algorithmically without the compressed table.
>
> Adding more code to trial division loop will cause additional
> register pressure (=> slower code).

I should have a look.  Are you worried about speed or size more
here?

If you want an unpacked wheel, I can generate it on the fly with
smaller code than the current packed table, and barely more time.

Basically, have a 3*11 = 33 and 5*7 = 35-bit shift register
(circular right shifts are simple with "x = x << 32 | x >> 1;")
with bits set whenever multiples of 3, 5, 7 or 11 appear.  Then
just step through the values up to 2310+13 = 2323 by 2, recording
the distances between values for which ((mod33 | mod 35) & 1) == 0.

(Which can be implemented branch-free using the "unconditional store
and conditional pointer increment" trick from BlockQuicksort.)

The initial steps <= 13 can be handled by initializing the shift
registers with contents which omit the first occurrence of those
values in the lsbits.

For 32-bit, 3 shift registers of 15, 7 and 11 bits are probably better
than doing 64-bit shifts, but I'll check code size.

(Q: On which platform do you generally run bloat-o-meter?)

> I didn't go for full-blown, "actually smart" code for this,
> since I'm not sure we have users.
> 
> However, if you absolutely itching to implement something
> bigger but faster (or capable of >64bits),

I'm not "absolutely itching", but if you actually *want* one,
it's a SMOP.  Mostly I want to know what your appetite for
size/speed tradeoffs is.

> please do submit a patch.
> It can always be under a CONFIG option if it's "too big"
> to be unconditional.
> 
> Frankly, a serious implementation for this would be rather different,
> and maybe better be a whole separate code, mostly unrelated
> (IOW: not an addition of an ifdef forest to what we have here).

Yes, trial division would be a lot more limited.  

>> I'm not sure what the desired tradeoff here is. Modulo-n arithmetic
>> requires widening multiply for intermediate values, which is both slow
>> and a lot of code to emulate if the platform doesn't have it. Maybe
>> I should have a look at the arbitrary-precision code in dc(1)?
> 
> That one is decimal-based (-> slow).
> 
> TLS code has some arbitrary-precision binary code.

Ooh!  I should hack on that!  I notice that it lacks __int128
support, and that could be a big speed boost.  (Compiler support
can be tested with #ifdef __SIZEOF_INT128__.)

> I don't think so.
> 
> factorize(N) calls isqrt_odd(N) before factorizing.
> isqrt_odd(N) will detect that N=p^2
> (by discovering that isqrt(N)^2 = N)
> and will factorize p once (will just print its factors twice).
> 
> Tell me what I'm missing.

I screwed up my example.  If N = p^3, *then* it'll divide by p,
notice that p^2 is square, and start trial dividing again.

The most wasteful case is N = p^2 * q, where q ~ sqrt(p), which
will trial divide up to q twice.  ("Worst case" is a bit more
difficult to define.)

_______________________________________________
busybox mailing list
[email protected]
https://lists.busybox.net/mailman/listinfo/busybox

Re: isqrt tweak and factor(1) questions

Reply via email to