Re: GMP 6 the incomatible GMP?

2013-01-09 Thread Torbjorn Granlund
Most people seem to prefer splitting the alloc data over a field in the
structure and a field before mp_d[0], while I think that'd be slower as
well as more complex than the "floating-point" coding I have in mind.

My proposal will make allocations somewhat inaccurate, but I'll make
sure it is never more that one percent of the allocation.

I'll try my proposal at some point, to evaluate its effects.  I might
need to introduce a few more macros in the process, which should then
make it easy to try various other alloc representeation styles.

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-09 Thread Torbjorn Granlund
Marc Glisse  writes:

  Experimentally (for me, others can have wildly different use cases),
  what really helps for small nums isn't to save space, it is to not
  allocate at all, for instance storing a few limbs in the base type and
  only allocating when that's not enough (recycling storage can work
  even better, but is for very specific uses and may not make sense for
  a generic library). There are plans to add a bignum type to the C++
  standard, and people are discussing whether it should be part of the
  specification that constructing small numbers (not even just 0)
  doesn't use dynamic allocation. Now the target users of this bignum
  type and GMP are not exactly the same (bignum would be used a lot by
  people who only need int but want to be safe), so it makes sense if
  the best solutions differ.

I am sure this is often a god idea, perhaps it would be good also in
GMP.

My main reason for not originally doing it like that is code complexity.
Having three operands (thing mpz_add) each with two possible
representations, will give some alternative execution paths.

Another problem is mpz_t size.  It is currently 16 bytes on 64-bit
machines (and 12 bytes on on 32-bit machines).  Even if only one bit of
these bytes are used for typing, we ain't getting much headroom from the
remaining 127 and 95 bits, respectively.  Making the structures larger
is to costly.

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread David Miller
From: Marc Glisse 
Date: Wed, 9 Jan 2013 00:14:15 +0100 (CET)

> PS: thanks a lot for your sparc assembly work!

No problem, it's actually fun, like solving sudoku puzzles, for me.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread Marc Glisse

On Tue, 8 Jan 2013, David Miller wrote:


From: Torbjorn Granlund 
Date: Tue, 08 Jan 2013 12:50:01 +0100


Marc Glisse  writes:

  Then we might as well always put it in _mp_d[-1], no? IIRC that's what
  mpfr does.

I see some problems with that:

* There is a serial memory dependency problem, making the allocation
  check at least one more cache latency away.  I don't think OoO
  execution will be able to hide that, as this is used.


My intuition would be that this won't matter, but you have more experience 
than me with that.



I don't know how amicable you are to an idea like this, but if user
applications can only treat pointers to these objects as opaque then
you can encode information in the lowest bits of the pointer.

For example, you could encode in the lowest bit whether the _mp_d[-1]
thing is being done or not.

Then it's at worst a mis-predicted branch in the allocation check.


That seems similar (not saying one is better than the other) to what Niels 
was proposing, where he was encoding it as a special value of the bitfield 
_mp_alloc (so basically in a few bits of _mp_size) instead of _mp_d.


What is done by bignum libraries in many languages is that the base object 
is a single pointer, and depending on the lowest (?) bit it is interpreted 
either as a pointer to (alloc+)size+limb[] or as a small integer. But 
again, their intended main use is with numbers that would fit in an int.


--
Marc Glisse

PS: thanks a lot for your sparc assembly work!
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread David Miller
From: Torbjorn Granlund 
Date: Tue, 08 Jan 2013 12:50:01 +0100

> Marc Glisse  writes:
> 
>   Then we might as well always put it in _mp_d[-1], no? IIRC that's what
>   mpfr does.
> 
> I see some problems with that:
> 
> * There is a serial memory dependency problem, making the allocation
>   check at least one more cache latency away.  I don't think OoO
>   execution will be able to hide that, as this is used.

I don't know how amicable you are to an idea like this, but if user
applications can only treat pointers to these objects as opaque then
you can encode information in the lowest bits of the pointer.

For example, you could encode in the lowest bit whether the _mp_d[-1]
thing is being done or not.

Then it's at worst a mis-predicted branch in the allocation check.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread Niels Möller
Torbjorn Granlund  writes:

> * There is a serial memory dependency problem, making the allocation
>   check at least one more cache latency away.  I don't think OoO
>   execution will be able to hide that, as this is used.

And that should matter for small bignums only, right, if we care about
relative overhead? Then, it's desirable to have *small* allocs directly
in the mpz_t struct.

> * There is a slight type error mixing limbs and sizes.  The ASL patch by
>   Per Olofsson will require many limbs to represent a size.

I think this can be solved in a type safe way, no matter which of
mp_limb_T and mp_size_t is the larger type. One would need to use
something more complicated than _mp_d[-1] (and this shouold be hidden in
some macro anyay), involving something like

struct mp_storage
{
  mp_size_t alloc;
  mp_limb_t limbs[1]; /* Variable size */
};

Use this when allocating, pass around pointers to the limbs member, and
to access the alloc field, do some casting and a subtraction of
offsetof(struct mp_storage, limbs).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread Torbjorn Granlund
Marc Glisse  writes:

  Then we might as well always put it in _mp_d[-1], no? IIRC that's what
  mpfr does.

I see some problems with that:

* There is a serial memory dependency problem, making the allocation
  check at least one more cache latency away.  I don't think OoO
  execution will be able to hide that, as this is used.

* There is a slight type error mixing limbs and sizes.  The ASL patch by
  Per Olofsson will require many limbs to represent a size.  But this is
  a testing feature, which should not get in they way of an otherwise
  desirable change.

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread Richard Biener
On Tue, 8 Jan 2013, Torbjorn Granlund wrote:

> ni...@lysator.liu.se (Niels M?ller) writes:
> 
>   Hmm. I guess one could go down to only 8 bits or so for the alloc field,
>   and for larger allocations, store the number of allocated limbs at the
>   head of the limb array (in the case limb size is not artificially small,
>   it could be just _mp_d[-1]).
>   
> That's another idea.
> 
>   Do you think 48 bits (or 47 bits absolute value) for _mp_size will be
>   large enough for the foreseeable future (say, 10 years)?
>   
> I think that will be sufficient for some hundred years, since any
> super-linear algorithm applied to an operand of 2^53 bits (2^47 64-bit
> limbs) would hardly ever terminate even with a multi-THz CPU.
> 
>   There are also other tweaks discussed for mpz_t, to allow initialization
>   without allocation.
> 
> Yep, we should do that too.  I cannot recall the details, but I assume
> that this would not create an incompatibility?

Another idea is to embed limbs for small numbers into __mpz_struct itself
(I was playing with that idea to reduce the number of allocations done
by ISL used by GCC now).  Sort of have a magic _mp_alloc number (zero?)
that tells GMP that everything after _mp_alloc are the actual limbs.
Thus have

typedef struct
{
  int _mp_alloc;
  int _mp_size;
  union {
mp_libm_t *_mp_d;
mp_libm_t embed_limbs[N];
  };
} __mpz_struct;

with suitable N (what's a suitable number of limbs to special-case,
given that limb-size is target dependent?  One limb would be easiest
(just use the space of the _mp_d pointer), extra "half" limbs would be
ugly I suppose.

ISL has lots of arrays of __mpz_structs which seem to be often
copied around and the actual numbers are usually small (would fit
into 64bits).

Doing this wouldn't create an incompatibility I think, but if you
re-org things anyway maybe there is a better opportunity to
layout things.

Thanks,
Richard.

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-08 Thread Marc Glisse

On Tue, 8 Jan 2013, Niels Möller wrote:


Marc Glisse  writes:


Then we might as well always put it in _mp_d[-1], no? IIRC that's what
mpfr does.


That will increase overhead (both allocation and processing, I think)
for small bignums, which may be undesirable. But I don't really know how
significant that is.


I am not sure about processing. The allocated size is hidden behind the 
pointer, but you are already dereferencing that pointer anyway, and you 
are saving on the bit manipulations to separate size from alloc. It does 
indeed waste a bit of heap space (not sure that's significant), which can 
also increase the processing in malloc/free.


Experimentally (for me, others can have wildly different use cases), what 
really helps for small nums isn't to save space, it is to not allocate at 
all, for instance storing a few limbs in the base type and only allocating 
when that's not enough (recycling storage can work even better, but is for 
very specific uses and may not make sense for a generic library). There 
are plans to add a bignum type to the C++ standard, and people are 
discussing whether it should be part of the specification that 
constructing small numbers (not even just 0) doesn't use dynamic 
allocation. Now the target users of this bignum type and GMP are not 
exactly the same (bignum would be used a lot by people who only need int 
but want to be safe), so it makes sense if the best solutions differ.


--
Marc Glisse
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-07 Thread Niels Möller
Marc Glisse  writes:

> Then we might as well always put it in _mp_d[-1], no? IIRC that's what
> mpfr does.

That will increase overhead (both allocation and processing, I think)
for small bignums, which may be undesirable. But I don't really know how
significant that is.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-07 Thread Marc Glisse

On Tue, 8 Jan 2013, Niels Möller wrote:


Torbjorn Granlund  writes:


The idea is to instead make a 64-bit bitfield of the _mp_size and
_mp_alloc fields.  One would give _mp_size about 48 bits, and _mp_alloc
the remaining 16.  The alloc field would lose granularity, but should
code small sizes exactly.


Hmm. I guess one could go down to only 8 bits or so for the alloc field,
and for larger allocations, store the number of allocated limbs at the
head of the limb array (in the case limb size is not artificially small,
it could be just _mp_d[-1]).


Then we might as well always put it in _mp_d[-1], no? IIRC that's what 
mpfr does.


--
Marc Glisse
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-07 Thread Fredrik Johansson
On Tue, Jan 8, 2013 at 12:28 AM, Torbjorn Granlund  wrote:
> ni...@lysator.liu.se (Niels Möller) writes:
>
>   Hmm. I guess one could go down to only 8 bits or so for the alloc field,
>   and for larger allocations, store the number of allocated limbs at the
>   head of the limb array (in the case limb size is not artificially small,
>   it could be just _mp_d[-1]).
>
> That's another idea.
>
>   Do you think 48 bits (or 47 bits absolute value) for _mp_size will be
>   large enough for the foreseeable future (say, 10 years)?
>
> I think that will be sufficient for some hundred years, since any
> super-linear algorithm applied to an operand of 2^53 bits (2^47 64-bit
> limbs) would hardly ever terminate even with a multi-THz CPU.

Well, pi has been computed to 10 trillion digits, which is 2^45 bits.
Extrapolating the records from the last 60 years, a 2^53-bit
multiplication can be expected to occur within 30 years, though
certainly with highly parallel hardware, not a single CPU. Actually,
NTTs of that size ought to be feasible on the current generation of
top supercomputers, in theory at least.

Of course 10+ years from now you could just release GMP 7 and break
backwards compatibility again ;-)

Fredrik
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-07 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  Hmm. I guess one could go down to only 8 bits or so for the alloc field,
  and for larger allocations, store the number of allocated limbs at the
  head of the limb array (in the case limb size is not artificially small,
  it could be just _mp_d[-1]).
  
That's another idea.

  Do you think 48 bits (or 47 bits absolute value) for _mp_size will be
  large enough for the foreseeable future (say, 10 years)?
  
I think that will be sufficient for some hundred years, since any
super-linear algorithm applied to an operand of 2^53 bits (2^47 64-bit
limbs) would hardly ever terminate even with a multi-THz CPU.

  There are also other tweaks discussed for mpz_t, to allow initialization
  without allocation.

Yep, we should do that too.  I cannot recall the details, but I assume
that this would not create an incompatibility?

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP 6 the incomatible GMP?

2013-01-07 Thread Niels Möller
Torbjorn Granlund  writes:

> The idea is to instead make a 64-bit bitfield of the _mp_size and
> _mp_alloc fields.  One would give _mp_size about 48 bits, and _mp_alloc
> the remaining 16.  The alloc field would lose granularity, but should
> code small sizes exactly.

Hmm. I guess one could go down to only 8 bits or so for the alloc field,
and for larger allocations, store the number of allocated limbs at the
head of the limb array (in the case limb size is not artificially small,
it could be just _mp_d[-1]).

Do you think 48 bits (or 47 bits absolute value) for _mp_size will be
large enough for the foreseeable future (say, 10 years)?

There are also other tweaks discussed for mpz_t, to allow initialization
without allocation.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


GMP 6 the incomatible GMP?

2013-01-07 Thread Torbjorn Granlund
We have discussed, perhaps mostly off-list, to fix several nuisances in
the library that will break binary compatibility.

There is actually a terse page about it (linked from the developer's
corner):  https://gmplib.org/devel/incompatibility.html

I would like to lift the precision limit (for mpf) and size limit (for
mpz, mpq) by changing how mpz_t (etc) works.

We could simply change the int fields to size_t, but that would make the
structure a lot larger.  Programs that need to juggle lots of mostly
small GMP numbers would suffer.

The idea is to instead make a 64-bit bitfield of the _mp_size and
_mp_alloc fields.  One would give _mp_size about 48 bits, and _mp_alloc
the remaining 16.  The alloc field would lose granularity, but should
code small sizes exactly.

I haven't worked out the details, but I think this can be done without
overhead for small allocations (say, up to 255 limbs) and that for
larger sizes the overhead will be negligible compared to the time used
for the computation.  With clever coding, a small result which is about
to be written to an operand which has previously been large enough to
have had its alloc field coded for large sizes, a plain comparison like
needed_alloc < ALLOC(r) would return true.  There will only be false
positives for larger needed allocation, where a reallocation will be
assumed to be needed, but then decoding the _mp_alloc field would
conclude new allocation wasn't really needed.

Why should we consider this?  Well, we today cannot handle operands >
2^37, which is hardly "arithmetic without limitations" on today's
computers.  And the new small primes multiplication code (the latest
version at http://gmplib.org:8000/gcd-nisse/) will boost huge
multiplication enough to make operands of 2^40 bits feasible.

(It will be challening to test GMP for greater sizes, since one operand
of 2^37 is 16 GiB of RAM.)

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel