On 2014-02-09 22:23, John Bollinger wrote:
On Monday, September 1, 2014 3:55:03 AM UTC-5, henrik lindberg wrote:
Hi,
Recently I have been looking into serialization of various kinds, and
the issue of how we represent and serialize/deserialize numbers has
come up.
[...]
Proposal
========
I would like to cap a Puppet Integer to be a 64-bit signed value when used
as a resource attribute, or anywhere in external formats. This means a
value range of -2^63 to 2^63-1 which is in Exabyte range (1 exabyte
= 2^60).
I would like to cap a Puppet Float to be a 64-bit value (IEEE 754 binary64)
when used as a resource attribute or anywhere in external formats.
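To make the proposed bounds concrete, here is a minimal Python sketch (not Puppet code; the names INT64_MIN, INT64_MAX, EXABYTE and fits_int64 are made up for illustration) of the range check that a 64-bit signed cap implies:

```python
# Hypothetical names, for illustration only; nothing here is Puppet API.
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
EXABYTE = 2**60  # the binary sense used in the proposal text


def fits_int64(value: int) -> bool:
    """Return True if value is representable as a 64-bit signed integer."""
    return INT64_MIN <= value <= INT64_MAX


# Python integers are arbitrary precision, so the check itself never overflows.
print(fits_int64(INT64_MAX))      # largest value inside the cap
print(fits_int64(INT64_MAX + 1))  # first value outside the cap
print((INT64_MAX + 1) // EXABYTE) # how many exabytes the cap corresponds to
```

Under these assumptions the cap sits at 8 * 2^60, so byte counts up to just under 8 exabytes stay inside the Integer range.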
With respect to intermediate results, I propose that we specify that
values are of arbitrary size and that it is an error to store a value
What, specifically, does it mean to "store a value"? Does that mean to
assign it to a resource attribute?
It was vague on purpose since I cannot currently enumerate the places
where this should take place, but I was thinking resource attributes at
least.
that is too big for the typed representation Integer (64-bit signed). For
the Float (64-bit) representation there is no error, but it loses
precision.
What about numbers that overflow or underflow a 64-bit Float?
That would also be an error (when it cannot lose more precision).
When specifying an attribute to have Number type, automatic
conversion to Float (with loss of precision) takes place if an internal
integer number is too big for the Integer representation.
(Note, by default, attributes are typed as Any, which means that they by
default would store a Float if the integer value representation
overflows).
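A rough Python sketch of the storing rule described above may help. It is only one reading of the proposal: the function name store and the use of plain strings for the declared type are assumptions, and only integer inputs are handled.

```python
# Hypothetical sketch of the proposed rule; not Puppet code.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def store(value: int, declared_type: str = "Any"):
    """Return the value as it would be stored for the declared type.

    Integer: error if the value does not fit in 64 bits signed.
    Number or Any: fall back to a 64-bit Float, possibly losing precision.
    """
    if INT64_MIN <= value <= INT64_MAX:
        return value  # fits the Integer representation as-is
    if declared_type == "Integer":
        raise OverflowError("value does not fit in a 64-bit signed Integer")
    if declared_type in ("Number", "Any"):
        # float() raises OverflowError by itself when the integer is
        # beyond the binary64 range, matching "that would also be an error".
        return float(value)
    raise ValueError("unsupported declared type: " + declared_type)


print(store(2**64, "Number"))  # stored as a Float
print(store(42, "Integer"))    # stored unchanged
```

Note that the fallback is silent in this sketch; whether it should warn is one of the open questions in the thread.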
And if BigDecimal (and maybe BigInteger) were added to the type system,
then I presume the expectation would be that over/underflowing Floats
would go there? And maybe that overflowing integers would go there if
necessary to avoid loss of precision?
If we add them, then the runtime should be specified to gracefully
choose the required size while calculating, and the types Any and
Number would accept them, while Integer and Float would not
(when they have values that are outside the valid range). (I
have not thought this through completely at this point, I must say).
Questions
=========
* Is it important that Javascript can be used to (accurately) read JSON
generated by Puppet? (If so, the limit needs to be 2^53 or values lose
precision).
I think that question is moot. No matter what, Javascript is limited in
that it cannot with full fidelity consume or produce Puppet data having
more than 53 bits of numeric precision. I don't think it helps anyone
to project that limitation into Puppet.
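The 53-bit limit is easy to demonstrate. The Python sketch below (the helper name survives_binary64 is made up for illustration) checks whether an integer survives a round trip through an IEEE 754 binary64 value, which is the representation JavaScript uses for numbers:

```python
def survives_binary64(n: int) -> bool:
    """Return True if n converts to a 64-bit float and back unchanged."""
    return int(float(n)) == n


print(survives_binary64(2**53))      # still exact
print(survives_binary64(2**53 + 1))  # first integer that is not representable
print(survives_binary64(2**63 - 1))  # top of the 64-bit signed range
```

Powers of two such as 2^62 still round-trip, so the loss depends on the number of significant bits and not on magnitude alone.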
* Is it important in Puppet Manifests to handle values larger than
2^63-1 (smaller than -2^63), and if not so, why isn't it sufficient to
use a floating point value (with reduced precision).
I am not prepared to offer examples of why Puppet manifests would need
to handle more than 63 bits of fixed-point precision, nor even more than
53 bits of floating-point precision. I am uneasy about pulling back
from Puppet's documented greater current capabilities, however.
* If you think Puppet needs to handle very large values (yottabyte-sized
disks?), should the language have convenient ways of expressing
such values, e.g. 42yb?
I would prefer to avoid adding such expressions, especially if there
will not be similar ones all the way down the size scale. I would not
be enthusiastic even with a full range of such expressions.
* Is it ok to automatically do transformation to floating point if
values overflow, and the type of an attribute is Number? (as discussed
above). I can imagine this making it difficult to efficiently represent
an attribute in a database and support may vary between different
database engines.
It is not ok to silently lose precision. It might be ok to lose
precision if doing so is accompanied by a warning.
I'd anyway be inclined to say that the problem here is not so much
possible loss of precision as it is specifying the type of the attribute
as Number instead of something more specific. OF COURSE that presents
issues for recording the value in a database.
* Do you think it is worth the trouble to add the types BigInteger and
BigDecimal to the type system to allow the representation to be more
precise? (Note that this makes it difficult to use standard number
representation in serialization formats). This means that Number is not
allowed as an attribute/storage type (user must choose Integer, Float,
or one of the Big... types).
1) If you have BigDecimal then you don't need BigInteger.
True, but BigInteger specifies that a fraction is not allowed.
2) Why would allowing one or both of the Bigs prevent Number from being
allowed as a serializable type?
Not sure I said that. The problem is that if something is potentially
Big... then a database must be prepared to deal with it and it has a
high cost. Specifying that Number means Integer, Float, or a Big type is
perfectly fine.
The way I see it, if you allow Bigs then Numbers must always be
(de)serialized as BigDecimal. Where you want attributes or other values
to be efficiently serializable / indexable / etc. you assign them a
narrower type appropriate for that purpose. If this is too big a
challenge for users accustomed to not specifying types, then perhaps the
whole type system thing -- cool as it is -- is just not a good fit for
Puppet.
Yes, that is how I thought this could work. However, since everything is
basically untyped now (which we translate to the type Any), this means
that PuppetDB must be changed to use BigDecimal instead of integer 64
and float. That is a lose^3; it is lots of work to implement, bad
performance, and everyone needs to type everything.
3) Do you actually need one or both Bigs as named types in order to
allow Big values? Could it not be that Big values are representable via
the Number type, but there is no (other) named numeric type that
specifically allows such values? Since you seem to prefer that users
not work with such values, would that not influence them in that direction?
Possibly. Having Number be concrete and represented as BigDecimal is ok,
it can hold any value described by subclasses.
* Do you think it should work as in Ruby? If so, are you ok with
serialization that is non standard?
I think disallowing Bigs in the serialization formats will present its
own problems, only some of which you have touched on so far. I think
the type system should offer /opportunities/ for greater efficiency in
numeric handling, rather than serving as an excuse to limit numeric
representations.
I don't quite get the point here - the proposed cap is not something
that the type system needs. As an example MsgPack does not have standard
Big types, thus a serialization will need to be special and it is not
possible to just use something like "readInt" to get what you know
should be an integer value. The other example is PuppetDB, where a
decision has to be made how to store integers; the slower Big types, or
a more efficient 64-bit value? This is not just about storage, but also
about indexing speed and query/comparison - and if some
values are stored as 64 bits and others as a big type for the same entity,
that would be even slower to query.
So - idea, make it safe and efficient for the normal cases. Only when
there is a special case (if indeed we do need the big types) then take
the less efficient route.
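As an illustration of "fast path for the normal case, slower path for the special case", here is a small Python sketch. The one-byte tags and the function names encode_int and decode_int are invented for this example; this is not MsgPack's wire format nor anything PuppetDB does.

```python
import struct

# Hypothetical tags: b"i" = fixed 64-bit signed integer, b"b" = big integer
# written as decimal text.


def encode_int(value: int) -> bytes:
    """Encode with a fixed-width fast path and a variable-width fallback."""
    if -2**63 <= value <= 2**63 - 1:
        return b"i" + struct.pack(">q", value)  # always 1 + 8 bytes
    return b"b" + str(value).encode("ascii")    # special case, variable size


def decode_int(data: bytes) -> int:
    """Decode a value produced by encode_int."""
    tag, payload = data[:1], data[1:]
    if tag == b"i":
        return struct.unpack(">q", payload)[0]
    if tag == b"b":
        return int(payload.decode("ascii"))
    raise ValueError("unknown tag")


print(len(encode_int(42)))            # fixed-size encoding
print(decode_int(encode_int(2**64)))  # big value takes the fallback path
```

A reader that knows a field is typed Integer can skip the tag check and read the 8 bytes directly, which is the kind of "readInt" shortcut mentioned above; a field that may hold big values always has to take the tagged route.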
- henrik
--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/