Hi,
Recently I have been looking into serialization of various kinds, and
the issue of how we represent and serialize/deserialize numbers has
come up.
TL;DR - I want to specify the max values of integers and floats in the
Puppet language for a number of reasons. Skip the background part
to get to "Questions and Proposal" if you are already familiar with
serialization formats, and issues regarding numeric representation.
Background
---
As you may know, Ruby has fluent handling of integers - if an integer would
overflow its native (Fixnum) representation, Ruby transparently switches to
a Bignum of unlimited size. Floating point numbers are always IEEE 754 64-bit
doubles; unlimited precision requires explicitly using BigDecimal.
This is very flexible and helpful most of the time, but it creates
problems when serializing / deserializing. Most serialization formats
simply cannot deal with values larger than 64 bits as regular numbers.
They may do horrible things like truncation, or use the max/min value if
a value is too big, or, for floating point, drastically lose precision.
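To illustrate the problem, here is a minimal Python sketch (Python's int, like Ruby's Bignum, grows without limit in memory). Packing a value into a fixed 64-bit signed slot, which is what a binary format ultimately has to do, fails as soon as the value no longer fits. The helper name `pack_int64` is hypothetical.

```python
import struct

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def pack_int64(value: int) -> bytes:
    """Pack value as a big-endian signed 64-bit integer (hypothetical helper)."""
    # struct.pack raises struct.error when value is outside the int64 range
    return struct.pack(">q", value)


big = 2 ** 70  # perfectly fine as an in-memory integer
print(len(pack_int64(INT64_MAX)))  # 8
try:
    pack_int64(big)
except struct.error as err:
    print("cannot serialize:", err)
```

An implementation then has to choose between raising an error (as here), truncating, clamping, or falling back to a non-standard encoding.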
YAML
- specifies integers to have arbitrary size, but recommends that an
implementation uses its native integer size. The specification says:
"In some languages (such as C), an integer may overflow the native
type's storage capability. A YAML processor may reject such a value as
an error, truncate it with a warning, or find some other manner to
round-trip it. In general, integers representable using 32 binary digits
should safely round-trip through most systems.".
http://www.yaml.org/spec/1.2/spec.html
For floating point values, only IEEE 32-bit values are safe.
In other words, it is unspecified, which means a YAML implementation may
silently truncate numbers to 32-bit values (e.g. to the 32-bit max int,
2,147,483,647) when running on a 32-bit machine (some implementations do,
as noted as "gotchas" in blog posts (google for it)).
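A sketch of the two failure modes mentioned above, assuming a processor that stores integers in a signed 32-bit slot: silent two's-complement wraparound versus clamping to the max/min value. Both functions are hypothetical illustrations, not code from any particular YAML library.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def wrap_int32(value: int) -> int:
    """Two's-complement wraparound, as a C int32 conversion typically behaves."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 2 ** 32 if value > INT32_MAX else value


def clamp_int32(value: int) -> int:
    """Saturate to the int32 range instead of wrapping."""
    return max(INT32_MIN, min(INT32_MAX, value))


too_big = 3_000_000_000  # e.g. a 3 GB size in bytes
print(wrap_int32(too_big))   # -1294967296
print(clamp_int32(too_big))  # 2147483647
```

Neither result is an error from the reader's point of view, which is what makes the behavior dangerous.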
JSON
- is similar to YAML in that it allows a number to have an arbitrary
number of digits, and it is thus up to an implementation to bind this to
a representation. It has the same problems as YAML. Notably, if used
with JavaScript, which only has Number (an IEEE 754 double) for both
Integer and Real, integers are exact only up to 2^53; beyond that they
start to lose precision.
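The JavaScript limit can be demonstrated without a browser. This Python sketch writes an integer just above 2^53 to JSON and then reads it back the way a JavaScript engine would, by decoding every number as a 64-bit double (`parse_int=float` is a standard option of Python's `json.loads`).

```python
import json

MAX_SAFE_INTEGER = 2 ** 53 - 1  # JavaScript's Number.MAX_SAFE_INTEGER

# Python writes the integer with all of its digits
text = json.dumps({"size": 2 ** 53 + 1})

# Simulate a JavaScript reader: every JSON number becomes an IEEE 754 double
as_js = json.loads(text, parse_int=float)
print(text)           # {"size": 9007199254740993}
print(as_js["size"])  # 9007199254740992.0 (the last digit is lost)
```

The JSON text itself is correct; the precision is lost silently in the reader.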
MsgPack
- handles 8-16-32-64 bit integers (signed and unsigned) as well as 32
and 64 bit floating point. It does not have built-in BigInteger or
BigDecimal types.
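A minimal sketch of how an encoder might choose among those fixed widths, and where it has to give up. The function name is hypothetical, and it deliberately ignores MsgPack's single-byte "fixint" forms; it is not the API of any MsgPack library.

```python
from typing import Tuple


def msgpack_int_format(value: int) -> Tuple[str, int]:
    """Return (signedness, bits) of the smallest MsgPack int family that fits.

    Simplified: only models the uint8..uint64 and int8..int64 families.
    """
    for bits in (8, 16, 32, 64):
        if 0 <= value < 2 ** bits:
            return ("uint", bits)
        if -(2 ** (bits - 1)) <= value < 0:
            return ("int", bits)
    # Anything beyond 64 bits has no native MsgPack integer representation
    raise OverflowError(f"{value} does not fit in a 64-bit MsgPack integer")


print(msgpack_int_format(255))   # ('uint', 8)
print(msgpack_int_format(-129))  # ('int', 16)
```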
The Puppet Language Specification
---
In the Puppet Language Specification the size and precision of numbers
is currently specified as Ruby numbers (simply because this was
easiest). This is sloppy and leaves edge cases for serialization and
storage of data.
Proposal
========
I would like to cap a Puppet Integer to be a 64-bit signed value when used
as a resource attribute, or anywhere in external formats. This means a
value range of -2^63 to 2^63-1, which is in the exabyte range (2^63 bytes
= 8 EiB, where 1 EiB = 2^60 bytes).
I would like to cap a Puppet Float to be a 64 bit (IEEE 754 binary64)
when used as a resource attribute or anywhere in external formats.
With respect to intermediate results, I propose that we specify that
values are of arbitrary size and that it is an error to store a value
that is too big for the typed representation Integer (64-bit signed). For
the Float (64-bit) representation there is no error, but it loses
precision. When an attribute is specified to have Number type, automatic
conversion to Float (with loss of precision) takes place if an internal
integer value is too big for the Integer representation.
(Note that by default attributes are typed as Any, which means that they
would store a Float if the integer value representation overflows.)
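The storage rules proposed above can be summarized as a small conversion function. This is a Python sketch under the assumption that intermediate results are unbounded integers; the function name `to_storage` and the string type names are hypothetical and do not correspond to any existing Puppet API.

```python
from typing import Union

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def to_storage(value: int, declared_type: str = "Any") -> Union[int, float]:
    """Convert an arbitrary-size intermediate integer for storage (hypothetical)."""
    fits = INT64_MIN <= value <= INT64_MAX
    if declared_type == "Integer":
        if not fits:
            raise ValueError(f"{value} is out of range for Integer (64-bit signed)")
        return value
    if declared_type == "Float":
        # Lossy for large values; Python raises OverflowError beyond ~1.8e308
        return float(value)
    if declared_type in ("Number", "Any"):
        # Keep the exact integer when it fits, otherwise fall back to a lossy Float
        return value if fits else float(value)
    raise TypeError(f"unsupported declared type: {declared_type}")


print(to_storage(2 ** 62, "Integer"))  # 4611686018427387904
print(to_storage(2 ** 64, "Number"))   # 1.8446744073709552e+19
```

Note that even a Float has a ceiling: values above roughly 1.8e308 cannot be represented, so "no error" holds only up to that point.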
Questions
=========
* Is it important that JavaScript can be used to (accurately) read JSON
generated by Puppet? (If so, the limit needs to be 2^53, or values lose
precision.)
* Is it important in Puppet manifests to handle values larger than
2^63-1 (or smaller than -2^63)? If not, why isn't it sufficient to
use a floating point value (with reduced precision)?
* If you think Puppet needs to handle very large values (yottabyte sized
disks?), should the language have convenient ways of expressing
such values, e.g. 42yb?
* Is it ok to automatically transform values to floating point if they
overflow and the type of an attribute is Number (as discussed above)?
I can imagine this making it difficult to efficiently represent an
attribute in a database, and support may vary between different
database engines.
* Do you think it is worth the trouble to add the types BigInteger and
BigDecimal to the type system to allow the representation to be more
precise? (Note that this makes it difficult to use standard number
representations in serialization formats.) This would also mean that
Number is not allowed as an attribute/storage type (the user must choose
Integer, Float, or one of the Big... types).
* Do you think it should work as in Ruby? If so, are you ok with
non-standard serialization?
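To make the question about very large values concrete: assuming a hypothetical size-suffix syntax with decimal (SI) units, this Python sketch shows where such literals cross the proposed 64-bit Integer limit (2^63 - 1 bytes is roughly 9.22 exabytes). The suffix table and `parse_size` are illustrations only; Puppet has no such syntax.

```python
import re

INT64_MAX = 2 ** 63 - 1

# Hypothetical decimal (SI) size suffixes
SUFFIXES = {"kb": 10 ** 3, "mb": 10 ** 6, "gb": 10 ** 9, "tb": 10 ** 12,
            "pb": 10 ** 15, "eb": 10 ** 18, "zb": 10 ** 21, "yb": 10 ** 24}


def parse_size(literal: str) -> int:
    """Parse a literal such as '42yb' into an (unbounded) integer byte count."""
    match = re.fullmatch(r"(\d+)([kmgtpezy]b)", literal.lower())
    if match is None:
        raise ValueError(f"not a size literal: {literal!r}")
    digits, suffix = match.groups()
    return int(digits) * SUFFIXES[suffix]


for lit in ("9eb", "10eb", "42yb"):
    size = parse_size(lit)
    verdict = "fits Integer" if size <= INT64_MAX else "needs Float or a Big type"
    print(lit, size, verdict)
```

Everything from about 10eb upward would have to become a Float (with reduced precision) or one of the Big... types.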
- henrik
--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/