Hi,
Recently I have been looking into serialization of various kinds, and the issue of how we represent and serialize/deserialize numbers has come up.

TL;DR - I want to specify the max values of integers and floats in the Puppet language for a number of reasons. Skip the background part to get to "Proposal" and "Questions" if you are already familiar with serialization formats and issues regarding numeric representation.

Background
---
As you may know, Ruby has fluent handling of integers: if a value would overflow its current representation, a larger one is used automatically, i.e. a machine-word sized Fixnum becomes an arbitrary-size Bignum (unlimited). Floating point numbers do not grow this way. A Ruby Float is always a 64-bit IEEE 754 double, and BigDecimal (unlimited precision) is only used when explicitly requested.
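
To make the asymmetry concrete, here is a small illustration in Python rather than Ruby (chosen only because Python's int and float behave the same way: ints grow without bound, floats are fixed 64-bit doubles). The two function names are made up for the example.

```python
import sys


def grows_past_64_bits() -> int:
    """Return a value that no fixed 64-bit integer type can hold."""
    return 2**64 + 1  # Python switches to an arbitrary-size representation, as Ruby does


def float_is_fixed_width() -> bool:
    """True if adding 1 to 2**53 is lost, i.e. a float does not grow to keep precision."""
    x = float(2**53)
    return x + 1 == x  # 2**53 + 1 rounds back to 2**53 in binary64


print(grows_past_64_bits())     # 18446744073709551617
print(sys.float_info.mant_dig)  # 53 significand bits on IEEE 754 platforms
print(float_is_fixed_width())   # True
```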

This is very flexible and helpful most of the time, but it creates problems when serializing / deserializing. Most serialization formats simply cannot deal with values wider than 64 bits as regular numbers. They may do horrible things like truncating the value, clamping it to the max/min value if it is too big, or, for floating point, drastically losing precision.
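
A rough sketch of those three failure modes, assuming a 64-bit target. The functions are hypothetical stand-ins for what a reader might do; they are not taken from any particular serializer.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
MASK64 = 2**64 - 1


def truncate_to_int64(n: int) -> int:
    """Keep only the low 64 bits and reinterpret them as signed (two's complement wrap-around)."""
    low = n & MASK64
    return low - 2**64 if low > INT64_MAX else low


def clamp_to_int64(n: int) -> int:
    """Saturate to the nearest representable 64-bit signed value."""
    return max(INT64_MIN, min(INT64_MAX, n))


def via_binary64(n: int) -> int:
    """Round-trip through an IEEE 754 double, as a reader that parses every number as a float would."""
    return int(float(n))


value = 2**64 + 5
print(truncate_to_int64(value))  # 5
print(clamp_to_int64(value))     # 9223372036854775807
print(via_binary64(value))       # 18446744073709551616, i.e. 2**64: the trailing 5 is lost
```

All three silently produce a different number, which is why the proposal below prefers an explicit error at the storage boundary.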

YAML
- specifies integers to have arbitrary size, but recommends that an implementation uses its native integer size. The specification says: "In some languages (such as C), an integer may overflow the native type's storage capability. A YAML processor may reject such a value as an error, truncate it with a warning, or find some other manner to round-trip it. In general, integers representable using 32 binary digits should safely round-trip through most systems.". http://www.yaml.org/spec/1.2/spec.html

For floating point values, only IEEE 754 32-bit values are safe.

In other words, the behavior is unspecified, which means a YAML implementation running on a 32-bit machine may silently truncate a number to a 32-bit value, or clamp it to the 32-bit max int (2,147,483,647). Some implementations do exactly this, as noted in "gotcha" blog posts (google for it).
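
Since only 32-bit integers are promised to round-trip, one defensive option is to scan data for wider integers before emitting YAML. A minimal sketch; the function name and the `$.a[0]` path notation are invented for illustration.

```python
from typing import Any, Iterator

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # 2,147,483,647


def non_portable_ints(data: Any, path: str = "$") -> Iterator[str]:
    """Yield paths of integers that YAML 1.2 does not promise will round-trip (wider than 32 bits)."""
    if isinstance(data, bool):
        return  # bool is a subclass of int in Python; skip it
    if isinstance(data, int):
        if not INT32_MIN <= data <= INT32_MAX:
            yield path
    elif isinstance(data, dict):
        for key, value in data.items():
            yield from non_portable_ints(value, f"{path}.{key}")
    elif isinstance(data, (list, tuple)):
        for i, value in enumerate(data):
            yield from non_portable_ints(value, f"{path}[{i}]")


doc = {"disk": {"size": 4_000_000_000_000, "count": 3}, "ids": [1, 2**40]}
print(list(non_portable_ints(doc)))  # ['$.disk.size', '$.ids[1]']
```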

JSON
- is similar to YAML in that it specifies a number as an arbitrary sequence of digits, leaving it up to an implementation to bind this to a representation. It has the same problems as YAML. Notably, when JSON is read by JavaScript, which has a single Number type (an IEEE 754 double) for both integers and reals, the largest integer that can be represented without gaps is 2^53 (after which precision is lost).
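
The effect is easy to reproduce with Python's standard `json` module. Its `parse_int=float` option is used here only to approximate a JavaScript reader, where every number is a double; `is_js_safe` is a hypothetical helper mirroring JavaScript's `Number.MAX_SAFE_INTEGER`.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # JavaScript's Number.MAX_SAFE_INTEGER


def is_js_safe(n: int) -> bool:
    """True if a JavaScript reader can hold n exactly and distinguish it from its neighbours."""
    return -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER


payload = json.dumps({"inode": 2**53 + 1})
print(payload)  # {"inode": 9007199254740993}

exact = json.loads(payload)["inode"]                    # Python keeps big integers exact
as_js = json.loads(payload, parse_int=float)["inode"]  # approximates a reader where all numbers are doubles
print(exact - int(as_js))  # 1: the double rounded 2**53 + 1 down to 2**53
print(is_js_safe(exact))   # False
```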

MsgPack
- handles 8, 16, 32 and 64 bit integers (signed and unsigned) as well as 32 and 64 bit floating point. It does not have built-in BigInteger or BigDecimal types.
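
A sketch of an encoder that picks the smallest MessagePack integer format, using the type codes from the MessagePack specification (fixint, uint 8..64, int 8..64, float 64). `pack_int` and `pack_float64` are hypothetical helpers, not the API of the `msgpack` library. The point is the last line of `pack_int`: there is simply no format for anything wider than 64 bits.

```python
import struct


def pack_int(n: int) -> bytes:
    """Encode n using the smallest MessagePack integer format; fail above 64 bits."""
    if 0 <= n <= 0x7F:
        return struct.pack(">B", n)  # positive fixint
    if -32 <= n < 0:
        return struct.pack(">b", n)  # negative fixint (0xe0..0xff)
    if n > 0:
        unsigned = ((0xCC, ">B", 2**8 - 1), (0xCD, ">H", 2**16 - 1),
                    (0xCE, ">I", 2**32 - 1), (0xCF, ">Q", 2**64 - 1))
        for code, fmt, limit in unsigned:  # uint 8/16/32/64
            if n <= limit:
                return bytes([code]) + struct.pack(fmt, n)
    else:
        signed = ((0xD0, ">b", -(2**7)), (0xD1, ">h", -(2**15)),
                  (0xD2, ">i", -(2**31)), (0xD3, ">q", -(2**63)))
        for code, fmt, low in signed:  # int 8/16/32/64
            if n >= low:
                return bytes([code]) + struct.pack(fmt, n)
    raise OverflowError(f"{n} needs more than 64 bits; MessagePack has no big-integer type")


def pack_float64(x: float) -> bytes:
    """Encode x as MessagePack float 64 (big-endian IEEE 754 binary64)."""
    return b"\xcb" + struct.pack(">d", x)
```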

The Puppet Language Specification
---
In the Puppet Language Specification the size and precision of numbers are currently specified as Ruby numbers (simply because this was easiest). This is sloppy and leaves edge cases for serialization and storage of data.

Proposal
========
I would like to cap a Puppet Integer to be a 64-bit signed value when used as a resource attribute, or anywhere in external formats. This means a value range of -2^63 to 2^63-1, which is in the exabyte range (2^63 bytes is 8 exbibytes, where 1 EiB = 2^60 bytes).
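
The range and the exbibyte arithmetic, as a quick check. `fits_puppet_integer` is a hypothetical name for the proposed range test.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1  # 9,223,372,036,854,775,807
EIB = 2**60            # one exbibyte in bytes


def fits_puppet_integer(n: int) -> bool:
    """True if n is inside the proposed 64-bit signed Puppet Integer range."""
    return INT64_MIN <= n <= INT64_MAX


print(INT64_MAX // EIB)                  # 7: the largest value is just under 8 EiB
print(fits_puppet_integer(8 * EIB - 1))  # True
print(fits_puppet_integer(8 * EIB))      # False
```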

I would like to cap a Puppet Float to be a 64-bit value (IEEE 754 binary64) when used as a resource attribute or anywhere in external formats.
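
Binary64 is the native float of most runtimes, so it round-trips byte-for-byte through the standard 8-byte encoding. The sketch below uses Python's `struct` with big-endian order; the helper names are illustrative. Integers beyond the binary64 range (about 1.8e308) cannot be converted at all.

```python
import struct
import sys


def to_binary64(x: float) -> bytes:
    """Encode x as IEEE 754 binary64, big-endian (network byte order)."""
    return struct.pack(">d", x)


def from_binary64(b: bytes) -> float:
    """Decode 8 big-endian bytes back into a float."""
    return struct.unpack(">d", b)[0]


print(from_binary64(to_binary64(0.1)) == 0.1)  # True: every binary64 value round-trips exactly
print(len(to_binary64(0.1)))                   # 8
print(sys.float_info.max)                      # 1.7976931348623157e+308

try:
    float(10**400)  # an integer far beyond the binary64 range
except OverflowError:
    print("too large for a 64-bit float")
```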

With respect to intermediate results, I propose that we specify that values are of arbitrary size, and that it is an error to store a value that is too big for the typed representation Integer (64-bit signed). For the Float (64-bit) representation there is no error, but precision is lost. When an attribute is specified to have the Number type, automatic conversion to Float (with loss of precision) takes place if an internal integer value is too big for the Integer representation.

(Note that attributes are typed as Any by default, which means that they would by default store a Float if the integer value overflows its representation.)
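
The proposed storage rules, as a sketch in Python. `store_attribute` and the string encoding of the declared type are hypothetical. Converting an Integer-valued input to Float when the attribute is declared Float is my assumption of how the "no error, but loses precision" rule would apply.

```python
from typing import Union

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
Value = Union[int, float]
TYPES = ("Integer", "Float", "Number", "Any")


def store_attribute(value: Value, declared_type: str = "Any") -> Value:
    """Apply the proposed cap when an arbitrarily large intermediate value is stored."""
    if declared_type not in TYPES:
        raise ValueError(f"unknown type {declared_type!r}")
    if isinstance(value, float):
        if declared_type == "Integer":
            raise TypeError("a Float cannot be stored as Integer")
        return value
    fits = INT64_MIN <= value <= INT64_MAX
    if declared_type == "Integer":
        if not fits:
            raise OverflowError(f"{value} does not fit in a 64-bit signed Integer")
        return value
    if declared_type == "Float":
        # Loses precision silently; Python's float() raises only beyond about 1.8e308.
        return float(value)
    # Number and Any: keep the Integer if it fits, otherwise fall back to Float.
    return value if fits else float(value)
```

Usage: `store_attribute(2**63, "Number")` yields the Float `9.223372036854776e+18`, while the same value declared as `Integer` raises an error.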

Questions
=========
* Is it important that Javascript can be used to (accurately) read JSON generated by Puppet? (If so, the limit needs to be 2^53 or values lose precision).

* Is it important in Puppet manifests to handle values larger than 2^63-1 (or smaller than -2^63), and if so, why isn't it sufficient to use a floating point value (with reduced precision)?

* If you think Puppet needs to handle very large values (yottabyte-sized disks?), should the language have convenient ways of expressing such values, e.g. 42yb?

* Is it ok to automatically transform values to floating point if they overflow and the type of an attribute is Number (as discussed above)? I can imagine this making it difficult to represent an attribute efficiently in a database, and support may vary between database engines.

* Do you think it is worth the trouble to add the types BigInteger and BigDecimal to the type system to allow a more precise representation? (Note that this makes it difficult to use standard number representations in serialization formats.) This would mean that Number is not allowed as an attribute/storage type (the user must choose Integer, Float, or one of the Big... types).

* Do you think it should work as in Ruby? If so, are you ok with non-standard serialization?
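
To put the yottabyte question in perspective, here is a sketch of what a size literal such as 42yb could evaluate to. The syntax, the suffix table (decimal SI multipliers; binary KiB, MiB, ... would use powers of 1024) and `parse_size_literal` are all hypothetical.

```python
import re

INT64_MAX = 2**63 - 1

# Hypothetical SI (decimal) suffixes for a size literal.
SUFFIXES = {"": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
            "pb": 10**15, "eb": 10**18, "zb": 10**21, "yb": 10**24}

_LITERAL = re.compile(r"(\d+)([kmgtpezy]b)?")


def parse_size_literal(text: str) -> int:
    """Turn e.g. '42yb' into an exact integer byte count."""
    match = _LITERAL.fullmatch(text.lower())
    if match is None:
        raise ValueError(f"not a size literal: {text!r}")
    digits, suffix = match.groups()
    return int(digits) * SUFFIXES[suffix or ""]


size = parse_size_literal("42yb")  # 42 followed by 24 zeros
print(size > INT64_MAX)            # True: does not fit the proposed Integer
print(float(size) == size)         # False: a Float only approximates it
```

Such a value would need either a BigInteger type or a Float with reduced precision.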

- henrik
--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/puppet-dev/lu1c8m%24a2n%241%40ger.gmane.org.