[Puppet-dev] Re: A question about numbers and representation

Henrik Lindberg Mon, 01 Sep 2014 15:05:45 -0700

On 2014-01-09 19:15, Trevor Vaughan wrote:

TL;DR: BigInteger/BigDecimal is the "right" thing to do; otherwise, cap
at the lower of the client and server limits.


I have a few thoughts here:

1) I don't like losing precision in any case, so a cap makes sense (maybe).

2) If you do cap, would you not want to cap at the lower of the client's
or server's limits? I.e. if the client is a 32 bit system and the server is a 64
bit system, you'd cap at 32 bits.

There is no need to do that. Today's systems handle both 32 and 64 bit
values just fine; it's the max unsigned 64 bit int and values above it,
and values smaller than -2^63, that cause problems. If you had such
values today, they would not round-trip through the system.

It does not matter all that much if a 32 bit system has a bit more work
to do when adding 64 bit numbers. The main problems are serialization,
and storage formats for efficient processing at larger scale (where 64
bit systems are indeed used).

3) There may be cases where someone needs higher precision numbers. I
can't think of them offhand, but I can guarantee that they'll happen, so
adding BigInteger and BigDecimal is probably a good idea.

I also imagine them being needed; they are needed in various
applications. I just wonder what the need may be in Puppet's domain.

(Total sum of diskspace in a report?)

Adding them as explicit types will work fine when we do need them, BTW,
but it requires a fair amount of work, as there are many touchpoints in
the system that have to deal with them.

4) For any fact that is retrieved in multiple formats, I would
like to see a standard hash with an entry for each size unit so that it
is easier to work with. Sure, right now, I can do variable mangling or
post-retrieval math, but it's so very untidy.

disk_size => {
   '/dev/sda' => {
      'B' => 10737418240,
      'kB' => 10485760,
      'MB' => 10240,
      'GB' => 10,
   }
}

But then, how far do you take this? TB, PB, EB...?

We probably have to stop at Geopbyte since we are back at 'G' :-)
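To make item 4 concrete, here is a minimal Ruby sketch that derives such a hash from a single byte count. The helper name `size_units` is hypothetical (not an existing Facter API), and it assumes 1024-based units, which is what the example values above use (10737418240 B = 10 GB).

```ruby
# Hypothetical helper, not a Facter API: expand a byte count into the
# per-unit hash suggested above. Units are 1024-based to match the example
# values; integer division drops any partial unit.
UNITS = %w[B kB MB GB].freeze

def size_units(bytes)
  UNITS.each_with_index.map { |unit, power| [unit, bytes / 1024**power] }.to_h
end

disk_size = { '/dev/sda' => size_units(10_737_418_240) }
p disk_size
```

Going further (TB, PB, EB, ...) would only mean extending `UNITS`; note that at around 8 EB (2^63 bytes) the byte count itself no longer fits in a 64 bit signed integer.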

- henrik

On Mon, Sep 1, 2014 at 4:54 AM, Henrik Lindberg
<henrik.lindb...@cloudsmith.com> wrote:

Hi,
Recently I have been looking into serialization of various kinds,
and the issue of how we represent and serialize/deserialize numbers
has come up.

TL;DR - I want to specify the max values of integers and floats in
the puppet language for a number of reasons. Skip the background part
to get to "Questions and Proposal" if you are already familiar with
serialization formats, and issues regarding numeric representation.

Background
---
As you may know, Ruby has fluent handling of integers: if a value
would overflow the native machine-word representation (Fixnum), an
arbitrary-precision Bignum (unlimited) is used instead. Floating point
numbers do not undergo such a transition; a Ruby Float is always an
IEEE 754 64 bit double, and BigDecimal (unlimited) is only used when
requested explicitly.
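A minimal Ruby sketch of the difference, assuming Ruby 2.4 or later (where Fixnum and Bignum are merged into Integer, so the promotion is no longer visible at the class level): integer arithmetic stays exact past 64 bits, while Float stays a fixed-size IEEE 754 double and rounds silently.

```ruby
# Integers grow as needed: no overflow at 2**63 or 2**64.
big = 2**64 + 1
exact = big - 2**64 # computed exactly

# Floats are IEEE 754 binary64 with a 53-bit significand and never grow;
# adding 1 to 2**53 is lost to rounding.
float_bits = Float::MANT_DIG
rounded = (2.0**53 + 1) == 2.0**53

puts "exact integer difference: #{exact}"      # 1
puts "float significand bits:   #{float_bits}" # 53
puts "2.0**53 + 1 == 2.0**53:   #{rounded}"    # true
```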

This is very flexible and helpful most of the time, but it creates
problems when serializing / deserializing. Most serialization formats
simply cannot deal with > 64 bit values as regular numbers. They may do
horrible things like truncation, or use the max/min value if a value
is too big, or, for floating point, drastically lose precision.
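The two integer failure modes named above, truncation and using the max/min value, can be illustrated with a small Ruby sketch. The helper names are hypothetical and not taken from any particular serializer; both force a value into a signed 64 bit slot.

```ruby
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Truncation: keep only the low 64 bits and reinterpret them as a
# two's complement signed value (what a careless C-style cast does).
def wrap_int64(n)
  low = n & (2**64 - 1)
  low >= 2**63 ? low - 2**64 : low
end

# Saturation: use the max/min value when out of range.
def clamp_int64(n)
  n.clamp(INT64_MIN, INT64_MAX)
end

value = 2**63 + 5
puts wrap_int64(value)  # -9223372036854775803
puts clamp_int64(value) # 9223372036854775807
```

Either way the value silently changes, which is why a reader cannot detect the problem after the fact.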

YAML
- specifies integers to have arbitrary size, but recommends that an
implementation uses its native integer size. The specification says:
"In some languages (such as C), an integer may overflow the native
type's storage capability. A YAML processor may reject such a value
as an error, truncate it with a warning, or find some other manner
to round-trip it. In general, integers representable using 32 binary
digits should safely round-trip through most systems.".
http://www.yaml.org/spec/1.2/spec.html

For floating point values, only IEEE 32 bit values are safe.

In other words, it is unspecified, which means a YAML implementation
may silently truncate numbers to 32 bit values, e.g. to the 32 bit max
int (2,147,483,647), when running on a 32 bit machine (some
implementations do this, as noted as "gotchas" in blog posts; google for it).

JSON
- is similar to YAML in that it specifies a number to be an
arbitrary number of digits and it is thus up to an implementation to
bind this to a representation. It has the same problems as YAML.
Notably, if used with JavaScript, which only has Number for both
Integer and Real, integers are only exact up to 2^53; above that,
precision is lost.
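A minimal Ruby sketch of the JavaScript limit, assuming a reader that stores every JSON number as an IEEE 754 double (simulated here with `to_f`): Ruby's own `json` library writes all the digits, but such a reader cannot tell 2^53 + 1 from 2^53. The `js_safe?` helper is hypothetical; its bound matches JavaScript's `Number.MAX_SAFE_INTEGER` (2^53 - 1).

```ruby
require 'json'

# Largest integer magnitude a double-based reader holds exactly.
JS_MAX_SAFE = 2**53 - 1

def js_safe?(n)
  n.abs <= JS_MAX_SAFE
end

n = 2**53 + 1
json = JSON.generate([n])            # all digits are written out
as_double = JSON.parse(json)[0].to_f # simulate a reader that only has doubles

puts json                  # [9007199254740993]
puts as_double == 2.0**53  # true: the trailing +1 is gone
puts js_safe?(n)           # false
```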

MsgPack
- handles 8-16-32-64 bit integers (signed and unsigned) as well as
32 and 64 bit floating point. It does not have built-in BigInteger or
BigDecimal types.
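As an illustration of that limit, the following hypothetical Ruby helper picks the smallest MessagePack integer format for a value, using the format names from the MessagePack specification. It is a sketch, not the API of any MessagePack library; the point is that anything outside -2^63 .. 2^64-1 has no integer encoding at all.

```ruby
# Hypothetical helper: smallest MessagePack integer format for n.
def msgpack_int_format(n)
  if n >= 0
    return 'positive fixint' if n <= 0x7f
    return 'uint 8'  if n <= 0xff
    return 'uint 16' if n <= 0xffff
    return 'uint 32' if n <= 0xffff_ffff
    return 'uint 64' if n <= 0xffff_ffff_ffff_ffff
  else
    return 'negative fixint' if n >= -32
    return 'int 8'  if n >= -(2**7)
    return 'int 16' if n >= -(2**15)
    return 'int 32' if n >= -(2**31)
    return 'int 64' if n >= -(2**63)
  end
  raise RangeError, "#{n} does not fit in any MessagePack integer format"
end

puts msgpack_int_format(2**63) # uint 64
```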

The Puppet Language Specification
---
In the Puppet Language Specification the size and precision of
numbers is currently specified as Ruby numbers (simply because this
was easiest). This is sloppy and leaves edge cases for serialization
and storage of data.

Proposal
========
I would like to cap a Puppet Integer to be a 64 bit signed value when
used as a resource attribute, or anywhere in external formats. This
means a value range of -2^63 to 2^63-1, which is in the Exabyte range (1
exabyte = 2^60).

I would like to cap a Puppet Float to be a 64 bit (IEEE 754
binary64) when used as a resource attribute or anywhere in external
formats.
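For reference, the bounds of the two proposed caps can be checked with exact Ruby integer arithmetic. This is only a worked example of the numbers above; strictly, 2^60 bytes is an exbibyte, which the sketch calls `EXBIBYTE`.

```ruby
# Bounds of the proposed 64 bit signed Integer cap.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EXBIBYTE  = 2**60 # the "1 exabyte = 2^60" used above

puts INT64_MIN                   # -9223372036854775808
puts INT64_MAX                   # 9223372036854775807
puts((INT64_MAX + 1) / EXBIBYTE) # 8: the cap is just under 8 * 2^60 bytes

# A binary64 Float has a 53-bit significand, so it cannot hold every
# integer in that range exactly, even though its exponent range is larger.
puts Float::MANT_DIG                  # 53
puts INT64_MAX.to_f.to_i == INT64_MAX # false: rounds to 2^63
```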

With respect to intermediate results, I propose that we specify that
values are of arbitrary size and that it is an error to store a
value that is too big for the typed representation Integer (64 bit
signed). For the Float (64 bit) representation there is no error, but it
loses precision. When an attribute is specified to have Number type,
automatic conversion to Float (with loss of precision) takes place
if an internal integer number is too big for the Integer representation.

(Note, by default, attributes are typed as Any, which means that
they by default would store a Float if the integer value
representation overflows).
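The storage rules in the two paragraphs above can be summarised in a short Ruby sketch. It is hypothetical: the symbols `:Integer`, `:Float`, `:Number` and `:Any` stand in for Puppet types, and `store_attribute` is not an existing Puppet function.

```ruby
# Proposed 64 bit signed range for Integer.
INT64_RANGE = (-(2**63))..(2**63 - 1)

# Hypothetical sketch of the proposed storage rules (not a Puppet API).
def store_attribute(value, type)
  case type
  when :Integer
    unless value.is_a?(Integer) && INT64_RANGE.cover?(value)
      raise ArgumentError, "#{value} is not a 64 bit signed Integer"
    end
    value
  when :Float
    value.to_f # may lose precision, but is never an error
  when :Number, :Any
    # Intermediate values are arbitrary size; only an integer that
    # overflows the Integer representation is converted (lossily) to Float.
    if value.is_a?(Integer) && !INT64_RANGE.cover?(value)
      value.to_f
    else
      value
    end
  else
    raise ArgumentError, "unknown type: #{type}"
  end
end

p store_attribute(2**64, :Number) # stored as a Float
p store_attribute(42, :Any)       # stays an Integer
```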

Questions
=========
* Is it important that Javascript can be used to (accurately) read
JSON generated by Puppet? (If so, the limit needs to be 2^53 or
values lose precision).

* Is it important in Puppet Manifests to handle values larger than
2^63-1 (or smaller than -2^63), and if so, why isn't it sufficient
to use a floating point value (with reduced precision)?

* If you think Puppet needs to handle very large values (yottabyte
sized disks?), should the language have convenient ways of expressing
such values e.g. 42yb ?

* Is it ok to automatically do transformation to floating point if
values overflow, and the type of an attribute is Number? (as
discussed above). I can imagine this making it difficult to
efficiently represent an attribute in a database and support may
vary between different database engines.

* Do you think it is worth the trouble to add the types BigInteger
and BigDecimal to the type system to allow the representation to be
more precise? (Note that this makes it difficult to use standard
number representation in serialization formats). This means that
Number is not allowed as an attribute/storage type (user must choose
Integer, Float, or one of the Big... types).

* Do you think it should work as in Ruby? If so, are you OK with
serialization that is non-standard?
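For the question about convenient literals, here is a minimal Ruby sketch of what parsing a suffix such as `42yb` could look like. The syntax and the `parse_size_literal` name are hypothetical, and it assumes 1024-based multipliers (so `yb` is 2^80 bytes). As the last line shows, such a value is far outside a 64 bit signed Integer.

```ruby
# Hypothetical literal syntax: <digits><unit>, e.g. "42yb".
# Assumes 1024-based multipliers: kb = 1024, mb = 1024**2, ... yb = 1024**8.
SUFFIXES = %w[kb mb gb tb pb eb zb yb].freeze

def parse_size_literal(text)
  m = /\A(\d+)([kmgtpezy]b)\z/i.match(text)
  raise ArgumentError, "not a size literal: #{text}" unless m
  power = SUFFIXES.index(m[2].downcase) + 1
  Integer(m[1], 10) * 1024**power
end

bytes = parse_size_literal('42yb')
puts bytes > 2**63 - 1 # true: needs BigInteger or a lossy Float
```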

- henrik
--

Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/

https://groups.google.com/d/msgid/puppet-dev/lu1c8m%24a2n%241%40ger.gmane.org.
For more options, visit https://groups.google.com/d/optout.

--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvaug...@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

--
You received this message because you are subscribed to the Google
Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/puppet-dev/CANs%2BFoUWgzwdhhEFtS6STj_POU80dtPpVvrN_dx1Ta13QCjJkQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



