On 2014-04-09 24:35, John Bollinger wrote:
On Wednesday, September 3, 2014 12:40:45 PM UTC-5, henrik lindberg wrote:
On 2014-02-09 22:23, John Bollinger wrote:
>
>
> On Monday, September 1, 2014 3:55:03 AM UTC-5, henrik lindberg
wrote:
>
> Hi,
> Recently I have been looking into serialization of various kinds, and
> the issue of how we represent and serialize/deserialize numbers has
> come up.
>
>
> [...]
>
>
> Proposal
> ========
> I would like to cap a Puppet Integer to be a 64-bit signed value when used
> as a resource attribute, or anywhere in external formats. This means a
> value range of -2^63 to 2^63-1, which is in the exabyte range (1 exabyte
> = 2^60).
>
> I would like to cap a Puppet Float to be a 64-bit (IEEE 754 binary64)
> value when used as a resource attribute or anywhere in external formats.
>
> With respect to intermediate results, I propose that we specify that
> values are of arbitrary size and that it is an error to store a value
>
>
>
> What, specifically, does it mean to "store a value"? Does that mean to
> assign it to a resource attribute?
It was vague on purpose since I cannot currently enumerate the places
where this should take place, but I was thinking resource attributes at
least.
Surely there is a medium between "vague" and "enumerating all
possibilities". Or in the alternative, a minimum set of places where
Big values must be allowed could be given. Otherwise the proposal is
insufficiently defined to reason about, much less implement.
Sorry. At this point I am interested in feedback: do we really need Big
types, and how do you deal with this today? I am also interested in
thoughts on where it is important, and on the edges where values have to
be represented in some form other than a live Ruby object.
Two places are naturally in catalogs, and facts, but we have many other
uses of the data that is collected - so, I can't enumerate them. I am
thinking anything that is the result of evaluation of puppet logic and
being stored or sent somewhere.
>
> that is too big for the typed representation Integer (64 bit signed).
> For Float (64 bit) representation there is no error, but it loses
> precision.
>
>
>
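To make the proposed Integer cap concrete, here is a minimal Python sketch (not Puppet code). The function name `check_storable_integer` is made up for illustration; the only thing taken from the proposal is the range -2^63 to 2^63-1 and the idea that intermediate values may be larger than what can be stored.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def check_storable_integer(value: int) -> int:
    """Return value unchanged if it fits a 64-bit signed Integer, else raise.

    Hypothetical helper: it stands in for whatever check would run at the
    point where a value is "stored".
    """
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"{value} does not fit in a 64-bit signed Integer")
    return value


# An intermediate result may be arbitrarily large ...
intermediate = 2 ** 70
# ... as long as the value that is finally stored is back in range.
stored = check_storable_integer(intermediate // 2 ** 10)  # 2**60
```

Calling `check_storable_integer(2 ** 63)` raises, since 2^63 is one past the largest allowed value.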
> What about numbers that overflow or underflow a 64-bit Float?
>
That would also be an error (when it cannot lose more precision).
IEEE floating-point underflow occurs not when a number cannot lose more
precision, but rather when it is nonzero but so small that it does not
have a normalized representation in the chosen floating-point format.
Among IEEE 64-bit doubles, these are nonzero numbers having absolute
value less than 2^-1022. Almost all such subnormal representations
/can/ lose more precision in the sense that there are even less precise
subnormals, but they already have less precision than is usual for the
format.
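The distinction can be seen directly in any language that uses IEEE 754 binary64 for its floats. A small Python sketch, assuming the platform's float is binary64 with default rounding (true for CPython on common hardware); `is_subnormal` is a helper written for this example, not a library function:

```python
import math
import sys

# Smallest positive *normalized* double: 2**-1022.
smallest_normal = sys.float_info.min

# Smallest positive subnormal double: 2**-1074 (a single significant bit).
smallest_subnormal = math.ldexp(1.0, -1074)


def is_subnormal(x: float) -> bool:
    """True for nonzero values too small to have a normalized representation."""
    return x != 0.0 and abs(x) < sys.float_info.min


# Below 2**-1022 the value is still nonzero, but already has reduced precision.
tiny = smallest_normal / 4

# Halving the smallest subnormal leaves nothing to represent: it rounds to 0.0.
underflowed = smallest_subnormal / 2
```

Here `tiny` is subnormal but nonzero, while `underflowed` compares equal to zero.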
Thanks, I am really not an expert on floating point representation.
Clearly, I am using the wrong terms.
> When specifying an attribute to have Number type, automatic
> conversion to Float (with loss of precision) takes place
This happens only when the value is "stored", I presume?
if an internal
> integer number is too big for the Integer representation.
>
> (Note, by default, attributes are typed as Any, which means that
> they by default would store a Float if the integer value representation
> overflows).
>
>
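A short Python sketch of what the quoted Number rule implies, assuming the fallback is a plain conversion to a binary64 float. The name `store_as_number` is hypothetical and only illustrates the behaviour described in the proposal:

```python
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def store_as_number(value: int):
    """Keep value as an integer if it fits in 64 bits, else fall back to float.

    Hypothetical helper. float() itself raises OverflowError for integers
    beyond the binary64 range, which would be the "overflow" case.
    """
    if INT64_MIN <= value <= INT64_MAX:
        return value
    return float(value)


big = 2 ** 63 + 1
stored = store_as_number(big)

# A binary64 float has a 53-bit significand, so the low bit of `big` is gone.
precision_lost = int(stored) != big
```

The stored value is 2^63 as a float, so the original integer cannot be recovered from it.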
>
> And if BigDecimal (and maybe BigInteger) were added to the type system,
> then I presume the expectation would be that over/underflowing Floats
> would go there? And maybe that overflowing integers would go there if
> necessary to avoid loss of precision?
>
If we add them, then the runtime should be specified to gracefully
choose the required size while calculating
I thought the whole reason for the proposal and discussion was that Ruby
already does handle these gracefully, hence Puppet already has Big values.
Yes, the Puppet Runtime has this since it is currently written in Ruby.
The specification is not tied to a particular implementation. If you
write a C, Java, or Haskell implementation; what is it required to do...
and that the types Any and
Number mean that they are accepted, but that Integer and Float do not
accept them (when they have values that are outside the valid range). (I
have not thought this through completely at this point, I must say).
Clarification: I have no objection to limiting the values allowed for
types Integer and Float, as specified in the proposal. What I am
concerned about is Puppet pulling back from supporting the full range of
numeric values it supports now (near-arbitrary range and precision).
The problem is that it does not really support them. It is not
specified, it is not tested, and you cannot roundtrip such values past
PuppetDB nor serialize catalogs with them with MsgPack.
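The round-trip problem is easy to reproduce with any fixed-width encoding. The Python sketch below uses plain 8-byte big-endian integers as a stand-in; it is not the MsgPack wire format or anything PuppetDB does, and `write_int64` / `read_int64` are names invented for the example:

```python
def write_int64(value: int) -> bytes:
    """Encode value as an 8-byte big-endian signed integer."""
    return value.to_bytes(8, byteorder="big", signed=True)


def read_int64(data: bytes) -> int:
    """Decode an 8-byte big-endian signed integer."""
    return int.from_bytes(data, byteorder="big", signed=True)


# Values inside the 64-bit signed range survive the round trip.
round_tripped = read_int64(write_int64(2 ** 63 - 1))

# One step outside the range there is simply no encoding to write.
try:
    write_int64(2 ** 63)
    big_fits = True
except OverflowError:
    big_fits = False
```

A reader that only knows how to call something like `read_int64` has no way to receive a Big value unless the format is extended.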
>
> 1) If you have BigDecimal then you don't need BigInteger.
>
True, but BigInteger specifies that a fraction is not allowed.
Supposing that I persuaded you that the type system should include a
BigDecimal type in some form, I would be completely satisfied to leave
it to you to decide whether it should also include a BigInteger type.
:-)
> 2) Why would allowing one or both of the Bigs prevent Number from being
> allowed as a serializable type?
>
Not sure I said that. The problem is that if something is potentially
Big... then a database must be prepared to deal with it and it has a
high cost.
/Every/ Puppet value is potentially a Big /now/. What new cost is
involved? I'm having trouble seeing how a database can deal efficiently
with Puppet's current implicit typing anyway, Big values
notwithstanding. Without additional type information, it must be
prepared for any given value to be a boolean, an Integer, a float, or a
37kb string (among other possibilities). Why do Big values present an
especial problem in that regard?
I leave that one for Ken Barber, as I am not 100% sure on the design in
Puppet DB.
since everything is
basically untyped now (which we translate to the type Any), this means
that PuppetDB must be changed to use BigDecimal instead of integer 64
and float. That is a lose^3: it is lots of work to implement, bad
performance, and everyone needs to type everything.
Well, /some/one needs to type everything, somehow. Typing is an
inherent aspect of any representation of any value. Indeed, it is loose
to call Puppet values "untyped"; they are definitely typed (try
inline_template('<%= type($my_variable) %>') some time), but the type is
not necessarily known /a priori/. It is also loose to call Puppet 3
expressions "untyped" -- it is more precise to say that expressions,
including variable dereferences, are implicitly typed.
But yes, for efficient numeric storage representations to be used, the
types of the values to be stored must be among those for which an
efficient representation is available. MOREOVER, unless the storage
mechanism is prepared to adapt dynamically to the types of the values
presented to it, the specific types of those values must be known in
advance, and they must be consistent. In that sense everyone *does*
need to type everything, regardless of whether any Big types are among
the possibilities.
If you do suppose a type-adaptive storage mechanism (so that people
don't need to type everything) then the mere possibility of Big values
does not impose any additional inefficiency. The actual appearance of
Big values might be costly, but if such a value is in fact presented for
storage then is it not better to faithfully store it than to fail?
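As a rough illustration of what "type-adaptive" could mean, here is a Python sketch of a tagged encoding that uses compact fixed-width forms when the value allows it and a slower text form only for Bigs. The tags, the function names, and the choice of a decimal string for Big values are all assumptions made for this example; nothing here describes how PuppetDB or any Puppet serializer actually works.

```python
import struct
from decimal import Decimal

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def encode_number(value):
    """Return a (tag, payload) pair, picking the cheapest faithful encoding."""
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers in this sketch")
    if isinstance(value, int):
        if INT64_MIN <= value <= INT64_MAX:
            return ("int64", value.to_bytes(8, "big", signed=True))
        return ("bigint", str(value).encode("ascii"))   # slower path, full fidelity
    if isinstance(value, float):
        return ("float64", struct.pack(">d", value))
    if isinstance(value, Decimal):
        return ("bigdecimal", str(value).encode("ascii"))
    raise TypeError(f"unsupported numeric type: {type(value).__name__}")


def decode_number(tag, payload):
    """Inverse of encode_number."""
    if tag == "int64":
        return int.from_bytes(payload, "big", signed=True)
    if tag == "float64":
        return struct.unpack(">d", payload)[0]
    if tag == "bigint":
        return int(payload.decode("ascii"))
    if tag == "bigdecimal":
        return Decimal(payload.decode("ascii"))
    raise ValueError(f"unknown tag: {tag}")
```

With this shape, ordinary values cost 8 bytes each, and the extra cost is paid only when a Big value actually shows up.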
> I think disallowing Bigs in the serialization formats will present its
> own problems, only some of which you have touched on so far. I think
> the type system should offer /opportunities/ for greater efficiency in
> numeric handling, rather than serving as an excuse to limit numeric
> representations.
>
I don't quite get the point here - the proposed cap is not something
that the type system needs. As an example MsgPack does not have standard
Big types, thus a serialization will need to be special and it is not
possible to just use something like "readInt" to get what you know
should be an integer value. The other example is PuppetDB, where a
decision has to be made how to store integers; the slower Big types, or
a more efficient 64 bit value? This is not just about storage, but also
about indexing speed and query/comparison - and if some values are
stored as 64 bits and others as a Big type for the same entity, that
would be even slower to query for.
As I said already, I am not against the proposed caps. Rather, I am
urging you to not categorically forbid serialization of Big values.
Possibly you could allow serialization to some formats -- such as
MsgPack -- to fail on Bigs, but it's not clear to me even in that case
why failure/nothing is better than something. The issue is the data,
not the format -- Puppet (currently) supports Big values, so if it needs
to serialize values then it needs to serialize Bigs.
As for PuppetDB in particular, you have storage, indexing, and
comparison problems (or else fidelity problems) for any number that is
not specifically typed Integer or Float. Number is not specific enough,
even without Bigs, and Any certainly isn't. If PuppetDB is
type-adaptive then Bigs shouldn't present any special problem. If it
isn't, then it needs explicit typing (as Integer or Float) for
efficiency anyway, so Bigs shouldn't present any special problem.
So - idea, make it safe and efficient for the normal cases. Only when
there is a special case (if indeed we do need the big types) then take
the less efficient route.
Ok, but I don't see how making it an error to "store" a Big value serves
that principle. Safe and efficient for storage of Integers and Floats
allows use of native numeric formats; safe and efficient storage for
Number or Any does not (even if storing a Big were an error). Numbers
that can be represented only as Bigs will not be typed Integer or
Float. If such numbers (or such formal types) constitute a special case
then fine, but let there be a "less efficient route" (with full
fidelity) for that.
John, thanks for all the thoughts on this topic. There are many valid
points and things to take into consideration. Will have another go with
Ken Barber on how certain things work in Puppet DB.
Plan to come back with something that is a more coherent and complete
proposal :-)
Cheers
- henrik
John
--
You received this message because you are subscribed to the Google
Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to puppet-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/puppet-dev/60117603-35dc-4160-bd97-600eeb5bad63%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Visit my Blog "Puppet on the Edge"
http://puppet-on-the-edge.blogspot.se/