I have a long and rambling response written in several installments
between meetings - so I apologize if it is not completely consistent...
To summarize the proposal for 4x (starting out with the cleanest most
strict to see what this means in practice):
1. unqualified variable references are "what can be seen in this
evaluation scope, outer evaluation scopes, and then global"
2. qualified variable references are absolute
3. classes and defines defined with relative names are named
relative to the namespace they are in
4. resolution of qualified names used as references (for include etc)
is absolute
The rationale for 2, 3, 4 is that this is faster, and most users seem not
to trust relative name resolution and throw in :: everywhere (for both
sanity and speed). For those who actually understand how it currently
works and actually want relative name resolution it does mean a few
more characters to type with a small loss in ease of refactoring (moving
a class means it can resolve against other classes than before - both a
blessing and a curse).
We are contemplating having an alias feature.
I then ramble on about scope and try to answer questions...
Regards
- henrik
On 2014-14-03 23:07, John Bollinger wrote:
On Thursday, March 13, 2014 7:19:22 PM UTC-5, henrik lindberg wrote:
We also have to decide if any of the relative name-space functionality
should remain (i.e. reference to x::y is relative to potentially a
series of other name spaces ("dynamic scoping")), or if it is always a
global reference when it is qualified.
Can you choose a different term than "dynamic scoping" for what you're
describing there? It's not consistent with other uses of that term with
which I am familiar, and historically "dynamic scoping" has meant
something different in Puppet.
Not quite sure what the correct term is in current Puppet - I mean "the
various ways current puppet resolves a name such as x::y".
The implementation idea we have in mind is that there is one global
scope where all "qualified variables" are found/can be resolved, and
that all other variables are in local scopes that nest. (Local scopes
include ephemeral scopes for match variables).
Given the numbers from measuring the read ratio, we need a fast route
from any scope to the global scope (we sort of know this already, but
still need to measure). We know that a qualified variable is never
resolved by any local scope, so we can go straight to the global scope.
This way we do not have to traverse the chain up to the "parent most"
scope (the global one).
I think that's fine, as long as it's consistent, but it has the
potential to present oddities. For example, in the body of class m, can
one declare class ::m::a via its unqualified name (e.g. "include 'a'")?
If so, then should one not also from the same scope be able to refer to
the variables of ::m::a via relative names ($a::foo)?
There are several concepts at play:
* the name given to a class or user defined resource type
* the loading of it
* referencing a variable
Class naming
Currently a class or define gets a name in the name space where it is
defined - its (possibly qualified) name is appended to the name of the
namespace where it is defined. Thus:
class a {
  class b {
  }
  class x::y {
  }
}
Creates the three classes ::a, ::a::b, and ::a::x::y
This construct is exactly the same as if they were defined like this:
class a {
}
class a::b {
}
class a::x::y {
}
The fact that a class is defined inside of another does not give it any
special privileges (reading private content inside the class it is
defined in, etc.). This is a naming operation only.
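The naming rule above can be sketched in a few lines of Ruby. This is only an illustration of "the name is appended to the namespace it is defined in"; the helper name qualified_name is hypothetical and is not part of Puppet.

```ruby
# Hypothetical helper (not Puppet API): compute the name a class or define
# receives, given the namespace it is defined in and the name as written.
def qualified_name(enclosing_namespace, written_name)
  # At top level there is nothing to prepend.
  return written_name if enclosing_namespace.nil? || enclosing_namespace.empty?
  # Otherwise the written (possibly qualified) name is appended.
  "#{enclosing_namespace}::#{written_name}"
end

puts qualified_name('', 'a')       # a
puts qualified_name('a', 'b')      # a::b
puts qualified_name('a', 'x::y')   # a::x::y
```

Nothing but the resulting name depends on the nesting, which is the point made above: this is a naming operation only.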
Likewise, when a class is included, this inclusion is in some arbitrary
namespace and it currently searches for the relative name. The
suggestion is to not do this, and instead require a fully qualified
(absolute) name.
include a
include b::a
include a::x::y
Sidebar:
| (I use the term "define" to mean "defining what a named entity is" as
| opposed to the term "declare" which is a term that denotes a
| definition of the existence of something (typically having some given
| type) - e.g. "extern int c" in the C language, which declares c, but
| does not define it.)
|
| (Just saying since the use of define / declare may confuse someone)
I see two main reasonable alternatives:
* class names are always treated as absolute. Class ::m can declare
class ::m::a only via its qualified name, and $a::foo is always
equivalent to $::a::foo.
If you mean that class ::m::a can be defined inside of class ::m, we do
that either by:
a) as today, class gets its (relative) name concatenated on to the
containing class' name, otherwise its absolute name.
b) the name is always absolute, a class a {} inside a class m {} gets
the fully qualified name ::a
c) enforce that they are always named with a starting :: to be able to
flag down all relative names.
d) forbid that a nested class is given an absolute name
Of these I prefer a) since it causes the least breakage and surprise.
(Side note, the idea that nesting classes should not be allowed has
been raised as well - to further break the illusion that they have
some privileged relation to each other - they are not "inner classes" as
in Java or anything like that, they are not protected/private in any way
- they are just named after where they are defined).
* class names can be expressed absolutely or relative to the innermost
enclosing class scope (~ the current namespace), only, both for
class declaration and for variable lookup. Class ::m can declare
class ::m::a via its unqualified name, and can refer to the
variables of ::m::a via relative names ($a::var).
This is like a) from naming the class, but keeps relative resolution of
references. Maybe it is a really bad idea to remove this ability - but
it is what opens up the can of worms... is it also relative to the name
space of the super class? is it relative to any outer name space? to any
outer name space of an inherited class?
Either approach provides consistency in that any way it is permissible
to refer to a class itself, it is also permissible to refer to that
class's variables by appending '::varname'. Note that the latter does
not require traversing the chain of enclosing scopes, nor looking up
names directly in any local scope; rather, it could be implemented as
maximum two lookups against the global scope.
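As a rough sketch of the "maximum two lookups" idea in the second alternative: try the name relative to the innermost enclosing namespace, then as an absolute name. The helper resolve_limited and the plain hash used as the global index are assumptions made for illustration only.

```ruby
# Hypothetical helper (not Puppet API). `index` stands in for the global
# scope: a hash from absolute names (without leading '::') to values.
def resolve_limited(index, namespace, name)
  # A leading '::' forces a single absolute lookup.
  return index[name.sub(/\A::/, '')] if name.start_with?('::')

  # Lookup 1: relative to the innermost enclosing namespace only.
  relative = "#{namespace}::#{name}"
  return index[relative] if index.key?(relative)

  # Lookup 2: the name taken as absolute.
  index[name]
end

index = { 'm::a::var' => 1, 'a::var' => 2 }
puts resolve_limited(index, 'm', 'a::var')    # 1
puts resolve_limited(index, 'n', 'a::var')    # 2
puts resolve_limited(index, 'm', '::a::var')  # 2
```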
Well, it is not really possible to refer to a class with a variable;
it does not evaluate to an instance of a class. It may evaluate to a
variable in another namespace though... (this is also confusing)
What the proposed (strict) rules mean:
class a {
  class b {
    $x = 1
  }
  class c inherits b {
    $y = $x + 10
  }
}
The resolution of $x will look up $x in the local scope representing the
class a::c, fail, and then look in its parent scope representing a::b,
and find x there.
If instead
class c inherits b {
$y = $b::x + 10
}
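was used, it would immediately go to the global scope and resolve b::x
as the absolute name ::b::x. (In this example there is no top level
class b, so reaching the value 1 set in a::b would require $a::b::x.)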
Now, if we instead treat b::x as a relative reference to the name space
it is used in - then it may be a reference to:
* ::a::c::x
* ::a::b::x
* ::b::x
What if b also inherits? What if the namespaces are more deeply nested?
$x = 0
class aa {
  $x = 1
  class a {
    $x = 2
    class b {
    }
    class c inherits b {
      $y = $x
      $z = $b::x
    }
  }
}
class b {
  $x = 4
}
What is $aa::a::c::y and $aa::a::c::z ?
In the proposal, the $y evaluates to 0 (there is no x in c, nor in b,
they do not see into aa, and can not see into aa::b). And $z evaluates
to :undef, since there is no x in b.
In 3.x the value of $aa::a::c::z becomes 0, since when it reaches class
aa::a::b and it does not have an x, then x resolves to the global x.
(Jīng! <- Chinese Surprise).
With relative naming the search is done in this order
* aa::a::c::b::x
* aa::a::b::x
* aa::b::x
* b::x
If we remove the surprising 3x behavior of resolving to the global x when
there is no x in b (by setting $x in b), and move the b class around
between the various namespaces, it is possible to verify that it searches
in the order above.
We can do that in 4x as well (sans the Jīng!) if we come to the
conclusion that that would be the best. (i.e. worst case 4 hash lookups
for a 3 level nesting of names). We cannot really optimize this - the
names have to be tried in that given order.
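The search order listed above can be generated mechanically. A small Ruby sketch, assuming the rule is "try each enclosing namespace from the innermost outwards, then the bare name"; relative_candidates is a hypothetical helper, not something in the Puppet code base.

```ruby
# Hypothetical helper (not Puppet API): list the absolute names tried, in
# order, when a qualified name is resolved relative to where it is used.
def relative_candidates(namespace, name)
  segments = namespace.split('::')
  candidates = []
  # Innermost namespace first, then each enclosing namespace.
  segments.length.downto(1) do |n|
    candidates << (segments.first(n) + [name]).join('::')
  end
  # Finally the name taken as absolute.
  candidates << name
end

puts relative_candidates('aa::a::c', 'b::x')
# aa::a::c::b::x
# aa::a::b::x
# aa::b::x
# b::x
```

A namespace nested three levels deep gives four candidates, i.e. the worst case of four hash lookups mentioned above.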
Making it strict means that there is only one lookup, but the c class
would have to be written like this:
class c inherits b {
$y = $x
$z = $aa::a::b::x
}
if we insist on making a qualified reference to the x in b (a $x gets
the same result).
We could make the inherited class have special status - and thus resolve
against it - but not sure if it is worth doing this.
I expect that we will retain the ability to refer to variables via their
unqualified names within some nest of scopes related to where they are
declared (e.g. up to the innermost named (class or resource) scope).
Given, then, that that form of relative name lookup will be supported, I
think generalizing that to classes and resources as well (second
alternative) bears serious consideration.
There is also the ability to reference a class and access its attributes
via the Class type. This way, it is totally clear what the resolution
is, and what the names are relative to. e.g.
$b = Class[some::class::somewhere]
$b[x]
and if this is done in a class, and you don't want the $b to be visible
private $b = Class[...]
This way there is no guessing what a relative name may mean. (In essence
relative names are only (optionally) used when defining classes and user
defined resource types.)
On the other hand, those who have commented in the past seem to agree
that Puppet's historic behavior of traversing the full chain of nested
scopes, trying to resolve relative names with respect to each, is more
surprising than useful. I'm on board with that; I'm just suggesting
that there may be both room and use for a more limited form of relative
naming.
I am struggling with the balance of being useful, not having to type too
much, and ease of refactoring with sanity and performance... this
discussion is very valuable.
I like the simplicity of "an unqualified variable = what I see here",
and "a qualified variable = an absolute reference".
Local scopes are always local; there is no way to address
the local variables from some other non-nested scope (essentially how
the regular CPU stack works, or how variables in a language like C
work).
i.e. we have something like this in Scope
class Scope
  attr_reader :global_scope
  attr_reader :parent_scope
  # ...
end
The global scope keeps an index designed to be as fast as possible to
resolve a qualified name to a value. The design of this index depends on
the frequency of different types of lookup. If all qualified lookups are
absolute it would simply be a hash of all absolute names to values (it
really cannot be faster than that).
The logic for lookup then becomes:
- for un-qualified name, search up the parent chain (this chain does not
reach the global scope), if still unresolved, look in global scope.
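A minimal Ruby sketch of that lookup logic, to make the two routes concrete. The class names GlobalScope and LocalScope, and their methods, are hypothetical; only the global_scope / parent_scope readers come from the outline above.

```ruby
# Hypothetical sketch, not the actual Puppet implementation.
class GlobalScope
  def initialize
    @index = {} # absolute name (without leading '::') => value
  end

  def set(name, value)
    @index[name.sub(/\A::/, '')] = value
  end

  def lookup(name)
    @index[name.sub(/\A::/, '')]
  end
end

class LocalScope
  attr_reader :global_scope, :parent_scope

  def initialize(global_scope, parent_scope = nil)
    @global_scope = global_scope
    @parent_scope = parent_scope
    @vars = {} # unqualified names only
  end

  def set(name, value)
    @vars[name] = value
  end

  def local?(name)
    @vars.key?(name)
  end

  def local_value(name)
    @vars[name]
  end

  def lookup(name)
    # Qualified names are never held locally: go straight to global.
    return global_scope.lookup(name) if name.include?('::')

    # Unqualified names: walk the chain of nested local scopes.
    scope = self
    while scope
      return scope.local_value(name) if scope.local?(name)
      scope = scope.parent_scope
    end

    # The local chain never contains the global scope; try it last.
    global_scope.lookup(name)
  end
end

global = GlobalScope.new
global.set('x', 10)
outer = LocalScope.new(global)
outer.set('x', 20)
inner = LocalScope.new(global, outer)

puts inner.lookup('x')          # 20
puts inner.lookup('::x')        # 10
puts inner.lookup('y').inspect  # nil
```

The design choice shown is that every local scope holds a direct reference to the global scope, so a qualified lookup costs one hash access regardless of nesting depth.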
From the description alone, I'm not sure how it can be asserted that
the chain of local scopes does not reach global scope, unless by the
trivial fact that the global scope is not itself a local scope. What I
would hope to see, and perhaps what is meant, is that the lookup stops
at local scopes that correspond to classes and resources. In
particular, I think it is essential that unqualified class name lookups
not be resolved against parent namespaces.
Nested ("local") scopes only contain unqualified names, and an inner
scope shadows an outer scope (there are a few additional rules for
restricted names such as $trusted, and $facts which may not be shadowed
in any scope). Qualified names (for variables) can only be created in
classes and these are only the public attributes of those classes. No
local (shadowing) scope places this "global scope" as an outer scope.
$x = 10
class a {
$x = 20
$y = $x
$z = $::x
}
Here the variables $a::x == 20, $a::y == 20, and $a::z == 10
The $::x is not found in an outer scope of the scope used to evaluate
the logic inside of class a.
A local scope dies when the evaluation using that scope - eh, goes out
of scope. The persisted values are kept in the global-scope index (and in
the instantiated classes and created resources).
That is, in class ::m::a::b, "include 'foo'" must not refer to
::m::a::foo, and certainly not to ::m::foo, but I'd be ok if it could
refer to ::m::a::b::foo. As a special (but important) case, in
::m::a::b, "include 'b'" must not refer to ::m::a::b itself, and
"include 'a'" should not refer to ::m::a.
I think (but am not 100% sure) that it would be best to have to qualify
the name - i.e.
include foo # is include ::foo
include x::y # is include ::x::y
Other languages have solved the same issue in different ways:
* Ruby is obviously very flexible in how it searches (it also makes it
slow), and sometimes (just like in Puppet) it is mysterious why it
works or not in some cases.
* Java uses an import to import a name which can then be used in short
form, nested classes can be relatively referenced.
* Some Java like (new) languages use an import/alias mechanism
If we go down that path, these name imports would appear at the start of
the file and apply to the content of that file - i.e. it is a help to
the *parser* to construct the correct code (there is no searching at
runtime). Now sadly, import is a function that is just deprecated in the
Puppet Programming Language and reintroducing it with a different
meaning would just be a cruel joke... if instead we want to be able to
alias names maybe we could use "alias"
alias apache = mystuff::better_apache::apache
To support an alias like that, the only reasonable thing a parser could
do is to replace every "apache" in every qualified name with the alias -
i.e. apache::foo becomes mystuff::better_apache::apache::foo
A powerful mechanism to reduce typing - but one that is also tricky if we
support more than a first pass of alias replacements, multi segment
aliases etc. (A sane impl. could perhaps only perform the replacement of
the first segment, and require that an alias cannot be qualified itself.)
(An alias could also be set to ::)
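To illustrate the "sane" variant (single pass, first segment only), here is a small Ruby sketch of what a parser-side replacement could look like. expand_alias and the aliases hash are hypothetical names for this illustration; nothing like this exists in Puppet today.

```ruby
# Hypothetical helper (not Puppet API): expand an alias found in the first
# segment of a name. One pass only; the result is not expanded again.
def expand_alias(name, aliases)
  first, rest = name.split('::', 2)
  target = aliases[first]
  # No alias for the first segment: leave the name untouched.
  return name unless target
  rest ? "#{target}::#{rest}" : target
end

aliases = { 'apache' => 'mystuff::better_apache::apache' }
puts expand_alias('apache::foo', aliases)    # mystuff::better_apache::apache::foo
puts expand_alias('other::apache', aliases)  # other::apache
puts expand_alias('::apache::foo', aliases)  # ::apache::foo
```

In this sketch a name with a leading :: is never rewritten, since its first segment is empty; whether that is the desired rule is an open question.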
I am not sure I want to see aliases like these in the language.
Sometimes a bit more typing is good for (esp. the future) you.
We have a problem with referencing a class directly with a variable
since we can do this
class a {
  $b = { x => jing }
  class b {
    $x = 10
  }
  notice $b::x
  notice $b
}
$b is not a reference to the class, but in $b::x it is (this is kind of
confusing).
Super
Yet another way of handling resolution is to add a reserved namespace
word, super, that resolves to the superclass. It would function as an
(absolute) reference to the superclass and mean "give me a variable as
the superclass sees it" (given that the class is allowed to see it). e.g.
class c inherits b {
$z = $super::x
}
But I am not sure that throwing yet another object oriented term into
the non object oriented puppet casserole makes it any sweeter...
I'm going to try to digest some more of this over the weekend. Perhaps
I'll have more to say on Monday.
I can imagine having a hangout on this topic as well...
Such as about scoping function names
so that different environments can bind different implementations to the
same name, maybe.
There will be support for scoping function names. i.e. you can call
mymodule::foo(x)
All such references are currently (albeit still at the idea stage)
absolute names - no shadowing, and no "local functions" are planned.
Aliasing is being contemplated, which means it is possible to alias
certain functions.
alias foo = mymodule::foo
Which would make all calls to foo() go to mymodule::foo() in the
.pp file having that alias at the top.