Todd and I read through your treap implementation today. Very nice, indeed. We did notice that there's a minor issue on line 356 (and similarly 357):
arg1Type ::= ordMetaType's parameters'type[1]; Although it might not be apparent from the method name itself, you should probably add a space between the apostrophe and "type[1]". Breaking down the name "_'s⁇type", we expect an argument, an apostrophe, an optional "s" token, and a "type" token. Whitespace is currently always allowed (but not required) around any operator character. Each operator character is parsed as a separate one-character token, which vastly simplifies parsing compared to C++ or Smalltalk (much standardization effort was spent defining how 5--3 should be parsed). A strange thing you might have noticed is that in Avail, expressions like "x:=5", "x := 5" and "x: =5" are all parsed as an assignment (which will soon be just a use of the "_:=_" macro). Eventually we may rework whitespace rules to support forbidden, mandatory, optional, and significant whitespace between tokens. The last one is to support whitespace-sensitive syntax like Python. That functionality would show up as changes to the interpretation of method names (see MessageSplitter.java), as well as analysis of the whitespace that we started capturing inside the adjacent tokens several weeks ago. We were thinking of a space indicating required whitespace, a space followed by the double question mark character as being optional whitespace (except between alphanumeric tokens), and no space indicating whitespace is disallowed. Alternatively we may allow specific whitespace characters like tab (\t) to be back-ticked (`) to indicate precisely which whitespace character is required or optional (with double question mark). Maybe a back-ticked new line (`\n) would indicate a forced line break plus maintaining the current indent level. And a back-ticked new line followed by one or more back-ticked tabs would indicate a Python-like indentation increase by the specified tab count. I'm not sure what your issue #3 is about exactly, but I suspect it has to do with functions being composed of raw functions (CompiledCodeDescriptor, also known as function implementations) and outer variables. The raw functions have identity because they have to track (in a secret atom stored in CompiledCodeDescriptor.ObjectSlots#PROPERTY_ATOM) some properties like the defining module, starting line number, and a suitable name for a function based on this raw function. And we'll add more properties like annotated parse trees and hyperlinks as we extend our tool support. The name property gets set when a function is used as a method body, semantic restriction body, macro body, or anything similar. When the raw function is given its name (e.g., see P_253_SimpleMethodDeclaration, line #101), any raw functions lexically nested in it are given suitable names derived from the parent function's name ([#1] of [#3] of "_+_"). Those names are what make Avail stack traces useful, so equating and coalescing functionally equivalent raw functions would cause a big usability problem. But getting back to identity... a raw function is defined with fiat identity. Its o_Equals() and o_Hash() operations cause two raw functions to be considered equal only if they are represented by the same (Java ==, after following indirections) AvailObject. Generally speaking, a raw function is lexical: it corresponds to a particular sequence of characters at a particular position within a particular source file. So two raw functions that were created from different source files or different regions of the same source file would be treated as distinct. A function is just a combination of a raw function and its captured outer variables. Its equality test compares these corresponding parts, and its hash value is derived from these parts. But since the raw function has identity, the function sort of does, too. Two functions created from the same piece of code, like the "[1+1]" in the expression "map 1 to 10 through [x : integer | [1+1]]" are equal if their captured variables (outers) are equal. In this case there are no outer variables, so the ten functions are equal. But in "map 1 to 10 through [x : integer | [x]]", the resulting tuple of functions are expected to produce different values when evaluated (e.g., the 3rd one is functionally equivalent to the function [3]). They each capture the x argument from the enclosing function, so this time the tuple contains ten distinct functions. This is probably about as useful a mechanism as we can provide without sacrificing a lot of optimization opportunities. A further detail makes this even more complicated. The level one nybblecode instruction L1_doPushLastOuter pushes the specified outer variable of the current continuation's function. If, however, that function is still mutable, it is free to clobber that outer variable slot (with the Avail integer 0), rather than force the variable (or the actual value if it's a function argument like x, above) to become immutable (due to both the newly pushed stack slot and the function holding references to the same value or variable). But, assuming the function actually was mutable, after we've zeroed that outer variable slot of the function, its hash calculation will be different from what it would have been before the L1_doPushLastOuter instruction ran. And it will no longer be considered equal to a function for which the raw function and all the rest of the outer variable slots are intact, but the outer variable slot corresponding with the clobbered slot still holds its previous value or variable. This is a tricky situation for the VM. On the one hand, we want functions to behave like completely immutable objects to keep the execution model clean, acting as identitilessly as possible. On the other hand, we want to continue to keep values mutable as long as we can, including functions. I think the current mechanism is a pretty good compromise, but in theory you might run into bizarre problems whereby you could compute the hash value of a (mutable) function at different times and get different answers. Technically, this isn't currently a possibility, since whenever you hold one of these mutable functions in a variable in order to extract its hash or compare it against another function, it will already have been made immutable. The one other concern I have is that the list of outer variables that are still "live" (not zeroed) depends on the time at which you make the function immutable. I believe it's currently impossible to observe distinctions between functions that are based on the same raw function but with some slots zeroed in one function but not another. That's because there doesn't seem to be a mechanism to access the current continuation's function without also making it immutable at the same time, and there can't be two code positions that capture the function at different points in its execution. Either the paths capture the function at some particular instruction or they don't. If a particular instruction captures it, the function is made immutable, suppressing any other outer variable zeroing for that function. If the instruction doesn't capture it, the outer variable slot of the function won't be affected – but we unconditionally won't be capturing the function there, so we can't compare the functions created by the two situations: they're statically mutually exclusive by construction. So long as we're mindful of L1_doPushLastOuter's issues when building tools (and we have to be aware of other mutability-preserving L1 instructions anyhow), we shouldn't mind too much that some outer variable of the function may have been clobbered with a 0 by the time we actually look at it with a debugger or whatever. If we eliminate the mutability-preserving L1 instructions we'll end up with an extremely clean model... but the L2 optimizer will have to satisfy that clean model in its off-ramps by making a lot more things immutable. And for the L2 off-ramps (places corresponding to inter-L1-nybblecode gaps) to be able to accurately generate (on demand) the values that will appear in reified L1 continuation slots, L2 will have to preserve more information and recycle fewer mutable objects. So a perfectly clean L1 prevents a lot of (proposed) optimizations in L2 which would simply wipe values that the existing L1 semantics say can be safely discarded by the execution machinery. I'm not perfectly happy with this model, but it's the best compromise that we've attempted. And finally... I published a fix for #2 -- better integer hash distribution. Now you should be able to safely remove your _'s⁇rehash" method. On May 28, 2014, at 2:11 PM, Robbert van Dalen wrote: > hi, > > i’ve done a treap implementation (v0.1) in avail. > you can find the stuff here: > > https://github.com/rapido/spread/blob/master/avail/Treap.avail > > three things i’ve noticed: > > 1) the performance is not so good > 2) integer hashes are not very well distributed > 3) the hash of an immutable function is not deterministic > 4) lurking avail’s (public) mailing list is not possible anymore > > 1) will be solved within the couple of years - but i do want to hear from you > what are currently the performance killers. > 2) is easy to fix > 3) is more fundamental with no clear answers > 4) ? > > other than that, i’m greatly surprised by avail’s powerful type system - in a > very very positive way! > also, the compiler’s error messages were very helpful in steering me in the > right direction. > > man, avail is completely changing my thoughts about how computer languages > should work. > avail has been the subject of my day dreams for the past 4 weeks. > > further steps: add a blog entry! > > thanks and cheers, > Robbert.
signature.asc
Description: Message signed with OpenPGP using GPGMail
