On 29 May 2014, at 08:46, Mark van Gulik <[email protected]> wrote:
> Eventually we may rework whitespace rules to support forbidden, mandatory,
> optional, and significant whitespace between tokens. The last one is to
> support whitespace-sensitive syntax like Python. That functionality would
> show up as changes to the interpretation of method names (see
> MessageSplitter.java), as well as analysis of the whitespace that we started
> capturing inside the adjacent tokens several weeks ago. We were thinking of
> a space indicating required whitespace, a space followed by the double
> question mark character as being optional whitespace (except between
> alphanumeric tokens), and no space indicating whitespace is disallowed.
> Alternatively we may allow specific whitespace characters like tab (\t) to be
> back-ticked (`) to indicate precisely which whitespace character is required
> or optional (with double question mark). Maybe a back-ticked new line (`\n)
> would indicate a forced line break plus maintaining the current indent level.
> And a back-ticked new line followed by one or more back-ticked tabs would
> indicate a Python-like indentation increase by the specified tab count.

ok, so whitespace will become a first-class citizen.

> But getting back to identity... a raw function is defined with fiat identity.
> Its o_Equals() and o_Hash() operations cause two raw functions to be
> considered equal only if they are represented by the same (Java ==, after
> following indirections) AvailObject. Generally speaking, a raw function is
> lexical: it corresponds to a particular sequence of characters at a
> particular position within a particular source file. So two raw functions
> that were created from different source files or different regions of the
> same source file would be treated as distinct.

what if there were a special raw *immutable* function type without identity, one that is treated as a pure value?

> A function is just a combination of a raw function and its captured outer
> variables.
> Its equality test compares these corresponding parts, and its
> hash value is derived from these parts. But since the raw function has
> identity, the function sort of does, too. Two functions created from the
> same piece of code, like the "[1+1]" in the expression "map 1 to 10 through
> [x : integer | [1+1]]", are equal if their captured variables (outers) are
> equal. In this case there are no outer variables, so the ten functions are
> equal. But in "map 1 to 10 through [x : integer | [x]]", the functions in the
> resulting tuple are expected to produce different values when evaluated (e.g.,
> the 3rd one is functionally equivalent to the function [3]). They each
> capture the x argument from the enclosing function, so this time the tuple
> contains ten distinct functions. This is probably about as useful a
> mechanism as we can provide without sacrificing a lot of optimization
> opportunities.

i’m fine with all this: that the lexical scope and position are taken into account, etc. but this is my use case:

    Public method "test hash" is [ [ x : integer, y : integer | x - y ]'s hash ];

calling “test hash” 10 times gives me the same hash a. now if i unload and recompile the module that contains “test hash”, it gives me a different hash b (i.e. the hash is not consistent across recompilation).

this is not a big deal, though: if i want consistent hashes, i can probably calculate one myself using avail’s reflection capabilities.

> And finally... I published a fix for #2 -- better integer hash distribution.
> Now you should be able to safely remove your "_'s⁇rehash" method.

thank you for that, much appreciated! tonight i will update my treap implementation (v0.2) to also include (theoretically) optimal set operations such as union, difference and intersection. after that, i will implement a ‘compress’ operation that packs the bottom levels of a treap into bins of roughly 4 to 16 elements, in order to reduce its memory footprint.

cheers,
Robbert.
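The whitespace notation Mark proposes can be illustrated with a small classifier. This is a hypothetical sketch in Java (the language of the Avail VM classes mentioned above), not how MessageSplitter.java works: the names `WhitespaceRules`, `Gap` and `parse` are invented here, tokens are assumed to be runs of letters/digits or single other characters, and back-ticked whitespace is not modeled. A lone space marks required whitespace, a space followed by ⁇ (U+2047) marks optional whitespace, no space forbids whitespace, and optional whitespace between two alphanumeric tokens is treated as required, since omitting it would fuse the tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reading of the proposed whitespace notation (not MessageSplitter.java). */
final class WhitespaceRules {
    /** What a method name says about the whitespace between two adjacent tokens. */
    enum Gap { FORBIDDEN, REQUIRED, OPTIONAL }

    record Token(String text) {
        boolean isAlphanumeric() {
            return text.chars().allMatch(Character::isLetterOrDigit);
        }
    }

    /** gaps.get(i) describes the whitespace between tokens i and i + 1. */
    record Parsed(List<Token> tokens, List<Gap> gaps) {}

    private static final char DOUBLE_QUESTION_MARK = '\u2047'; // ⁇

    static Parsed parse(String name) {
        List<Token> tokens = new ArrayList<>();
        List<Gap> gaps = new ArrayList<>();
        Gap pending = null; // whitespace marker seen since the previous token
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == ' ') {
                // A space followed by ⁇ marks optional whitespace; a lone space marks required.
                boolean optional = i + 1 < name.length()
                    && name.charAt(i + 1) == DOUBLE_QUESTION_MARK;
                pending = optional ? Gap.OPTIONAL : Gap.REQUIRED;
                i += optional ? 2 : 1;
                continue;
            }
            // A token is a run of letters/digits or any other single character
            // (a ⁇ not preceded by a space is left as an ordinary token here).
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < name.length() && Character.isLetterOrDigit(name.charAt(i))) {
                    i++;
                }
            } else {
                i++;
            }
            Token token = new Token(name.substring(start, i));
            if (!tokens.isEmpty()) {
                // No marker at all means whitespace is forbidden between the tokens.
                Gap gap = pending == null ? Gap.FORBIDDEN : pending;
                Token previous = tokens.get(tokens.size() - 1);
                // Optional whitespace cannot apply between alphanumeric tokens,
                // because omitting it would fuse them; treat it as required.
                if (gap == Gap.OPTIONAL && previous.isAlphanumeric() && token.isAlphanumeric()) {
                    gap = Gap.REQUIRED;
                }
                gaps.add(gap);
            }
            tokens.add(token);
            pending = null;
        }
        return new Parsed(tokens, gaps);
    }
}
```

Under these assumptions, `parse("_ ⁇+ ⁇_")` yields three tokens separated by two OPTIONAL gaps, while `parse("a ⁇b")` promotes its single gap to REQUIRED.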
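The equality rules Mark describes can be modeled in a few lines of Java. This is a hypothetical illustration, not Avail's AvailObject implementation: `RawFunction` keeps `Object`'s identity-based `equals`/`hashCode` (standing in for fiat identity), and `Function` is a record, so its `equals`/`hashCode` combine the raw function (compared by identity) with the captured outers (compared by value). `structuralHash` is an assumed stand-in for the consistent hash Robbert mentions computing via reflection; because it hashes the source text, it also hints at the "pure value" idea, with the trade-off that textually identical functions from different places hash alike.

```java
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical model of the equality rules described above (not Avail's classes). */
final class FunctionIdentity {
    /** Fiat identity: equals/hashCode are inherited from Object, i.e. based on ==. */
    static final class RawFunction {
        final String source; // the code this raw function was compiled from

        RawFunction(String source) {
            this.source = source;
        }
    }

    /**
     * A function is a raw function plus its captured outers. A record's
     * equals/hashCode are derived from its components, so the raw function
     * is compared by identity and the outers by value.
     */
    record Function(RawFunction raw, List<Object> outers) {}

    /** Models map 1 to 10 through [x : integer | [1+1]]: nothing is captured. */
    static List<Function> constantClosures(RawFunction inner) {
        return IntStream.rangeClosed(1, 10)
            .mapToObj(x -> new Function(inner, List.of()))
            .toList();
    }

    /** Models map 1 to 10 through [x : integer | [x]]: each closure captures x. */
    static List<Function> capturingClosures(RawFunction inner) {
        return IntStream.rangeClosed(1, 10)
            .mapToObj(x -> new Function(inner, List.<Object>of(x)))
            .toList();
    }

    /**
     * A value-style hash computed from the source text and the outers. It
     * survives recompilation, but textually identical functions hash alike.
     */
    static int structuralHash(Function f) {
        return 31 * f.raw().source.hashCode() + f.outers().hashCode();
    }
}
```

Recompiling a module corresponds here to constructing a fresh `RawFunction` from the same source text: the new `Function` is no longer equal to the old one and its identity-derived hash will generally differ, which matches the "test hash" behavior. `structuralHash` stays the same as long as the captured outers have deterministic hash codes (integers do), because `String.hashCode` and `List.hashCode` are specified by fixed formulas.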
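For the planned treap set operations, the usual split/join recursion can be sketched as follows. This is a generic persistent treap of `int` keys written for illustration, not Robbert's implementation; the class name `TreapSets`, the fixed random seed and the max-heap priority convention are assumptions, and no complexity bound is claimed for this particular code. The idea is that the root with the higher priority stays on top, the other treap is split around its key, and the two halves are combined recursively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal persistent treap of ints with split/join-based set operations. */
final class TreapSets {
    /** Immutable node: binary-search order on key, max-heap order on priority. */
    record Node(int key, int priority, Node left, Node right) {}

    /** Result of splitting around a key: smaller keys, presence flag, larger keys. */
    record Split(Node less, boolean found, Node greater) {}

    private static final Random RANDOM = new Random(42); // fixed seed for repeatability

    static Split split(Node t, int k) {
        if (t == null) return new Split(null, false, null);
        if (k < t.key()) {
            Split s = split(t.left(), k);
            return new Split(s.less(), s.found(), new Node(t.key(), t.priority(), s.greater(), t.right()));
        }
        if (k > t.key()) {
            Split s = split(t.right(), k);
            return new Split(new Node(t.key(), t.priority(), t.left(), s.less()), s.found(), s.greater());
        }
        return new Split(t.left(), true, t.right());
    }

    /** Joins a and b, assuming every key in a is smaller than every key in b. */
    static Node join(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.priority() >= b.priority()) {
            return new Node(a.key(), a.priority(), a.left(), join(a.right(), b));
        }
        return new Node(b.key(), b.priority(), join(a, b.left()), b.right());
    }

    static Node insert(Node t, int k) {
        Split s = split(t, k);
        Node single = new Node(k, RANDOM.nextInt(), null, null);
        return join(join(s.less(), single), s.greater());
    }

    static Node union(Node a, Node b) {
        if (a == null) return b;
        if (b == null) return a;
        if (a.priority() < b.priority()) return union(b, a); // higher priority stays on top
        Split s = split(b, a.key());
        return new Node(a.key(), a.priority(), union(a.left(), s.less()), union(a.right(), s.greater()));
    }

    static Node intersection(Node a, Node b) {
        if (a == null || b == null) return null;
        if (a.priority() < b.priority()) return intersection(b, a);
        Split s = split(b, a.key());
        Node l = intersection(a.left(), s.less());
        Node r = intersection(a.right(), s.greater());
        return s.found() ? new Node(a.key(), a.priority(), l, r) : join(l, r);
    }

    /** Keys of a that are not in b. */
    static Node difference(Node a, Node b) {
        if (a == null || b == null) return a;
        Split s = split(b, a.key());
        Node l = difference(a.left(), s.less());
        Node r = difference(a.right(), s.greater());
        return s.found() ? join(l, r) : new Node(a.key(), a.priority(), l, r);
    }

    static Node of(int... keys) {
        Node t = null;
        for (int k : keys) t = insert(t, k);
        return t;
    }

    /** In-order traversal, i.e. the keys in ascending order. */
    static List<Integer> keys(Node t) {
        List<Integer> out = new ArrayList<>();
        collect(t, out);
        return out;
    }

    private static void collect(Node t, List<Integer> out) {
        if (t == null) return;
        collect(t.left(), out);
        out.add(t.key());
        collect(t.right(), out);
    }
}
```

Because nodes are immutable records, the operations share unchanged subtrees and leave their inputs intact; for example, the union of `of(1, 3, 5, 7, 9)` and `of(3, 4, 5, 6)` holds the keys 1, 3, 4, 5, 6, 7, 9.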
