On 29 May 2014, at 08:46, Mark van Gulik <[email protected]> wrote:

> Eventually we may rework whitespace rules to support forbidden, mandatory, 
> optional, and significant whitespace between tokens.  The last one is to 
> support whitespace-sensitive syntax like Python.  That functionality would 
> show up as changes to the interpretation of method names (see 
> MessageSplitter.java), as well as analysis of the whitespace that we started 
> capturing inside the adjacent tokens several weeks ago.  We were thinking of 
> a space indicating required whitespace, a space followed by the double 
> question mark character as being optional whitespace (except between 
> alphanumeric tokens), and no space indicating whitespace is disallowed.  
> Alternatively we may allow specific whitespace characters like tab (\t) to be 
> back-ticked (`) to indicate precisely which whitespace character is required 
> or optional (with double question mark).  Maybe a back-ticked new line (`\n) 
> would indicate a forced line break plus maintaining the current indent level. 
>  And a back-ticked new line followed by one or more back-ticked tabs would 
> indicate a Python-like indentation increase by the specified tab count.

ok, so whitespace will become a first-class citizen.

> But getting back to identity... a raw function is defined with fiat identity. 
>  Its o_Equals() and o_Hash() operations cause two raw functions to be 
> considered equal only if they are represented by the same (Java ==, after 
> following indirections) AvailObject. Generally speaking, a raw function is 
> lexical:  it corresponds to a particular sequence of characters at a 
> particular position within a particular source file.  So two raw functions 
> that were created from different source files or different regions of the 
> same source file would be treated as distinct.

what if there was a special raw *immutable* function type without identity, 
thus considered a pure value?

> A function is just a combination of a raw function and its captured outer 
> variables.  Its equality test compares these corresponding parts, and its 
> hash value is derived from these parts.  But since the raw function has 
> identity, the function sort of does, too.  Two functions created from the 
> same piece of code, like the "[1+1]" in the expression "map 1 to 10 through 
> [x : integer | [1+1]]" are equal if their captured variables (outers) are 
> equal.  In this case there are no outer variables, so the ten functions are 
> equal.  But in "map 1 to 10 through [x : integer | [x]]", the functions in 
> the resulting tuple are expected to produce different values when evaluated 
> (e.g., the 3rd one is functionally equivalent to the function [3]).  They each 
> capture the x argument from the enclosing function, so this time the tuple 
> contains ten distinct functions.  This is probably about as useful a 
> mechanism as we can provide without sacrificing a lot of optimization 
> opportunities.

i’m fine with all this: that the lexical scope and position are taken into 
account, etc.
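
to check my understanding, here is a small python model of that equality rule. 
it is only a sketch: RawFunction and Function are names i made up for the 
model, not avail's classes. the raw function keeps python's default identity 
equality, and a function compares (raw function, outers).

```python
from dataclasses import dataclass


class RawFunction:
    """Model of a raw function: equality and hash are by object identity."""

    def __init__(self, source: str) -> None:
        self.source = source  # kept for readability only, never compared


@dataclass(frozen=True)
class Function:
    """Model of a function: a raw function plus its captured outers."""

    raw: RawFunction
    outers: tuple


# "[1+1]" captures nothing, so all ten functions are equal.
constant_body = RawFunction("[1+1]")
constants = [Function(constant_body, ()) for x in range(1, 11)]

# "[x]" captures x, so the ten functions are pairwise distinct.
capturing_body = RawFunction("[x]")
capturing = [Function(capturing_body, (x,)) for x in range(1, 11)]

# The same text compiled a second time is a different raw function.
recompiled = Function(RawFunction("[1+1]"), ())
```

in this model set(constants) has one element, set(capturing) has ten, and 
recompiled is not equal to constants[0] even though the source text matches.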

but this is my use case:

Public method "test hash" is 
[
 [ x : integer, y : integer | x - y ]'s hash
];

calling “test hash” 10 times gives me the same hash a.
now if i unload and recompile the module that contains “test hash”, it gives me 
a different hash b (i.e. the hash is not stable across recompilation).

all this is not a big deal: if i want consistent hashes, i can probably 
calculate one myself, using avail’s reflection capabilities.
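
roughly what i have in mind, sketched in python rather than avail. i have not 
checked what the reflection api actually exposes, so this sketch assumes i can 
get at the function's source text; stable_hash is a made-up name, and the 
sha-256 digest and 64-bit width are arbitrary choices.

```python
import hashlib


def stable_hash(source: str) -> int:
    """Hash a function's source text so the value survives a recompile."""
    # Collapse runs of whitespace so that re-wrapping the code does not
    # change the hash (an assumption about what "same function" means).
    normalized = " ".join(source.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    # Keep the first 8 bytes as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


a = stable_hash("[ x : integer, y : integer | x - y ]")
b = stable_hash("[ x : integer,\n  y : integer | x - y ]")
c = stable_hash("[ x : integer, y : integer | x + y ]")
```

a and b come out equal because only the line wrapping differs, while c hashes 
a different body. unlike the identity-based hash, none of this depends on which 
object happens to represent the function.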

> And finally... I published a fix for #2 -- better integer hash distribution.  
> Now you should be able to safely remove your "_'s⁇rehash" method.

thank you for that - much appreciated!

tonight i will update my treap implementation (v0.2) to also include 
(theoretically) optimal set operations such as union, difference, intersection, 
etc.
after that, i will implement a ‘compress’ operation that compresses a treap 
into bins of roughly 4 to 16 elements at the bottom of a treap, in order to 
reduce memory footprint.
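
for the union, the shape i am aiming for is the usual split-based recursion. 
below is a python sketch of it, not the avail code. the names (Node, prio, 
split, union, build, keys, is_heap) are made up for the sketch, and deriving 
the priority from a hash of the key is an assumption of this sketch. nodes are 
never mutated, so the inputs stay valid after a union.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    prio: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prio(key: int) -> int:
    """Deterministic priority derived from the key (assumed scheme)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def split(t, key):
    """Return (keys < key, key present?, keys > key) without mutating t."""
    if t is None:
        return None, False, None
    if key < t.key:
        less, found, greater = split(t.left, key)
        return less, found, Node(t.key, t.prio, greater, t.right)
    if key > t.key:
        less, found, greater = split(t.right, key)
        return Node(t.key, t.prio, t.left, less), found, greater
    return t.left, True, t.right


def union(a, b):
    """Union of two treaps; the higher-priority root stays on top."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio < b.prio:
        a, b = b, a
    # Splitting b around a's key also drops a duplicate of that key.
    less, _, greater = split(b, a.key)
    return Node(a.key, a.prio, union(a.left, less), union(a.right, greater))


def build(items):
    """Build a treap by taking the union with one singleton per item."""
    t = None
    for k in items:
        t = union(t, Node(k, prio(k)))
    return t


def keys(t):
    """In-order traversal, i.e. the keys in sorted order."""
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


def is_heap(t):
    """Check that every node's priority is >= its children's priorities."""
    if t is None:
        return True
    for child in (t.left, t.right):
        if child is not None and child.prio > t.prio:
            return False
    return is_heap(t.left) and is_heap(t.right)


u = union(build([1, 3, 5, 7, 9]), build([3, 4, 5, 6]))
```

keys(u) lists the merged keys in sorted order with duplicates dropped. 
difference and intersection should follow the same split-and-recurse pattern, 
using the "key present?" flag that split returns.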

cheers,
Robbert.
