[Scheme-reports] Just a load of.. well..

Biep Thu, 22 Mar 2012 12:02:42 -0700

Dear all,

This will be a somewhat weird mail.  As I suffer from chronic fatigue, near the 
end of the R6RS process I never got around to turning my notes of Scheme ideas 
into postable material.  Then R7RS started to take shape, and many of my ideas 
were no longer a propos - but as the activity moved to scheme-reports, which - 
I suppose - allows discussion of ANY report, some of them became so again, 
because there may be more standards after R7RS.  As a New Year's resolution I 
decided to post them.
Currently my energy level has gone up sufficiently to do so, so here they are, 
as they are, with warts, errors, inconsistencies, incomprehensibilities, et 
cetera, and all.


May Scheme flourish and remain the ideal for other languages to strive for!
J. A. "Biep" Durieux.

= = = = = = = = = = =

Preliminaries.

*  Stuff I wrote about during the R6RS-discussion that I still consider 
important (in the R6RS archives - Anton posted them for me)
  - A philosophical basis for evaluating Scheme versions (grep for 'WdW')
  - How to cut up the language (grep for 'quadripartition')

* Naming the Scheme languages.
  I want unique names, so that I can e.g. grep the number of occurrences of 
"Scheme" as opposed to "Common Lisp" or "ML" in, say, CiteSeer.  Currently 
searching for "Scheme" yields a large number of false positives, whereas 
"Scheme language" yields an undercount.
  I also want to keep the name "Scheme", though, and I like short, catchy names 
- qualifiers can always be added later on.
  Therefore I propose "Schemer" for small Scheme, and "Schemest" for large 
Scheme.  The one name clearly indicates its roots, and the other the fact that 
it is the maximal Scheme.  The name "Scheme" would remain as the family name, 
and the report could be subtitled "Scheme: Schemer, Schemest".
  (And I think "Schemest" would then be a one-syllable word.)

Syntax

* Case sensitivity.
  There must be a default behaviour.  I personally like that to be 
case-insensitive (a micro Scheme might not even KNOW about, say, lower-case 
letters), but *requiring* some pragma at the beginning of each program is no 
sensible option, I think.  Libraries are of course independent of user 
preferences - which might be set in some .ini file.  Libraries ought not to 
expect any behaviour in this respect, though - exporting both this and This is 
discouraged.
  If it happens (Xlib's XK_A and XK_a), they may come out as XK_A and |XK_a|.
  But that is up to the install! macro, and thus to the user of the library, 
and not to the Scheme used.

* The asymmetry between multiple inputs and multiple outputs for procedures.
  This is inherent in Scheme syntax: the receiver of the output of an 
expression is indicated by the place where this expression occurs - which is 
necessarily singular: (get-some g1 (provide p1 p2 p3) g2).  Repeating the 
expression would lead to repeated execution: (get-some g1 (provide p1 p2 p3) 
(provide p1 p2 p3) g2) - apart from being cumbersome.
  Prolog has a symmetry in its syntax here that languages with nested 
expressions simply don't (and I think can't) have.  Note that this is 
independent of unification: it would be possible to have a unidirectional 
Prolog-like notation, with a group of input variables and a group of output 
variables, say separated by a hyphen:
  (lambda (a b - result rem quot) (quotient+remainder a b - quot rem) (* quot 
rem - result) (display result -))
  Barely possible is "splicing in" multiple results - which would require them 
to be adjacent.  A splicing construct (+ 1 ,@(quotient+remainder 25 4) 6) might 
do - see Lua for an even more constricting solution.
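
  As a minimal sketch of the asymmetry in R7RS-style Scheme: several inputs 
are simply written side by side, but several outputs need an explicit 
receiver.  The helper quotient+remainder is named after the example above and 
defined here only for illustration; the splicing form is emulated with apply, 
which is an assumption about what such a construct would mean.

```scheme
(import (scheme base))

;; Hypothetical helper named after the example in the text:
;; returns two values, the quotient and the remainder.
(define (quotient+remainder a b)
  (values (quotient a b) (remainder a b)))

;; Receiving several results needs an explicit receiver procedure.
(define product-of-results
  (call-with-values
    (lambda () (quotient+remainder 25 4))
    (lambda (quot rem) (* quot rem))))

;; "Splicing" the results into an argument list, emulated with apply.
;; This plays the role of (+ 1 ,@(quotient+remainder 25 4) 6).
(define spliced-sum
  (call-with-values
    (lambda () (quotient+remainder 25 4))
    (lambda results
      (apply + 1 (append results (list 6))))))
```

  Here product-of-results is 6 (6 times 1) and spliced-sum is 14 (1 + 6 + 1 + 
6); the results can only be spliced in as one adjacent group.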

* Symbols.
  Maybe symbols should be decoupled from code text - a Chinese programmer 
might use other names for the identifiers in a Scheme program (alpha renaming) 
than a German one would - without it being another program.  There is simply 
an abstraction level 
difference.  Both could collaborate on a common program without ever even 
knowing what names the other uses.
  First-class environments know the variables, but may not necessarily know 
their names (maybe just some index number) - just as the German may not know 
the Chinese names, or even the characters they are written in.
  This also means the English bias will be gone (names such as "if" or "set!"): 
in other 'locales' these symbols can have other text representations.
  Many spreadsheets do this already: function names come out differently in 
other locales, but code written in one locale works correctly in others.

* Syntax decoupling.
  As with symbols in the preceding proposal, syntax could be decoupled.  
S-expressions exist on the base level and may be used, but other 
representations are possible.  We need to hack input reading anyway if we want 
to be able to add new types with their idiosyncratic representations, so why 
not go the whole way?  In an appropriate context, Dylan, Lua and Algol 60 are 
simply different syntactic manifestations of Scheme.  This would help in the 
acceptance of Scheme into the mainstream as well.  Let s-expressions be the raw 
format, but allow laymen to use jpeg and the like - they'll come to raw when 
they need the full power.

Uniformity

* Tables and environments.
  This would lead to mutable first-class environments.
  Let identifiers be hygienic - translated to locations (or location 
indicators, as the case may be) at the moment a procedure is defined.  So (+ x 
1) doesn't change when another x is inserted in a more local environment - or 
when it is deleted from the original environment.  In the latter case the 
location is not garbage-collected, because this code still refers to it.  It is 
also possible to write (+ (get-location-of 'a) 1), and *that* code is sensitive 
to environment manipulation.
  (There might well be a flag indicating that each mention of an identifier is 
meant as a call to get-location - some kind of "don't inline" pragma.)
  Oh, and environments shouldn't be different from other tables, and should 
have a way to indicate where to find a location that is not locally available.  
In that way various schemes of multiple inheritance would become trivially 
implementable.
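
  A minimal sketch of this idea, assuming a table is an association list of 
names to locations plus a link to a parent table.  All names except 
get-location-of are made up for the example, and get-location-of takes the 
table as an explicit extra argument here.

```scheme
(import (scheme base))

;; A table holds (name . location) bindings and a parent table (or #f).
;; A location is a one-element vector, so it can be shared and mutated.
(define (make-table parent) (vector '() parent))
(define (table-bindings t) (vector-ref t 0))
(define (table-parent t) (vector-ref t 1))

(define (table-define! t name value)
  (vector-set! t 0 (cons (cons name (vector value)) (table-bindings t))))

;; Removes the local binding; locations already handed out stay alive.
(define (table-delete! t name)
  (let loop ((bs (table-bindings t)) (kept '()))
    (cond ((null? bs) (vector-set! t 0 (reverse kept)))
          ((eq? (caar bs) name) (loop (cdr bs) kept))
          (else (loop (cdr bs) (cons (car bs) kept))))))

;; Look locally first, then follow the parent link.
(define (get-location-of t name)
  (cond ((not t) (error "unbound name" name))
        ((assq name (table-bindings t)) => cdr)
        (else (get-location-of (table-parent t) name))))

(define (location-ref loc) (vector-ref loc 0))

(define global (make-table #f))
(define local (make-table global))
(table-define! global 'x 10)

;; "Hygienic" code: the location of x is resolved once, at definition time.
(define add1-to-x
  (let ((loc (get-location-of local 'x)))
    (lambda () (+ (location-ref loc) 1))))

;; Code that asks for the location on every call.
(define (add1-to-x/dynamic)
  (+ (location-ref (get-location-of local 'x)) 1))

(table-define! local 'x 100)   ; insert another x in the more local table
```

  After the last line, (add1-to-x) still gives 11, whereas (add1-to-x/dynamic) 
gives 101.  After (table-delete! global 'x) the first procedure keeps working, 
because it still holds the location.  Letting the parent be a list of tables 
instead of a single one would be one way to get multiple inheritance.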

* External resources.
  A common interface should govern all external resources (libraries, files, 
sockets, ..).  I don't necessarily want to know whether I read a file or 
interact with a DLL or a user - and certainly don't want to write various 
copies of my code to do these different things.  That would be like writing 
separate sorting routines for strings, numbers, etc.

* Uniformity.
  External things should look like internal things in order to have them share 
all the advantages of the felicitous choice of Scheme internal things (WdW).  
That's why to me e.g. a library should look like a procedure, from which the 
library elements can be obtained:
  (define math (get-library '(my-math))) ; simplified exempli gratia
  (math 'exports) -> '(sin "cos" "Cos" 3.14) ; export tags are not necessarily 
tokens (but would normally be)
  (define cos (math 'get "cos"))
  Obviously, a macro might inspect the exports and import them, with or without 
a prefix, in a given environment
  (install! '(math) 'math) ; does get-library, collects the exports and defines 
them.
  math::cos -> '#<procedure math::cos>
  This way, the user has full control over the way a library changes the 
programming environment.
  And this macro would be in a library, with a nice bootstrap line in people's 
.ini files.
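
  A minimal sketch of such a library procedure in portable Scheme.  The 
constructor make-library and the contents of the library are made up; they 
stand in for whatever get-library would return.  An install! that defines 
names from a run-time export list cannot be written as portable code, so it 
is left out.

```scheme
(import (scheme base))

;; Hypothetical constructor: wraps an association list of (tag . value)
;; in a procedure that answers the messages 'exports and 'get.
(define (make-library exports)
  (lambda (message . args)
    (case message
      ((exports) (map car exports))
      ((get)
       (let ((entry (assoc (car args) exports)))
         (if entry
             (cdr entry)
             (error "no such export" (car args)))))
      (else (error "unknown message" message)))))

;; A stand-in for (get-library '(my-math)); export tags may be
;; strings or symbols, as in the text.
(define math
  (make-library
    (list (cons "square" (lambda (x) (* x x)))
          (cons "double" (lambda (x) (+ x x)))
          (cons 'pi-ish 3.14))))

(define square (math 'get "square"))
```

  With this, (math 'exports) gives ("square" "double" pi-ish) and (square 7) 
gives 49.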

* Tables.
  I see no reason why a reified environment or (hash) table should look any 
different from a library procedure as described here.

Simplicity

* Multi-pass compilation.
  This is like solving a consistency maintenance system (Jon Doyle called that 
"truth maintenance", some others "reason maintenance"), which is more for 
constraint satisfaction systems than for Scheme.  Single pass with maximal 
deferment is the Scheme way: variables inside lambdas don't need to be resolved 
yet, resolution can be deferred until they are called.  Without very strong 
reasons, one should not request different behaviour from syntax transformers.  
Ease of compilation, or speed of the resulting system are no strong reasons.

* Sublanguages.
  Once there was Lisp/Scheme, which was about lambda.  Then a completely 
different language for macro rewriting was added, and now a (hidden) constraint 
satisfaction system.  I don't like that, it is feature on top of feature.  A 
rewriter can be written in Scheme and then used in macro expansion - great, but 
let the rewriter then be generally available (because it has other possible 
uses), and be optional in writing syntax (because there are other ways that 
sometimes may be better).  The same is true for a consistency maintenance or 
constraint satisfaction language - let them be first-class, explicit languages 
that one may use or not according to one's liking.  That's what libraries are 
for.

* Minimality.
  Something else is lost here.  Scheme was about the right thing, but also 
about the minimal thing.  Scheme provided a basis which was like a semantic 
assembly language: it provided primitives with which to build.  Yes, certain 
complex systems ease programming - so let's make sure such systems can be 
written in Scheme, but let's not make them the basis of Scheme!  It sounds like 
old-school anthropology: let's give primitives the right to remain primitive - 
while making sure they are NOBLE primitives.

Mutation

* Redefining read.
  This will not change the meaning of code (or anything else) read before, even 
through explicit calls of the read procedure.  (In some multi-pass models this 
would be perfectly appropriate behaviour, though.)

* Principle of least surprise.
  Reassigning a variable changes the result of each use of that variable.
  Inlining procedure calls is possible exactly when it can be proven that the 
value of the procedure won't change (e.g. local procedures, direct lambda 
expressions) - anything else would turn Scheme into a Lisp-2 by giving special 
privileges to procedure values over other values.  I want my program to be able 
to redefine procedure variables at run-time, e.g. a program that learns by 
repeatedly wrapping functions, thus making them more knowledgeable.
  Opaque code may remain unaffected under procedure variable reassignment - and 
code may be declared opaque.  That basically boils down to the good old (let 
((car car) (cdr cdr) ...) ...).  Standard procedures are all opaque.  An 
opacity declaration may contain exceptions: variables to the reassignment of 
which the code IS sensitive.
  (opaque (except my-proc memq) code ...) ; - which is like a kind of export 
clause.
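
  The difference between transparent and opaque code can be sketched with the 
let idiom mentioned above.  The procedure names are made up, and the opaque 
form itself is not implemented here; only its effect is imitated.

```scheme
(import (scheme base))

;; A top-level procedure variable that the program may reassign later.
(define (greet name) (string-append "hello, " name))

;; Transparent code: refers to the variable, so it sees reassignments.
(define (welcome name) (greet name))

;; "Opaque" code: captures the current value of greet, as in the
;; good old (let ((car car) ...) ...) idiom.
(define welcome/opaque
  (let ((greet greet))
    (lambda (name) (greet name))))

;; Wrap the procedure at run time, making it "more knowledgeable".
(define old-greet greet)
(set! greet (lambda (name) (string-append (old-greet name) "!")))
```

  After the reassignment, (welcome "Biep") gives "hello, Biep!" while 
(welcome/opaque "Biep") still gives "hello, Biep".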

* Locations.
  Having explicit mutable locations (say, ML style), would allow efficient 
compilation where they aren't used.  And explicit mutable locations can be used 
as a base level on which to build Scheme-as-we-know-it, with lambda binding its 
variables to mutable locations, so that backward compatibility is maintained.  
It would help both with parallelism (fewer locks needed) and with inlining.
  "But locations can escape scope, whereas variables can't."  Well (lambda (x) 
(set! var x)) and (lambda () var) can (and do often) escape, and are just a 
work-around for locations:
  (define set-var! #f)
  (define get-var #f)
  (let ((var #f))
    (set! set-var! (lambda (x) (set! var x)))
    (set! get-var  (lambda () var)) )
  And other locations, such as the car and cdr of a pair, already escape, so 
why have an artificial difference between those and variable locations?
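
  Packaging the same work-around as a value gives a rough picture of what an 
explicit location could look like.  This is a sketch with hypothetical names 
(make-location, loc-ref, loc-set!), built from closures rather than from a 
real primitive.

```scheme
(import (scheme base))

;; A location as a first-class value: a reader and a writer closure
;; over one hidden variable.
(define (make-location initial)
  (let ((var initial))
    (cons (lambda () var)                ; reader
          (lambda (x) (set! var x)))))   ; writer

(define (loc-ref loc) ((car loc)))
(define (loc-set! loc x) ((cdr loc) x))

;; The location escapes the scope that created it, just as a pair would.
(define counter (make-location 0))

;; Any code holding the location can update it.
(define (bump! loc)
  (loc-set! loc (+ (loc-ref loc) 1)))

(bump! counter)
(bump! counter)
```

  After the two calls, (loc-ref counter) gives 2.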

* Parallelism.
  Full parallelism is problematic, so some kind of locking will be necessary.  
I think locations are the natural place for that: this function wants to update 
a location, so no-one else should do an update in parallel.  Just reading or 
writing is no problem, but writing a value that is computed from a read is.
  Let a sequence be a series of expressions that must be executed sequentially 
(through begin or let*, or because function body execution comes after argument 
evaluation, or whatever).  Obviously, sequences can be nested.
  Let a location be any mutable pointer storage.
  In beginner mode, any sequence that accesses a certain location locks that 
location till the end of the sequence.  In special cases, a compiler may prove 
that a lock is not or no longer necessary.  It is often easy to prove this for 
primitives such as car, cdr, so that their use doesn't preclude parallelism.
  In advanced mode, locks are set by the programmer through a (lock (<location> 
...) code ...) construct.
  Obviously, a lock doesn't prohibit usage by a subsequence (possibly inside a 
parallel unit), but when that happens other subsequences in the same parallel 
block are locked out.  So, let Si and Pj stand for sequential and parallel 
units: (S1 <lock var1> (P1 (S2 <use var1>) (S3 <use var1>) S4) <unlock var1>) 
sequentialises S2 and S3, but S4 can still run parallel with this sequence of 
S2 and S3.

Efficiency

* Its place.
  Implementability is an issue for Scheme language design, but ease of 
implementation, or efficiency of implementation ought not to be.  (I realise 
there are fringe issues, such as a theoretical lower limit of 2^2^n in space or 
time..)  Compilability is nice, but never a basis for design decisions.

* Tail call efficiency.
  This should include load: the value of load would then be the value of the 
last expression in the file loaded.  If one wants anything else, it is trivial 
to wrap something around an efficient load, whereas the inverse is not 
possible. 

Other thoughts

* Write/read invariance.
  IO is library material, but the library provided ought to allow write/read 
invariance on closures and environments.  A unique tag provides unification if 
the object is already present; if not, it is produced.  In that way two running 
Schemes can exchange procedures, including ones capturing variables.  The 
unification scheme for environments allows one to reconstruct environment 
inheritance incrementally.

* Macros.
  The principle of least surprise is already violated with macros.  Macros must 
be bound before use; there is no equivalent to anonymous lambdas.  Imagine ((if 
freevar <procedure def> <macro def>) arg ...) - this cannot be in Scheme, 
because Scheme wants to evaluate the arguments together with the head, whereas 
the head stipulates whether the arguments need to be evaluated in the first 
place.  I think this evaluation rule reflects a thought error: a seeming 
symmetry was taken for a real one.  Anonymous macros would be a boon, as would 
be runtime evaluation of expression heads, even if they are or may evaluate to 
macros.  Right now a real symmetry (between procedures and macros) is lost for 
the sake of a false one.
  This is about removing restrictions in the Clinger-sense, I think - having 
first-class syntactic transformers.  Currently they have a status comparable to 
CL's functions - without the possibility of anonymous transformers or SYNCALL.
  There is also no syntactic equivalent of "direct code" as opposed to code 
stored in a lambda form.  Direct syntactic code would of course be immune to 
later redefinitions of transformers - as the transformation has taken place at 
the moment the single evaluation pass dealt with that bit of code.

* Ports.
  Flushing an output port is the symmetric opposite of peeking at an input port 
- or it ought to be.  I think lazy streams could make ports more Schemey - I 
don't like the built-in necessity to deal imperatively with ports.  But I am 
not clear on all this - I only feel Scheme could have a clean mathematical 
notion of ports which is unlike what other languages have.
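
  One possible reading of "lazy streams for ports", as a minimal sketch using 
delay and force from R7RS.  The names port->stream, stream-head, stream-tail 
and stream-take are made up; a stream is taken to be a promise that yields 
either the empty list or a pair of a character and the rest of the stream.

```scheme
(import (scheme base) (scheme lazy))

;; Reads a character from the port only when the stream is forced that far.
(define (port->stream port)
  (delay
    (let ((c (read-char port)))
      (if (eof-object? c)
          '()
          (cons c (port->stream port))))))

;; Peeking is just looking at the head; forcing twice reads only once.
(define (stream-head s) (car (force s)))
(define (stream-tail s) (cdr (force s)))

;; Collect up to n characters as a string, without consuming the stream.
(define (stream-take s n)
  (let loop ((s s) (n n) (acc '()))
    (if (or (= n 0) (null? (force s)))
        (list->string (reverse acc))
        (loop (stream-tail s) (- n 1) (cons (stream-head s) acc)))))

(define input (port->stream (open-input-string "(+ 1 2)")))
```

  Here (stream-take input 2) gives "(+" however often it is called, because 
nothing is consumed: the stream is a value, and the imperative reading stays 
hidden behind the promises.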

* The REPL.
  In thinking about REPL behaviour, the Lua notion of a chunk is useful.  
Balancing parens may delimit a chunk (as with begin), but the REPL might 
provide other ways, say Shift+Enter to stay within the chunk.

* Time.
  Time is an elusive concept.  It is generally measured by comparison to cyclic 
systems, either individual ones (the earth turning around its axis) or generic 
ones (a cesium atom vibrating).  The two are different, in that the individual 
one may have an explicit zero point, relative to a reference point (the 
Greenwich meridian pointing away from the sun, or pointing to Betelgeuse, or to 
the moon will all define individual days of slightly differing length), and 
many of those are in use (a year may be the earth being back in its position to 
the sun relative to Betelgeuse or to its axis of rotation, or a number of moon 
cycles, or a number of days, or a combination of those).  Many systems also use 
more than one individual cyclic system, using some leap system to keep them 
more or less in sync.
  Scheme should simply recognise this, and see a time period or point as a 
formula of base measurements: so many cycles of the earth around the sun 
relative to its axis of rotation plus so many moon cycles measured thus plus so 
many days measured thus minus so many seconds defined yet another way.
  Primitive measures could be declared as time units, with or without a zero 
point, preferably but not necessarily giving an approximation formula in terms 
of already-declared measures ("a year-of-type-x tends to be so many seconds; 
the length of an Ethiopian year in days can be approximated by this formula; 
the interest due date is the last working day of the month or the transaction 
day, whichever comes first."), and using those conversion formulas Scheme could 
then give a normally inexact conversion of one time span or point into another. 
 Obviously some conversions, such as a week into days, could be exact.  Time 
point ordering would be based on these inexact measures.  A Scheme that expands 
its notion of inexactness to include, say, a probability distribution and 
returns a probability for predicates can do so here.
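
  A very small sketch of a time span as a formula over declared units.  The 
unit table, the names and the choice of seconds as the common measure are all 
assumptions made for the example; the year figure is a mean Gregorian year 
and, like the day, only an approximation.

```scheme
(import (scheme base))

;; Declared units, each with an approximation formula in seconds
;; (here simply a constant factor).
(define approx-seconds
  '((second . 1) (day . 86400) (week . 604800) (year . 31556952)))

;; A time span is a list of (count . unit) terms; counts may be negative.
;; The conversion result is deliberately marked inexact.
(define (span->seconds span)
  (let loop ((terms span) (total 0))
    (if (null? terms)
        (inexact total)
        (let ((unit (assq (cdar terms) approx-seconds)))
          (if unit
              (loop (cdr terms) (+ total (* (caar terms) (cdr unit))))
              (error "undeclared time unit" (cdar terms)))))))

;; Ordering of spans is based on the inexact measure.
(define (span<? a b) (< (span->seconds a) (span->seconds b)))

;; One week minus three days plus thirty seconds.
(define deadline '((1 . week) (-3 . day) (30 . second)))
```

  With these definitions (span->seconds deadline) gives the inexact number 
345630.0, and (span<? deadline '((1 . week))) is true.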

* Strings.
  Wouldn't a Schemish string be a sequence of character objects (soco), where a 
character object can be complex (base + modifiers)?  Just as s-expressions 
respect the grammatical complexity of expressions, a soco would respect the 
grammatical complexity of text.  If the sequence is a vector, string indexing 
and string-set! would be O(1).  Yes, socos take more space than classical 
strings, but the same is true for s-expressions as opposed to flat program code.
  For Schemes with only ASCII, the set of character objects would be the ASCII 
set, and the pointers could be 8-bit, i.e. the strings could be the classical 
string that we all know and love/hate.  There's backwards compatibility.
  Those who want size-changing substitutions can use some kind of tree 
representation of the sequence instead of vectors (lists being an extreme kind 
of unbalanced trees).
  The interesting thing would be that people can choose their sequencing level: 
if you want to reason about code points, use sequences of code points; if you 
want to reason on the syllable level, make sequences of syllables.  It is 
possible to have several levels: a sequence of words, each of which is a 
sequence of characters, each of which is a sequence of code points.
  In other words: Scheme should not prescribe one crippling string format, but 
rather a set of specifiers with which people can define the strings they need. 
 A classical ASCII-like one is obligatory, the others are optional.
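
  A minimal sketch of a soco stored as a vector.  The representation of a 
character object as a base character plus a list of modifier symbols, and all 
the names used, are assumptions made for the example.

```scheme
(import (scheme base))

;; A character object: a base character plus a list of modifiers.
(define (make-cobj base . modifiers) (cons base modifiers))
(define (cobj-base c) (car c))
(define (cobj-modifiers c) (cdr c))

;; A soco stored as a vector: indexing and updating are O(1),
;; however complex the individual character objects are.
(define word
  (vector (make-cobj #\c)
          (make-cobj #\a)
          (make-cobj #\f)
          (make-cobj #\e 'acute)))

(define (soco-length s) (vector-length s))
(define (soco-ref s i) (vector-ref s i))
(define (soco-set! s i c) (vector-set! s i c))

;; Flatten to a classical string by dropping the modifiers.
(define (soco->plain-string s)
  (list->string (map cobj-base (vector->list s))))
```

  Here (soco-length word) is 4, although the last element carries a modifier, 
and (soco->plain-string word) gives "cafe".  The same operations would work on 
a vector of words or of syllables.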

