[Readable-discuss] sweetener demo (also demos sweet-expressions)

2012-07-21 Thread David A. Wheeler
FYI: The reformatter, which I plan to soon rename to "sweetener", is working 
much better now.  Its output also serves as an interesting demo of 
sweet-expressions.  Like any reformatter, it won't necessarily choose the 
representation a human would choose, but it's often reasonable.

I've also been using it to test round-trips (sweeten|unsweeten|prettyprint 
should produce the same thing as prettyprint if given S-expressions), which 
helps wring out bugs in both tools.
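
The round-trip property can be written down as a small check. This is only a sketch in Python: sweeten, unsweeten, and prettyprint are hypothetical stand-ins passed in as string-to-string functions, not the real tools' interfaces.

```python
def round_trip_ok(source, sweeten, unsweeten, prettyprint):
    """Return True if sweeten|unsweeten|prettyprint matches prettyprint alone.

    All three tools are modelled as functions from text to text; that is an
    assumption made for illustration, not how the real filters are invoked.
    """
    direct = prettyprint(source)
    via_sweet = prettyprint(unsweeten(sweeten(source)))
    return direct == via_sweet
```

Any S-expression input for which this returns False points at a bug in one of the two converters.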

Below is an example of a program in S-expression format, followed by the result 
of the "sweetener" converting it to sweet-expressions.  The program I chose to 
reformat is the sweetener itself :-).

--- David A. Wheeler



=== sweetener-as-s-expressions.scm ===


; Filter to read S-expressions and output indented sweet-expression.
;
; Copyright (C) 2006-2012 David A. Wheeler.
;
; This software is released as open source software under the "MIT" license:
;
; Permission is hereby granted, free of charge, to any person obtaining a
; copy of this software and associated documentation files (the "Software"),
; to deal in the Software without restriction, including without limitation
; the rights to use, copy, modify, merge, publish, distribute, sublicense,
; and/or sell copies of the Software, and to permit persons to whom the
; Software is furnished to do so, subject to the following conditions:
;
; The above copyright notice and this permission notice shall be included
; in all copies or substantial portions of the Software.
;
; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
; IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
; FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
; THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
; OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
; ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
; OTHER DEALINGS IN THE SOFTWARE.

; TODO: Handle arbitrary end-of-line. Currently this assumes that
;   lines end with just #\newline.

; Note: The maxwidth may be violated if, at a current indent, there is a
; long non-pair that exceeds it.  But other than long atoms, it's respected,
; so it's unlikely to exceed this width in practice:
(define maxwidth 78)

(define indent-increment
  '(#\space #\space))
(define max-unit-character-length 60)
(define max-unit-list-length 8)

(define group-string "
")
(define infix-operators
  '(and or xor + - * / ^ ++ -- ** // ^^ < <= > >= = <> != ==))

; Lists that have one of these symbols as their first element, and that
; aren't shown on one line, are shown as a line with SYMBOL FIRST-PARAMETER
; and *then* indents.
; This is used when in typical uses the first parameter is *special* and
; has a different semantic meaning from later parameters.
; This refinement isn't *necessary* but I think it looks better.
(define cuddle-first-parameter
  '(define lambda if when unless case set! let let* letrec let1 do
    define-module library export import defun block typecase
    let-syntax letrec-syntax define-syntax syntax-rules))


(define tab (integer->char 9))
(define LISTLP (list #\())
(define LISTRP (list #\)))
(define LISTLBRACE (list (integer->char 123)))
(define LISTRBRACE (list (integer->char 125)))


; Return length of x, which may be an improper list.
; If improper, count the two sides as two, so "(a . b)" is length 2.
(define (general-length x)
  (general-length-inner x 0))

(define (general-length-inner x count-so-far)
  (cond ((null? x) count-so-far)
        ((not (pair? x)) (+ count-so-far 1))
        (#t
         (general-length-inner (cdr x) (+ count-so-far 1)))))

; Return list x's *contents* represented as a list of characters.
; Each one must use modern-expressions, space-separated;
; it will be surrounded by (...) so no indentation processing is relevant.
(define (unit-list x)
  (cond ((null? x) (quote ()))
        ((pair? x)
         (if (null? (cdr x))
             (unit (car x))
             (append
               (unit (car x))
               '(#\space)
               (unit-list (cdr x)))))
        (#t
         (append (quote (#\space #\. #\space)) (unit x)))))


; Return #t if x should be represented using curly-infix notation {...}.
(define (represent-as-infix? x)
  (and (pair? x)
   (symbol? (car x))
   (memq (car x) infix-operators)
   (list? x)
   (>= (length x) 3)
   (<= (length x) 6)))

; Return tail of an infix expression, as list of chars
(define (infix-tail op x)
  (cond ((null? x) LISTRBRACE)
        ((pair? x)
         (append
           '(#\space)
           op
           '(#\space)
           (unit (car x))
           (infix-tail op (cdr x))))
        (#t
         (appe

[Readable-discuss] Line endings

2012-07-21 Thread David A. Wheeler
Clearly an indentation-sensitive syntax has to reliably detect end-of-line.  I 
think it's important, for users, that the reader reliably work even when you 
have line endings that aren't the standard for your platform.  Especially if 
standards mandate their support!

Below is the current status and my current plan to make detecting line endings 
"just work", even in odd cases.  Comments are welcome.

I will *NOT* make any code changes to the reader right *now*. Alan Manuel 
Gloria is reorganizing all the reader code, and while git is good, it can't 
work miracles.  But once he finishes reorganizing, I can easily implement DA 
PLAN below.

--- David A. Wheeler



=== DA PLAN ===

The code already handles line-endings of LF (\n) and CRLF (\r\n), the Unix and 
MS-DOS/Windows conventions respectively.  For most people, that's enough.

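But R6RS is more complicated than that. R6RS section 4.2.1 defines 
line ending as:
<line ending> ::= <linefeed> | <carriage return>
 | <carriage return> <linefeed> | <next line>
 | <carriage return> <next line> | <line separator>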
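While R6RS section 4.1 includes this definition (which defines the 
characters):
Some non-terminal names refer to the Unicode scalar values of the same name: 
<character tabulation> (U+0009), <linefeed> (U+000A), <carriage return> 
(U+000D), <line tabulation> (U+000B), <form feed> (U+000C), <carriage return> 
(U+000D), <space> (U+0020), <next line> (U+0085), <line separator> (U+2028), 
and <paragraph separator> (U+2029).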

Misleadingly, the R6RS section 4.2.2 titled "Line endings" doesn't 
mention all these options. It mentions CRLF and LFCR, but not the IBM "next 
line" character.  But I think the productions in 4.2.1 are intended to govern.

So, here's what I plan to do.  First, I plan to define the following as the 
end-of-line characters:
  <linefeed>, aka \n, U+000A
  <carriage return>, aka \r, U+000D
  <next line>, U+0085
  <line separator>, U+2028

I plan for the readers to follow the following rules:
* Lines end if ANY end-of-line character appears
* To consume the eol, consume the first end-of-line character.  If the next 
character is a DIFFERENT end-of-line character, consume that too.

This way, \n\n is recognized as 2 lines, while \r\n and \n\r are recognized as 
one line. A \n without an end-of-line character after it ends the line, as 
expected.  Weirdness like <carriage return> <next line> is recognized too.  A 
few pairs that aren't required by R6RS would be considered a line-ending as 
well, but I think it's better to use this simple 
rule.  It's sensible, and makes it more robust when dealing with odd text files.
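
The two rules above can be modelled in a few lines. The sketch below is Python rather than the Scheme reader, and split_lines and EOL_CHARS are names invented here for illustration; it also assumes the whole input is already a decoded string instead of being consumed with read-char and peek-char.

```python
# The four end-of-line characters from the plan: LF, CR, NEL, LINE SEPARATOR.
EOL_CHARS = {"\n", "\r", "\u0085", "\u2028"}


def split_lines(text):
    """Split text into lines using the 'different end-of-line character' rule."""
    lines = []
    current = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in EOL_CHARS:
            # Rule 1: any end-of-line character ends the line.
            lines.append("".join(current))
            current = []
            i += 1
            # Rule 2: also consume the next character, but only if it is
            # a DIFFERENT end-of-line character.
            if i < n and text[i] in EOL_CHARS and text[i] != ch:
                i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        # Keep a final line that has no terminator.
        lines.append("".join(current))
    return lines
```

With this model, "a\n\nb" yields three lines (the middle one empty), while "a\r\nb", "a\n\rb", and "a\r\u0085b" each yield two.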

One complication: This would recognize U+0085 as <next line>.  If the input 
contains data that read-char interprets as character U+0085, and it was not 
intended to be a next line, well, it'll be a next line now.  But this doesn't 
appear to be likely.  Users who use UTF-8 everywhere will have no issues, of 
course.  Many other encodings, such as Latin-1, will have no problem as well; 
8X is in the control character space for Latin-1 and I believe for all European 
encodings. And so on. I think this is unlikely to be an issue, and the 
advantage is that even unusual line-ending encodings will "just work".
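
To make the encoding point concrete, here is a small Python illustration (not reader code) of how the single byte 0x85 decodes under a few encodings. The helper name decode_byte_85 is made up for this example. The Windows-1252 case is an addition not discussed above: there the byte is a printable ellipsis, so only text in such an encoding that is mis-decoded as Latin-1 would seem to be affected.

```python
def decode_byte_85(encoding):
    """Decode the single byte 0x85 under the given encoding."""
    return b"\x85".decode(encoding)


# Latin-1 maps byte 0x85 straight to U+0085, the NEXT LINE control character.
latin1_char = decode_byte_85("latin-1")

# Windows-1252 assigns byte 0x85 to U+2026 (horizontal ellipsis) instead.
cp1252_char = decode_byte_85("cp1252")

# In UTF-8, U+0085 is the two-byte sequence C2 85; a lone 0x85 byte is invalid.
utf8_bytes = "\u0085".encode("utf-8")
```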


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss


[Readable-discuss] File renames

2012-07-21 Thread David A. Wheeler
I plan to rename some executables and the formatter source; let me 
know if there are problems or better names:

* I plan to rename "iformat.sscm" into "sweetener.sscm".  This program takes 
standard Lisp and reformats into sweet-expressions.  I didn't want to call it a 
pretty-printer; it just uses "read" so all comments inside datums are lost.  
Still, even with that basic weakness, it's turning out to produce fairly clean 
results!
* I plan to create a trivial executable "sweetener", which 
compiles-if-necessary and runs sweetener.  This creates a simple filter to take 
existing code & see what it looks like with sweet-expressions.
* I plan to *rename* "sweet-filter" to "unsweetener".  Now that we have 
programs that can do sweet->traditional and traditional->sweet, that name is 
not so clear (which one is sweet-filter? they are BOTH filters that involve 
sweet-expressions!)

--- David A. Wheeler



Re: [Readable-discuss] Bundle of git changes

2012-07-21 Thread David A. Wheeler
Alan Manuel Gloria wrote:

> Okay, here's the public interface:

Overall that looks pretty nice.  Comments below.

> define-module
>   readable sweet-impl
>   :export
>   \\
>   . ; tier readers
>   . curly-infix-read ; :: Port -> Object
>   . modern-read ; :: Port -> Object
>   . sweet-read ; :: Port -> Object

Okay.  Presumably each of these reads one datum from Port, and returns the read 
object.

I think the Ports should be optional, just like read.  Or more accurately, 
there's a default Port of current-input-port().  Not sure how to notate that, 
maybe [Port] -> Object?


>   . ; replacement
>   . replace-read ; :: (Port -> Object) -> void -- replace the 
> implementation's reader

Okay.  This is an interesting generality, I had originally thought of 3 
reader-replacers, but I think your proposed interface is much better (it's much 
more general).

I think this needs to replace not just read(), but get-datum().

In short, people will do:
 (replace-read sweet-read)
interactively and life will be wonderful.

Would this also be placed at the top of files that use these notations?  It 
seems like we'd want a slightly different situation, since someone might want 
to load a file, THAT file would use some notation, but we'd want to restore the 
"old" one.  I'm not sure how to handle that cleanly. Maybe the best way is to 
have different file extensions (e.g., ".cscm", ".mscm", and ".sscm") and teach 
the loader to automatically handle the various cases.  Other ideas welcome.

>   . restore-default-read ; void -- restore the implementation's default reader

Okay.

I still think we need a function that invokes the old (saved) reader just once, 
so that people can briefly invoke it to read a few datums without trying to 
switch back & forth.  Perhaps saved-read or similar will return the old reader. 
Of course, with that you could just (replace-read saved-read) to restore the 
default reader.


I guess people can determine the current reader (if it's one of ours) by doing 
an "eq? read ..." 3 times, once with each of the readers.  Not sure that's a 
great way to do it; I sure feel weird doing eq? on procedures.  Maybe we need a 
simplified interface that can return 'curly-infix or 'modern or 'sweet or 
'other.
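
One way to picture replace-read, restore-default-read, a saved-read accessor, and a "which notation is active" query together is the Python model below. It is only a sketch of the bookkeeping: the class name ReaderSwitch, the method current_notation, and the idea of registering readers under names are assumptions made here, and nothing in it reflects how a Scheme implementation's reader is actually replaced.

```python
class ReaderSwitch:
    """Toy model of a replaceable reader with a saved default."""

    def __init__(self, default_read, named_readers=None):
        self._default_read = default_read        # the implementation's reader
        self._named = dict(named_readers or {})  # e.g. {"sweet": sweet_read}
        self.read = default_read                 # the reader currently in use

    def replace_read(self, new_read):
        """Install new_read as the current reader."""
        self.read = new_read

    def restore_default_read(self):
        """Go back to the saved default reader."""
        self.read = self._default_read

    def saved_read(self):
        """Return the saved default reader without installing it."""
        return self._default_read

    def current_notation(self):
        """Return the registered name of the current reader, or 'other'."""
        for name, reader in self._named.items():
            if reader is self.read:
                return name
        return "other"
```

In this model, calling replace_read with the result of saved_read has the same effect as restore_default_read, and callers never need to compare procedures themselves.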

I think we need a variant of "load" that takes, as an argument, a reader.  
Obviously that's not hard to create yourself, but I want to be able to 
*trivially* read in a file with a given named notation. Maybe:

. load-using ; :: (Port -> Object) -> String -> unspecified 
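
A rough Python model of what load-using might do: call the given reader repeatedly on one port until end of file, handing each datum to the caller. The EOF marker, the toy line_reader, the handle callback (standing in for evaluation), and the split into load_from_port and load_using are all assumptions for illustration.

```python
import io

EOF = object()  # hypothetical end-of-file marker returned by readers


def line_reader(port):
    """Toy reader: treat each non-blank line as one datum."""
    line = port.readline()
    while line and not line.strip():
        line = port.readline()  # skip blank lines
    return line.strip() if line else EOF


def load_from_port(reader, port, handle):
    """Read every datum from port with reader, passing each to handle."""
    while True:
        datum = reader(port)
        if datum is EOF:
            return
        handle(datum)


def load_using(reader, filename, handle):
    """Open filename and load it with the given reader."""
    with open(filename, encoding="utf-8") as port:
        load_from_port(reader, port, handle)
```

For example, load_from_port(line_reader, io.StringIO("a\n\nb\n"), seen.append) leaves seen holding "a" and "b".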


>   . ; testing
>   . ; (compare-read-file foo-read "filename")
>   . ; open the file named "filename", and read each expression
>   . ; in the file using the implementation's default reader and the given 
> reader
>   . ; Returns three (values ...): The first is #t if the file is the same for
>   . ; both built-in and the given reader, the 2nd and 3rd are the top-level
>   . ; expression that failed to match, as read by the built-in (2nd) and
>   . ; as read by the given (3rd) reader.  On success the 2nd and 3rd
>   . ; values are just '()
>   . compare-read-file ; :: (Port -> Object) -> String -> (Bool, Maybe
> Object, Maybe Object)

>   . ; like the above, but just the given string
>   . compare-read-string ; :: (Port -> Object) -> String -> (Bool,
> Maybe Object, Maybe Object)

I'm not sure how important this one is, but okay.
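
For reference, the comparison described in the quoted comments could look roughly like this in Python. It is a sketch under stated assumptions: the default reader is passed explicitly instead of being taken from the implementation, EOF and the two toy readers are invented here, and None stands in for the '() returned on success.

```python
import io

EOF = object()  # hypothetical end-of-file marker returned by readers


def read_line(port):
    """Toy 'default' reader: one datum per line."""
    line = port.readline()
    return line.rstrip("\n") if line else EOF


def read_line_upper(port):
    """Toy 'given' reader that deliberately disagrees by upper-casing."""
    datum = read_line(port)
    return datum if datum is EOF else datum.upper()


def compare_read_string(reader, default_reader, text):
    """Read text with both readers and report the first mismatch.

    Returns (True, None, None) if every datum matches, otherwise
    (False, datum_from_default_reader, datum_from_given_reader).
    """
    default_port = io.StringIO(text)
    given_port = io.StringIO(text)
    while True:
        expected = default_reader(default_port)
        actual = reader(given_port)
        if expected != actual:
            return (False, expected, actual)
        if expected is EOF:
            return (True, None, None)
```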

> As for the internal portability API, I think I also need to provide a
> portable way to replace the reader - witness the hackery that goes on
> to "seamlessly" replace Guile's reader on 1.6, 1.8 AND 2.0.

Magic!

Thanks for working on this, it's very promising.

--- David A. Wheeler
