Hi All,

I wrote a new parser after bouncing a ton of emails off John (thanks!).
It's ready enough to be worked on by other people. There's a branch called
locals-and-roots that has locals/fry/macros/memoize/stack-checker in core,
has a new vocab roots layout for demos/language/libs/ffi/webapps/apps (not
sure I like it), and has the new parser alongside the existing parser in
core/modern/*.

For the most part, everything looks the same. I'm still working on $example
syntax, functors, compilation, refactoring tools, fry in arrays,
stack-checker improvements, etc. You're welcome to help with ideas or code!

Repository:
git://factorcode.org/git/factor.git

Branch:
locals-and-roots

Boot images:
http://downloads.factorcode.org/images/boot.unix-x86.64.image.locals-and-roots
http://downloads.factorcode.org/images/boot.windows-x86.64.image.locals-and-roots

Source:
http://gitweb.factorcode.org/gitweb.cgi?p=factor.git;a=blob;f=core/modern/modern.factor;h=d244d774cfdc7eb1e7f52ccd0091a94f56944a25;hb=refs/heads/locals-and-roots

http://gitweb.factorcode.org/gitweb.cgi?p=factor.git;a=blob;f=core/modern/modern-tests.factor;h=9bdf7183e74ef084c2188f10d5ef01c893e82363;hb=refs/heads/locals-and-roots


A) Goals of the new parser:

1) Simplify/regularize the Factor syntax
2) Remove parsing words' ability to take over the parser
3) Keep the parsed text for refactoring
4) Allow out of order definitions and maybe circular vocabulary dependencies
5) Easier syntax for DSLs
6) Support the same syntax everywhere
7) Allow custom lexer/file extension associations
8) Load all code on every platform (but not run it)
9) Add strings with arbitrary payloads
10) Make most valid-looking syntax lex, but not necessarily compile



B) Syntax cheat sheet:

TAG: objs ;
TAG< FOO: ... ; BAR: ... ; TAG>
tag[ objs ] tag{ objs } tag( objs )
tag"text" tag[[text]] tag{{text}} tag((text))
tag`text tag``text`` tag```text```
tag\ text
tag:: obj
obj1: obj2 ! { obj1 obj2 }
-- ! splits a sequence, e.g. ( a -- b ) -> ( { a } { b } )
tag! text til end of line


C) More syntax explanation:

Top level definitions are:
  FOO: ... ; or FOO: ... FOO;

Delimiters close like:
V{ 1 2 3 } or V{ 1 2 3 V}
asdf[ 1 2 3 ] or asdf[ 1 2 3 asdf]

Nesting definitions inside others:
PRIVATE<
  FOO: ... ;
  BAR: ... ;
PRIVATE>

Nested definitions where ``PROTOCOL<`` has arity 1 (one fixed argument
spot):
PROTOCOL< sequence
  GENERIC: set-nth ( obj n seq -- )
  GENERIC: nth ( n seq -- obj )
PROTOCOL>

Semicolons are optional if you declare an arity:
ARITY: \ CONSTANT: 2
CONSTANT: a 1 ;
CONSTANT: b 2
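
To illustrate how a declared arity could make the semicolon optional, here
is a minimal Python sketch. It is not the Factor code in core/modern; it
assumes the text is already split into whitespace-separated tokens, and
group_definitions and the arities table are hypothetical names.

```python
def group_definitions(tokens, arities):
    """Group top-level tokens into definitions.

    arities maps a defining word (e.g. "CONSTANT:") to its fixed number of
    arguments. Words without a declared arity must be closed by ";".
    (Hypothetical helper, for illustration only.)
    """
    defs, i = [], 0
    while i < len(tokens):
        head = tokens[i]
        i += 1
        if head in arities:
            n = arities[head]
            body = tokens[i:i + n]
            if len(body) < n:
                raise SyntaxError(f"{head} expects {n} arguments")
            i += n
            if i < len(tokens) and tokens[i] == ";":
                i += 1  # the semicolon is allowed but optional
        else:
            try:
                end = tokens.index(";", i)
            except ValueError:
                raise SyntaxError(f"{head} has no declared arity and no ';'")
            body = tokens[i:end]
            i = end + 1
        defs.append([head] + body)
    return defs
```

With {"CONSTANT:": 2} as the table, both CONSTANT: lines above group the
same way, while a colon definition with no arity and no closing semicolon
raises an error.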

Top-level definitions and definition-groups interrupt a colon-definition:
: add ( a b -- c ) +   ! error because : has no arity declaration,
                               ! so the semicolon is required
PRIVATE<
..
PRIVATE>

Definition groups nest:
PRIVATE<
STUFF<
   THING1: .. ;
   THING2: .. ;
STUFF>
PRIVATE>

The -- from stack effects can generalize to a way to split a sequence in
two without needing extra delimiters. So ( a -- b ) becomes ( { a } { b }
), or H{ 1 2 -- 3 4 } is H{ { 1 2 } { 3 4 } }.
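
The split rule can be sketched in a few lines of Python. This is only an
illustration on plain lists of tokens, not the Factor implementation, and
split_on_dashes is a hypothetical name.

```python
def split_on_dashes(items):
    """Split a token list at the first "--" into [before, after].

    A list without "--" is returned unchanged (as a copy).
    (Hypothetical helper, for illustration only.)
    """
    if "--" not in items:
        return list(items)
    i = items.index("--")
    return [list(items[:i]), list(items[i + 1:])]
```

Here ["a", "--", "b"] becomes [["a"], ["b"]], and [1, 2, "--", 3, 4]
becomes [[1, 2], [3, 4]], mirroring the two examples above.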

Another pattern we use is ``a: b``, like in stack effects, which makes a
pair.
( a quot: ( b -- c ) -- d )
( a { quot ( b -- c ) } -- d ) ! after pair rule
( { a { quot ( b -- c ) } } { d } ) ! after -- rule
( { a { quot { { b } { c } } } } { d } ) ! after -- rule on the nested effect

Colon pairs mean that H{ 1: 2 3: 4 } would work as well. This is slightly
tricky and needs more thought.
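
As a rough model of how the pair rule and the -- rule compose, here is a
Python sketch over nested lists of tokens. It assumes a token ending in a
colon always pairs with the next item, which is exactly the part that needs
more thought; desugar is a hypothetical name and this is not the code in
core/modern.

```python
def desugar(node):
    """Apply the pair rule, then the -- rule, recursively.

    Nested delimited forms are modeled as nested Python lists.
    (Hypothetical helper, for illustration only.)
    """
    if not isinstance(node, list):
        return node
    paired, i = [], 0
    while i < len(node):
        tok = node[i]
        is_label = isinstance(tok, str) and len(tok) > 1 and tok.endswith(":")
        if is_label and i + 1 < len(node):
            # "a: b" becomes the pair [a, b]
            paired.append([tok[:-1], desugar(node[i + 1])])
            i += 2
        else:
            paired.append(desugar(tok))
            i += 1
    if "--" in paired:
        k = paired.index("--")
        return [paired[:k], paired[k + 1:]]
    return paired
```

Feeding it ["a", "quot:", ["b", "--", "c"], "--", "d"] gives
[["a", ["quot", [["b"], ["c"]]]], ["d"]], the same shape as the last line
of the stack effect example, and ["1:", "2", "3:", "4"] gives
[["1", "2"], ["3", "4"]].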


Syntax itself is concatenative -- you can do things like this:
V{ 1 2 3 }[ 0 ]  ! should desugar to C-style array access

Of course, the compiler has to either compile the new forms or throw an
exception.


D) Implementation:

All syntax starts with tags (text without delimiters) and tokens end with
whitespace or a delimiter. A lexing rule is some self-contained rule that
parses text according to the delimiter encountered. These rules are used to
group tokens together, so at the end of parsing a .factor file, there is a
sequence of standalone definitions grouped with their decorators, like
inline/foldable/flushable/etc.

In theory, you could at this point ``randomize`` the top-level definitions,
rewrite them to disk, then reparse and have the same code, even repeating
any number of times.


1) single-line-lexer - like a ! comment right now

The tags for these rules should just be for metadata. I want to change the
comment character to # eventually instead of ! (thoughts?)

Examples:
author! erg
! No tag, plain comment
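
A minimal Python sketch of this rule, assuming the comment starts at the
first whitespace-delimited token that ends in ! and runs to the end of the
line. It ignores strings that contain such a token; lex_single_line is a
hypothetical name, not something in core/modern.

```python
import re

# A token ending in "!" that is followed by whitespace or end of line.
_COMMENT = re.compile(r"(?:^|\s)(\S*)!(?:\s|$)")

def lex_single_line(line):
    """Return (code, tag, text) for one line; tag and text are None
    when the line has no comment. (Hypothetical helper.)"""
    m = _COMMENT.search(line)
    if m is None:
        return line.rstrip(), None, None
    code = line[:m.start(1)].rstrip()
    return code, m.group(1), line[m.end():].strip()
```

For "author! erg" this yields the tag "author" and the text "erg"; a plain
"! ..." comment yields an empty tag.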


2) backtick-lexer - a single backtick parses until whitespace; multiple
backticks parse until a matching number of backticks.

Examples:
char`a char``a`` char```a```
fixnum`3 fixnum``345``
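
The backtick rule is easy to model. The Python sketch below is only an
illustration (lex_backtick is a hypothetical name) and assumes the caller
has already read the tag and is positioned on the first backtick.

```python
def lex_backtick(text, pos):
    """text[pos] is the first backtick. Return (payload, next_pos).

    One backtick: payload runs to the next whitespace.
    N > 1 backticks: payload runs to the next run of N backticks.
    (Hypothetical helper, for illustration only.)
    """
    n = 0
    while pos + n < len(text) and text[pos + n] == "`":
        n += 1
    start = pos + n
    if n == 1:
        end = start
        while end < len(text) and not text[end].isspace():
            end += 1
        return text[start:end], end
    close = text.find("`" * n, start)
    if close == -1:
        raise SyntaxError("unterminated backtick payload")
    return text[start:close], close + n
```

Because the closing run must match the opening count, a payload opened with
three backticks can contain a run of two.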


3) backslash-lexer - a \ turns off the parsing rules for the next token

If you don't escape a lexing form, then it will execute its rule. So this
is how you turn that behavior off.

Examples:
SYNTAX: \ url" ... ;
\ {

! some words in the smalltalk vocabulary end in
! colon, need to escape them like this to call them
! or rename them
execute\ smalltalk-selector:

4) dquote-lexer - multiline string parsing until the matching ", with \"
escapes

SBUF" currently needs a space after it, but in the new branch you can just
write sbuf"hello".
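
A small Python sketch of the string rule, assuming a backslash always
escapes the following character and that the raw text is kept as written
(in line with the goal of keeping parsed text for refactoring). lex_dquote
is a hypothetical name, not the Factor word.

```python
def lex_dquote(text, pos):
    """text[pos] is the opening double quote.

    Return (raw_payload, next_pos); escapes are left untouched and
    newlines are ordinary characters. (Hypothetical helper.)
    """
    i = pos + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif text[i] == '"':
            return text[pos + 1:i], i + 1
        else:
            i += 1
    raise SyntaxError("unterminated string")
```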

Examples:
url"factorcode.org"
"dquote in string \"wow\""

5) single-matched-lexer - lex things until a matching delimiter

( ) things are datastack/function call things
{ } things are data structures
[ ] things are code blocks/lambda/quotation things

shuffle( a b -- b a )
V{ 1 2 3 }
infix[ 1 + 2 ]
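
Here is a rough Python sketch of matching delimiters, including the
optional tagged closer (V{ ... V}). It works on whitespace-separated
tokens, does not report mismatched closers, and parse_matched is a
hypothetical name rather than anything in core/modern.

```python
CLOSE_FOR = {"[": "]", "{": "}", "(": ")"}

def parse_matched(tokens, i=0):
    """tokens[i] is an opening token such as "V{" or "infix[".

    Return ((tag, opener, children), next_index). The form may be closed
    by the bare closer or by tag + closer. (Hypothetical helper.)
    """
    head = tokens[i]
    tag, opener = head[:-1], head[-1]
    close = CLOSE_FOR[opener]
    children = []
    i += 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in (close, tag + close):
            return (tag, opener, children), i + 1
        if tok[-1] in CLOSE_FOR:
            # a nested opening token: recurse
            child, i = parse_matched(tokens, i)
            children.append(child)
        else:
            children.append(tok)
            i += 1
    raise SyntaxError(f"unclosed {head}")
```

Both "V{ 1 2 3 }" and "V{ 1 2 3 V}" parse to the same tree in this model.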


6) double-matched-lexer - lex a text payload until a matching delimiter

tag[[text]] tag[=[text]=] tag[==[text]==]
tag{{text}} tag{={text}=} tag{=={text}==}
tag((text)) tag(=(text)=) tag(==(text)==)

You can nest arbitrary payloads by ensuring the closing delimiter is not
present in the payload. I'm still trying to figure out how to nest
infinitely and do refactoring/syntax highlighting inside of payloads.
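
The leveled delimiters can be modeled like this in Python. It is a sketch
under the assumption that the closer must carry the same number of = signs
as the opener; lex_double_matched is a hypothetical name.

```python
def lex_double_matched(text, pos):
    """text[pos] is the first opener character, e.g. the "[" of "[=[".

    Return (payload, next_pos). The payload ends at the closer with the
    same number of "=" signs. (Hypothetical helper.)
    """
    close_for = {"[": "]", "{": "}", "(": ")"}
    opener = text[pos]
    closer = close_for[opener]
    i = pos + 1
    while i < len(text) and text[i] == "=":
        i += 1
    if i >= len(text) or text[i] != opener:
        raise SyntaxError("not a double-matched opener")
    level = i - pos - 1
    start = i + 1
    closing = closer + "=" * level + closer
    end = text.find(closing, start)
    if end == -1:
        raise SyntaxError("unterminated payload")
    return text[start:end], end + len(closing)
```

Raising the level is what lets a payload contain a lower-level closer:
tag[=[a ]] b]=] carries the text "a ]] b".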


E) Docs syntax:
You can see the new syntax in action in the kernel-docs.factor file.
Hopefully we can golf it down to only the most essential keystrokes.

HELP: WIN-EXCEPTION-HANDLER
$description{ "This special object is an " $link\ alien " containing a
pointer to the processes global exception handler. Only applicable on "
$link\ windows "." } ;

I'm leaning toward this syntax for examples. Note that [[ ]] parses a
string:
    $example[[ USING: kernel ; 5 5 = { t } ]]

Instead of printing the answer, we could pop the last stack element and
compare the remaining stack to it. This also means ``unit-tests`` could be
copy/pasted into the examples section, and that unit tests may be written
backwards right now.
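
The pop-and-compare idea, sketched in Python with the data stack modeled as
a list whose last element is the top. check_example is a hypothetical name;
an empty stack raises ValueError in this model.

```python
def check_example(stack):
    """Pop the expected array off the top of the stack and compare it
    with whatever is left underneath. (Hypothetical helper.)"""
    *rest, expected = stack
    return rest == list(expected)
```

For the $example above, running 5 5 = { t } would leave [True, [True]] in
this model, and the check passes.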

! unit-test syntax ideas
[ 1 2 + ] { 3 } unit-test
[ 1 2 + { 3 } ] test


F) Refactoring/brainstorming:

This is a work in progress. I'm still unsure about ``tag:: obj`` vs ``obj1:
obj2`` -- the problem is that in stack effects, the left part of the colon
is always a label, but everywhere else, the left part is an action for
handling the right part as a payload.

The good news is it's not hard to rename everything at once with a simple
command:
all-modern-paths [ ] rewrite-paths ! refactor inside the quot


G) Out-there ideas/open problems

There's nothing inherently stacky in Factor's syntax. You could easily use
this syntax for some kind of Algol/JavaScript language.

I want to make it possible to use comma instead of whitespace to separate
lexing forms. Also, it should be possible to convert to and from
Python-style whitespace. The end goal is to have an editor that can
mechanically convert between the syntaxes.

It could be cool to compile this syntax to JavaScript, Swift, etc. It's
basically raw syntax without any semantics, so each language would have its
own semantics based on a shared representation in Factor.

How do you make a syntax for DSLs, like EBNF, that can nest N times and
support refactoring for [[ ]] code sections, syntax highlighting, etc? I
was thinking something like foo[0[ bar[1[ [[ factor code ]] ]1] ]0]. The
key is that the DSL can't have the delimiters like [n[ or [[ ]] in its
syntax unless you handle it somehow.

I want to change comments to # and remove , and foo, as words.


H) Try it out!
"modern" load
"modern" test
all-modern-paths [ path>literals ] map-zip
"lol[ 1 2 3 ]" string>literals
"1 2" [ ] rewrite-string
all-modern-paths [ ] rewrite-paths


I) Final notes

I would prefer if people started to hack on this ``locals-and-roots``
branch, and we can release the master branch as .98 as-is or after fixing:

https://github.com/factor/factor/issues/1379 - We use OpenGL 2.1 instead of
3.2+
https://github.com/factor/factor/issues/1487 - X11 resize queues tons of
events


Too long of an email, thanks for reading!

Cheers,
Doug
_______________________________________________
Factor-talk mailing list
Factor-talk@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/factor-talk
