Re: [Factor-talk] New "Modern" Factor parser

Alexander Ilin Wed, 10 Aug 2016 08:04:17 -0700

Hello, Doug!

As you know, I'm a newbie in Factor, so if I ask some dumb questions, please have patience or postpone your answer until you can contain your rage : ))

Basically, I did not understand most of what you wrote in that epochal e-mail about the locals-and-roots branch. Some of the things I imagine I did understand I'd like you to comment on, just to make sure my imagination is not too vivid.

04.07.2016, 05:18, "Doug Coleman" <doug.cole...@gmail.com>:

There's a branch called locals-and-roots that has locals/fry/macros/memoize/stack-checker in core

Does that mean that the minimal size of the executable is growing again?

A) Goals of the new parser:

1) Simplify/regularize the Factor syntax
2) Remove parsing words' ability to take over the parser
3) Keep the parsed text for refactoring
4) Allow out of order definitions and maybe circular vocabulary dependencies

Ouch! Are you sure about this? Is there a need that outweighs the headache? Or is there no headache?

5) Easier syntax for DSLs
6) Support the same syntax everywhere
7) Allow custom lexer/file extension associations
8) Load all code on every platform (but not run)

How will 3) and 8) affect the minimum memory footprint?

What about the image file size?

9) Add strings with arbitrary payloads

What's an arbitrary payload? An example or a use case would be nice.

10) Make most valid-looking syntax lex, but not necessarily compile

B) Syntax cheat sheet:

TAG: objs ;
TAG< FOO: ... ; BAR: ... ; TAG>
tag[ objs ] tag{ objs } tag( objs )
tag"text" tag[[text]] tag{{text}} tag((text))
tag`text tag``text`` tag```text```

In the line above - does the first piece mean that tag`text does not have to end with a closing `?

That seems weird and inconsistent. Is there a justification? Do I have to use double ticks to include a space?

tag\ text

In the line above can text be "a quoted text with spaces" or is it limited to a spaceless token?

If it can be quoted, can it be tagged as well? tag1\ tag2\ tag3[text tag3]?

tag:: obj
obj1: obj2 ! { obj1 obj2 }
-- ! splits a sequence, e.g. ( a -- b ) -> ( { a } { b } )
tag! text til end of line
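
The "--" entry can be illustrated with a small sketch. It is written in Python rather than Factor, and the function name split_on_separator is hypothetical. It only shows the grouping described in the cheat sheet, ( a -- b ) -> ( { a } { b } ), not how the locals-and-roots branch implements it.

```python
def split_on_separator(tokens, separator="--"):
    """Split a flat token list into groups at each separator token."""
    groups = [[]]
    for token in tokens:
        if token == separator:
            groups.append([])  # start a new group after each "--"
        else:
            groups[-1].append(token)
    return groups


# The body of ( a -- b ) becomes two groups, one per side of the "--".
effect = split_on_separator("a -- b".split())
```

Under this reading, a body with no "--" at all yields a single group.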

C) More syntax explanation:

Top level definitions are:
FOO: ... ; or FOO: ... FOO;

Delimiters close like:
V{ 1 2 3 } or V{ 1 2 3 V}
asdf[ 1 2 3 ] or asdf[ 1 2 3 asdf]

What about tag"some string tag"? Is the second instance of "tag" a part of the tagged string, or is it a repeated closing tag?

Nesting definitions inside others:
PRIVATE<
FOO: ... ;
BAR: ... ;
PRIVATE>

Does this mean that the old syntax of <PRIVATE ... PRIVATE> is no longer valid, or is it no longer idiomatic, or both?

I kind of like the <XML ... XML> syntax, and find it easily understandable.

Will it be possible to define words like cm>inch, or will that open a scope that has to end with cm<, or will it trigger the syntax error of "missing the starting cm< tag"?

Syntax itself is concatenative -- you can do things like this:
V{ 1 2 3 }[ 0 ] ! should desugar to C-style array access

Even without the explicit nth word? I don't understand this. Isn't [ 0 ] a quotation that pushes 0 onto stack?

D) Implementation:

All syntax starts with tags (text without delimiters) and tokens end with whitespace or a delimiter.

Could you define "delimiter", please?

A lexing rule is some self-contained rule that parses text according to the delimiter encountered. These rules are used to group tokens together, so at the end of parsing a .factor file, there is a sequence of standalone definitions grouped with their decorators, like inline/foldable/flushable/etc.

In theory, you could at this point ``randomize`` the top-level definitions, rewrite them to disk, then reparse and have the same code, even repeating any number of times.

1) single-line-lexer - like a ! comment right now

The tags for these rules should just be for metadata. I want to change the comment character to # eventually instead of ! (thoughts?)

Examples:
author! erg
! No tag, plain comment

While I have no objection to # for a comment starter, I don't necessarily like the idea of a tagged comment.
But even if there is a good use for such things, to me, a comment starts with a designated character and continues to EOL. It can't eat a word to the left of it.
Same as with <XML, I would prefer comments to be tagged like this: #author erg

2) backtick-lexer - single backtick parses til whitespace, multiple parses til matching number of backticks.

Examples:
char`a char``a`` char```a```
fixnum`3 fixnum``345``

Why not always parse to matching backtick count?
How will this case be handled?
fixnum``hello```world`````
Will the parsing error tell me that ````` is not closed, or that `` is not closed, or that EOF was unexpected, wherever that is?
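
One possible reading of the backtick rule, sketched in Python rather than Factor. The name lex_backtick, the LexError class, and above all the choice that only a run of exactly the opening number of backticks closes the token are assumptions made for illustration. The branch may well behave differently.

```python
import re

WHITESPACE = re.compile(r"\s")
BACKTICK_RUN = re.compile(r"`+")


class LexError(Exception):
    """Raised when a backtick token is malformed (hypothetical error type)."""


def lex_backtick(text, start=0):
    """Lex one tag`payload token; return (tag, payload, end_offset)."""
    tick = text.find("`", start)
    if tick == -1:
        raise LexError("no backtick found")
    tag = text[start:tick]
    pos = tick
    while pos < len(text) and text[pos] == "`":
        pos += 1
    count = pos - tick  # number of opening backticks
    if count == 1:
        # Single backtick: payload runs to the next whitespace or end of input.
        match = WHITESPACE.search(text, pos)
        end = match.start() if match else len(text)
        return tag, text[pos:end], end
    # Several backticks: assume only a run of exactly `count` backticks closes.
    for run in BACKTICK_RUN.finditer(text, pos):
        if run.end() - run.start() == count:
            return tag, text[pos:run.start()], run.end()
    raise LexError(f"{count} opening backticks at offset {tick} are never closed")
```

Under these assumptions, char`a and char``a`` both yield the payload a, while fixnum``hello```world````` raises an error that names the two opening backticks, because neither the run of three nor the run of five counts as a match.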

4) dquote-lexer - multiline string parsing until the matching ", with \" escapes

SBUF" needs a space after right now, but with the new branch, you can just do sbuf"hello".

Examples:
url"factorcode.org"
"dquote in string \"wow\""

I like this.

5) single-matched-lexer - lex things until a matching delimiter

( ) things are datastack/function call things
{ } things are data structures
[ ] things are code blocks/lambda/quotation things

shuffle( a b -- b a )
V{ 1 2 3 }
infix[ 1 + 2 ]
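
A rough sketch of how single-matched grouping could work, in Python rather than Factor. It assumes tokens are already split on whitespace, that an opener is any token ending in ( { or [, and that a closer is either the bare closing character or the same tag followed by it (as in V{ 1 2 3 V}). The names group_tokens and LexError are hypothetical, and none of this is taken from the branch itself.

```python
PAIRS = {"(": ")", "{": "}", "[": "]"}
CLOSERS = set(PAIRS.values())


class LexError(Exception):
    """Raised on an unexpected or missing closer (hypothetical error type)."""


def group_tokens(tokens):
    """Group tokens into nested (tag, opener, children) tuples."""
    stack = [("", "", [])]  # sentinel frame that collects top-level items
    for token in tokens:
        last = token[-1]
        if last in PAIRS:
            # Opener such as V{ or [ : start a new frame.
            stack.append((token[:-1], last, []))
        elif last in CLOSERS:
            tag, opener, children = stack[-1]
            # The closer must fit the innermost opener; its tag is optional.
            if PAIRS.get(opener) != last or token[:-1] not in ("", tag):
                raise LexError(f"unexpected closer {token!r}")
            stack.pop()
            stack[-1][2].append((tag, opener, children))
        else:
            stack[-1][2].append(token)
    if len(stack) > 1:
        tag, opener, _ = stack[-1]
        raise LexError(f"unclosed {tag + opener!r}")
    return stack[0][2]
```

With these rules, V{ 1 2 3 } and V{ 1 2 3 V} group identically, while V{ 1 2 ] and V{ 1 2 X} are both rejected at the offending closer.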

6) double-matched-lexer - lex a text payload until a matching delimiter

tag[[text]] tag[=[text]=] tag[==[text]==]
tag{{text}} tag{={text}=} tag{=={text}==}
tag((text)) tag(=(text)=) tag(==(text)==)

Again, the question is what is considered to be a delimiter, and what is a "matching" delimiter.
It seems to me that
tag((text))
is somehow equivalent to
tag((text tag))
Is that right?

Also, depending on the definition of "matching" the following may or may not be allowed:
tag))text((
tag))text tag((

If the latter two are allowed, we may have some serious issues with the helpfulness of our syntax errors.

You can nest arbitrary payloads by ensuring the closing delimiter is not present in the payload. I'm still trying to figure out how to nest infinitely and do refactoring/syntax highlighting inside of payloads.
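
The remark about choosing a closing delimiter that does not occur in the payload can be sketched as follows, in Python rather than Factor. The helper names wrap_payload and unwrap_payload are hypothetical. The sketch assumes the tag[=[text]=] forms work like leveled brackets, where the number of = signs in the closer must equal the number in the opener. That is an interpretation of the examples above, not a description of the branch.

```python
import re

PAIRS = {"[": "]", "{": "}", "(": ")"}
OPENING = re.compile(r"([\[{(])(=*)\1")  # e.g. [[ or [=[ or {=={


def closing_delimiter(opener, level):
    """Return the closer for a given opener and level, e.g. ]=] for level 1."""
    closer = PAIRS[opener]
    return closer + "=" * level + closer


def wrap_payload(tag, payload, opener="["):
    """Wrap payload at the lowest level whose closer cannot end it early."""
    level = 0
    while True:
        closer = closing_delimiter(opener, level)
        # The first closer in payload + closer must be the one just appended.
        if (payload + closer).find(closer) == len(payload):
            break
        level += 1
    return f"{tag}{opener}{'=' * level}{opener}{payload}{closer}"


def unwrap_payload(token):
    """Split a token such as tag[=[text]=] into (tag, text)."""
    match = OPENING.search(token)
    if match is None:
        raise ValueError("no opening delimiter found")
    closer = closing_delimiter(match.group(1), len(match.group(2)))
    end = token.find(closer, match.end())
    if end == -1:
        raise ValueError(f"missing closing delimiter {closer!r}")
    return token[:match.start()], token[match.end():end]
```

For example, wrap_payload("tag", "text") gives tag[[text]], while a payload that already contains ]] is bumped to the next level, so wrap_payload("doc", "a[[b]]c") gives doc[=[a[[b]]c]=]. This only shows nesting to a finite depth chosen at write time; it does not address the open question of refactoring or highlighting inside payloads.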

E) Docs syntax:
You can see the new syntax in action in the kernel-docs.factor file. Hopefully we can golf it down to only the most essential keystrokes.

HELP: WIN-EXCEPTION-HANDLER
$description{ "This special object is an " $link\ alien " containing a pointer to the processes global exception handler. Only applicable on " $link\ windows "." } ;

Is $ a part of the tag? So, it's not a delimiter?
Is # a delimiter?

I'm leaning toward this syntax for examples. Note that [[ ]] parses a string:
$example[[ USING: kernel ; 5 5 = { t } ]]

Instead of printing the answer, we could pop the last stack element and compare the remaining stack to it. Also, this means ``unit-tests`` could be copy/pasted into the examples section, and also that they may be backwards right now.

! unit-test syntax ideas
[ 1 2 + ] { 3 } unit-test
[ 1 2 + { 3 } ] test

I like the first unit-test syntax very much:
[ 1 2 + ] { 3 } unit-test
It just reads more naturally, you know. Maybe my opinion does not matter much here, though, because it may just mean that I haven't twisted my brain enough to appreciate the backwards syntax that we have for unit-tests at the moment.

G) Out-there ideas/open problems

I want to change comments to # and remove , and foo, as words.

No objection from me.

I) Final notes

I would prefer if people started to hack on this ``locals-and-roots`` branch, and we can release the master branch as .98 as-is or after fixing:

https://github.com/factor/factor/issues/1379 - We use opengl 2.1 instead of 3.2+
https://github.com/factor/factor/issues/1487 - X11 resize queues tons of events

Theoretically, if I were to set up a Factor instance derived from the locals-and-roots branch and start hacking on it, how would I share my code?
The branch is not on GitHub, so what's the protocol? Or were you addressing only those who already have write permissions on git://factorcode.org/git/factor.git?

And finally, the most important question. Will the new parser give better or worse error messages?
When there is a typo in the source text, how close to the spot will the error message be located, and how useful will it be for finding and fixing the typo or a mismatched tag?

---=====---
Александр
