Re: [Factor-talk] New parser discussion (continued)

Dave Carlton Wed, 24 Aug 2016 11:17:21 -0700

Way back in the '80s, Apple's old MPW used Option-key characters to make 
delimiters easier, like ∂ as escape and ¬ for line continuation. Here we are, 
30 years later, still using essentially 7-bit ASCII for the textual 
representation of code. Why not take advantage of all those unused characters? 
Like perhaps « block » or
…
block of comment text
…


--- 
Dave Carlton 
PolyMicro Systems
da...@polymicro.net
808-220-1727

On Aug 17, 2016, 15:07 -1000, Doug Coleman <doug.cole...@gmail.com>, wrote:
> (disclaimer: sourceforge only accepts 40kb message bodies without approval, i 
> might have forgotten the admin password, this email might be sent 3+ times, 
> google inbox confuses me with hidden text bloating my replies...)
> 
> 
> 
> Hi Александр,
> 
> I'm finally able to contain enough rage to answer you! ;)
> 
> --Delimiters--
> 
> My definition for delimiter is "character(s) that separate the type of the 
> payload from the payload".
> 
> Some examples are:
> 
> url"google.com"
> url`google.com
> url( "google.com" )  ! hypothetical call to url word with string as payload
> 
> The delimiter tokens I'm proposing are [ { ( ` " [[ {{ (( and the 
> corresponding closing tags.
> 
> 
> --Code bloat--
> 
> The minimal executable size is based on what the tree-shaker is able to prune 
> from the program. If there's no locals usage, in theory it should not include 
> any code at all for locals, though this would need to be verified and 
> possibly fixed.
> 
> As for out-of-order definitions and circularity, it's hellish right now not 
> having these features. If your vocabularies load correctly without 
> circularity but code changes later introduce it, the code will likely reload 
> fine in the running image but throw an error when loaded in a fresh image. 
> These errors are hard to track down in the fep backtrace, as it's just a spew 
> of badly-formatted Factor objects and separators. The circularity can also 
> be subtle: it usually isn't as simple as reciprocal usage but is more likely 
> a chain of dependencies, and tracking it down is hard without tools (which 
> we don't have).
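A tool for the chain-of-dependencies case could start from something like the sketch below. It is Python rather than Factor, and the `deps` mapping and the vocabulary names in the usage line are made up for illustration; a real tool would first have to extract the USING: lists from the source files.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if acyclic.

    deps maps a vocabulary name to the names it uses (hypothetical data).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(name):
        colour[name] = GREY
        path.append(name)
        for dep in deps.get(name, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: that closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[name] = BLACK
        return None

    for name in deps:
        if colour.get(name, WHITE) == WHITE:
            found = visit(name)
            if found:
                return found
    return None
```

Called as `find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})`, it reports the whole chain rather than only the pair of vocabularies where loading happened to fail.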
> 
> 
> --Loading code from other platforms--
> 
> Loading Linux code on Windows etc would really just be loading the textual 
> code and possibly running the stack-checker on it. It would use as much 
> memory as the text size of the file + the factor slice and syntax objects 
> used to contain that text. Of course you would strip these objects on a 
> deployed image.
> 
> Loading code for the wrong platform means you can do whatever really, as long 
> as you stop short of calling functions that don't exist on the platform. The 
> advantage you get is being able to use a tool to rename a word completely 
> across every Factor file, not just the loaded files or the files that run on 
> your current platform. There are so many bugs in the git history where we 
> updated one platform but not another because the tools are missing.
> 
> 
> --Arbitrary Payloads--
> 
> An arbitrary payload for text is the ability to put any text at all in a 
> string literal without having to escape it. For instance, a common problem 
> in C/C++ arises when you have a comment /* ... */ and then decide to comment 
> out a larger block that contains that comment. You can use the ``#if 0 ... 
> #endif`` trick, but what if you then want to comment that block out? The 
> payload (the comment) starts to interfere with the delimiters. Lua has a cool 
> way to contain any text in a literal without escaping it: you make the 
> delimiters variable and make sure they are not contained in the payload.
> 
> If C comments worked differently:
> /*  */        # first C comment
> /**  /* */  **/  # nested C comment, yes this isn't really C
> /*** /** /* */ **/ ***/  # etc, making sure ******/ isn't in the payload
> 
> The same principle applies for string payloads. A common use case is copying 
> text off a website or out of a hexdump, out of a packet capture, etc, and not 
> needing to care about escaping the copied text. You can even generate the 
> right delimiters with a tool or the editor if you have the payload. The 
> motivation for this feature is to allow the programmer to forget about 
> string/comment escaping.
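Generating the right delimiters from the payload could look roughly like this. It is a Python sketch modeled loosely on Lua's long brackets (`[==[ ... ]==]`); the `wrap` and `unwrap` names are hypothetical, and it is not a Lua-compatible emitter (Lua, for one, drops a newline that directly follows the opening bracket).

```python
def wrap(payload):
    """Pick the shortest bracket level whose closer cannot end the payload early."""
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The closer must first appear exactly at the end. This also rejects
        # payloads whose tail would merge with the closer, e.g. "a]" at level 0.
        if (payload + closer).find(closer) == len(payload):
            break
        level += 1
    opener = "[" + "=" * level + "["
    return opener + payload + closer


def unwrap(text):
    """Inverse of wrap: read the level from the opener, then cut at the closer."""
    if not text.startswith("["):
        raise ValueError("missing opener")
    level = 0
    while text[1 + level:2 + level] == "=":
        level += 1
    if text[1 + level:2 + level] != "[":
        raise ValueError("malformed opener")
    closer = "]" + "=" * level + "]"
    start = level + 2
    end = text.find(closer, start)
    if end == -1:
        raise ValueError("missing closer")
    return text[start:end]
```

The payload itself is never modified, so `unwrap(wrap(s))` gives back `s` for any string; only the number of `=` signs in the delimiters changes.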
> 
> Finally, features like python's triple strings '''string''' and """string""" 
> (single and double quotes, tripled) are usually ok, but if you want to write 
> docs about them that contain syntax examples, then you have to micromanage 
> your string delimiters. It's frustrating and programming should not be 
> frustrating in this way!
> 
> 
> --Backticks--
> 
> I wanted a way to golf the C string syntax for strings that don't have 
> spaces. The proposed way is url`google.com, where url is the tag and 
> google.com is the payload. Also, I thought that you had to double- or 
> triple-quote backticks in Markdown, but this doesn't appear to be the case; 
> it supports `thing`.
> 
> C++ has something like this with their user-defined literals:
> Kilograms w = 200.5_lb + 100.1_kg; // C++ user-defined literals
> 
> Notice that you don't have to use two escape characters; just the one works. 
> I'm open to not having the foo`bar form and just making it foo`bar`, but that 
> was the motivation: golfing it by one character! Could also use single-quote 
> or abandon the idea.
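To make the single-backtick rule concrete, here is a small Python sketch. The `lex_backtick` name and the exact rule are assumptions: the tag is everything before the backtick, the payload runs to the end of the whitespace-delimited token, and the closing backtick is optional. It deliberately ignores the multi-backtick forms.

```python
import re

# tag (may be empty), one backtick, payload without spaces or backticks,
# then an optional closing backtick.
_BACKTICK = re.compile(r"(?P<tag>[^\s`]*)`(?P<payload>[^\s`]+)`?")


def lex_backtick(token):
    """Split one whitespace-delimited token into (tag, payload), or None."""
    m = _BACKTICK.fullmatch(token)
    if m is None:
        return None
    return m.group("tag"), m.group("payload")
```

Under these rules `lex_backtick("url`google.com")` and the closed form `url`google.com`` both yield the tag `url` and the payload `google.com`, so accepting both spellings costs the lexer nothing.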
> 
> 
> --Delimiter Location--
> 
> <XML XML> vs XML< XML> is a style choice. XML< XML> is consistent with the 
> "tagged payloads" style, where you have 1) 
> tag-delimiter-payload-[delimiter]. The other way to do it is 2) 
> delimiter-tag-payload-delimiter, e.g. ``[fry _ + ]`` or ``{H { 1 2 } }``. 
> Style 2) is consistent with the <XML XML> syntax. It's whatever, but the 
> attempt was at consistency.
> 
> 
> --Concatenative lexing tokens--
> …
> V{ 1 2 3 }[ 0 ]  ! should desugar to C-style array access
> 
> The idea behind the above syntax was that if several lexed tokens are 
> together without whitespace, then the first one decides how to handle the 
> following ones. So a vector followed by a quotation would attempt to address 
> into itself with the return values from the quotation. This might work better 
> with delimiter style 2) from above, like ``{V 1 2 3 }[nth 0 ]``. On the other 
> hand, this looks "ugly"?
> 
> What do you think about V{ 1 2 3 } vs {V 1 2 3 } ?
> 
> 
> --Comments--
> 
> author# erg  ! a "typed/tagged comment"
> 
> This is indeed a weird idea. There's nothing preventing it from working, but 
> it's probably too confusing. It works better in style 2) perhaps:
> #author erg
> 
> 
> --Backtick example--
> 
> - How will this case be handled?
> - fixnum``hello```world`````
> 
> This one is a mess. According to the lexing rules, it would see fixnum as a 
> tag of a double-backticked payload, concatenated with a single-backtick 
> "world" with no tag, with an empty double-backtick payload with no tag. I 
> dunno. It seems to lex, but the next phase would pattern-match against fixnum 
> + the rest of the mess and fixnum wouldn't know how to handle it, so it would 
> have a parse error. Something like that.
> 
> Or the trailing backticks would say "end of file found but expected 
> 4-backticks".
> 
> 
> --Delimiters revisited--
> 
> Literals have to start with opening delimiters. Things like ))foo(( should 
> not work.
> 
> For ``$description{ "foo" }`` in documentation, the $ just signals docs. It's 
> not really special. Maybe there's a better convention.
> 
> --Operators--
> 
> I had problems trying to figure out the ``char: a`` form, but I think it's 
> just a prefix operator named ``char:``. It should parse as ``char:`` ``a`` 
> and in another pass they should be joined into a single token, operator-char: 
> with a payload "a". Likewise color: hexcolor: pointer: alien: are prefix 
> operators, the pair-rocket H{ 1 => 2 3 => 4 } is an infix operator, and 
> something like ``a++`` could be a postfix operator, e.g.  ``1 a++`` where it 
> could increment it at compile-time if it's a literal, or dispatch at run-time 
> otherwise. (Postfix operators are unnecessary?)
> 
> Prefix and infix operators fix the problem of "related text should parse to a 
> single literal" and the char: prefix operator means that lower-case-colon 
> words don't have to be baked into the lexer. Also, the assignment :> doesn't 
> have to change, it's just a prefix operator!
> 
> Something like:
> PREFIX-OPERATOR: \ char:
> INFIX-OPERATOR: \ =>
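The second pass that joins operators with their operands might look like the Python sketch below. The operator tables mirror the examples in the text, while `join_operators` and the tuple representation of joined tokens are assumptions made for illustration.

```python
PREFIX_OPERATORS = {"char:", "color:", "hexcolor:", "pointer:", "alien:"}
INFIX_OPERATORS = {"=>"}


def join_operators(tokens):
    """Join operators with their operands after plain whitespace lexing.

    Prefix operators become (op, operand); infix operators become
    (op, left, right); every other token is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in PREFIX_OPERATORS:
            if i + 1 >= len(tokens):
                raise ValueError(f"{tok} expects an operand")
            out.append((tok, tokens[i + 1]))
            i += 2
        elif i + 1 < len(tokens) and tokens[i + 1] in INFIX_OPERATORS:
            if i + 2 >= len(tokens):
                raise ValueError(f"{tokens[i + 1]} expects a right operand")
            out.append((tokens[i + 1], tok, tokens[i + 2]))
            i += 3
        else:
            out.append(tok)
            i += 1
    return out
```

With this shape the first pass only splits on whitespace and knows nothing about lower-case-colon words; adding an operator means adding a table entry.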
> 
> --Fry and make--
> 
> It seems that '[ _ , % ] syntax is fine for these, and any other ways to 
> write it, like $[ obj% seq% ] are hard to type into the editor and look weird.
> 
> --Roots--
> I haven't figured out the best repository/directory structure, but it should 
> handle adding repositories with arbitrary URIs like ``@erg/factor``, handle 
> versions of Factor libraries, etc.
> 
> --Final thoughts--
> 
> The parser should give better error messages with enough work and syntax 
> error examples. There's really no reason for it to be worse.
> 
> Sorry for the mishmash of replies.
> 
> Cheers,
> Doug
> ------------------------------------------------------------------------------
> _______________________________________________
> Factor-talk mailing list
> Factor-talk@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/factor-talk
