(Disclaimer: SourceForge only accepts 40 KB message bodies without approval, I might have forgotten the admin password, this email might be sent 3+ times, and Google Inbox confuses me with hidden text bloating my replies...)
Hi Александр,

I'm finally able to contain enough rage to answer you! ;)

--Delimiters--

My definition for delimiter is "character(s) that separate the type of the payload from the payload". Some examples are:

    url"google.com"
    url`google.com
    url( "google.com" ) ! hypothetical call to url word with string as payload

The delimiter tokens I'm proposing are [ { ( ` " [[ {{ (( and the corresponding closing tags.

--Code bloat--

The minimal executable size is based on what the tree-shaker is able to prune from the program. If there's no locals usage, in theory it should not include any code at all for locals, though this would need to be verified and possibly fixed.

As for out-of-order definitions and circularity, it's hellish right now not having these features. If your vocabularies load correctly without circularity but then circularity is introduced through code changes, right now it will likely reload fine but throw an error upon loading in a fresh image. It's hard to track these down in the fep backtrace as it's just a spew of badly-formatted Factor objects and separators. Also, the circularity can be subtle: it usually isn't as simple as reciprocal usage, it's more likely a chain of dependencies, and tracking it down is hard without tools (which we don't have).

--Loading code from other platforms--

Loading Linux code on Windows etc. would really just be loading the textual code and possibly running the stack-checker on it. It would use as much memory as the text size of the file + the Factor slice and syntax objects used to contain that text. Of course you would strip these objects on a deployed image.

Loading code for the wrong platform means you can do whatever really, as long as you stop short of calling functions that don't exist on the platform. The advantage you get is being able to use a tool to rename a word completely across every Factor file, not just the loaded files or the files that run on your current platform.
There are so many bugs in the git history where we updated one platform but not another because the tools are missing.

--Arbitrary Payloads--

An arbitrary payload for text is the ability to put any text at all in the string literal without having to escape it. For instance, the common problem in C/C++ is if you have a comment /* ... */ and then you decide to comment out a larger block that contains that comment. You can use the ``#if 0 ... #endif`` trick, but what if you want to comment that block out? The payload (the comment) starts to interfere with the delimiters.

Lua has a cool way to contain any text in a literal without escaping it, where you just make sure the delimiters are variable and are not contained in the payload. If C comments worked differently:

    /* */ # first C comment
    /** /* */ **/ # nested C comment, yes this isn't really C
    /*** /** /* */ **/ ***/ # etc, making sure ******/ isn't in the payload

The same principle applies for string payloads. A common use case is copying text off a website or out of a hexdump, out of a packet capture, etc., and not needing to care about escaping the copied text. You can even generate the right delimiters with a tool or the editor if you have the payload. The motivation for this feature is to allow the programmer to forget about string/comment escaping.

Finally, features like Python's triple strings '''string''' and """string""" (single and double quotes, tripled) are usually ok, but if you want to write docs about them that contain syntax examples, then you have to micromanage your string delimiters. It's frustrating and programming should not be frustrating in this way!

--Backticks--

I wanted a way to golf the C string syntax for strings that don't have spaces. The proposed way is url`google.com, where url is the tag and google.com is the payload. Also I thought that you had to double or triple-quote markdown with backticks, but this doesn't appear to be the case, it supports `thing`.
C++ has something like this with their user-defined literals:

    Kilograms w = 200.5_lb + 100.1_kg; // C++ user-defined literals

Notice that you don't have to use two escape characters, just the one works. I'm open to not having the foo`bar form and just making it foo`bar`, but that was the motivation -- golfing it one character! Could also use single-quote or abandon the idea.

--Delimiter Location--

The <XML XML> vs XML< XML> is a style choice. It's consistent with the "tagged payloads" style where you have 1) tag-delimiter-payload-[delimiter]. The other way to do it is 2) delimiter-tag-payload-delimiter, e.g. ``[fry _ + ]`` or ``{H { 1 2 } }``. The 2) way is consistent with the <XML XML> syntax. It's whatever, but the attempt was at consistency.

--Concatenative lexing tokens--

    … V{ 1 2 3 }[ 0 ] ! should desugar to C-style array access

The idea behind the above syntax was that if several lexed tokens are together without whitespace, then the first one decides how to handle the following ones. So a vector followed by a quotation would attempt to address into itself with the return values from the quotation. This might work better with delimiter style 2) from above, like ``{V 1 2 3 }[nth 0 ]``. On the other hand, this looks "ugly"? What do you think about V{ 1 2 3 } vs {V 1 2 3 } ?

--Comments--

    author# erg ! a "typed/tagged comment"

This is indeed a weird idea. There's nothing preventing it from working, but it's probably too confusing. It works better in style 2) perhaps:

    #author erg

--Backtick example--

- How will this case be handled?
- fixnum``hello```world`````

This one is a mess. According to the lexing rules, it would see fixnum as a tag of a double-backticked payload, concatenated with a single-backtick "world" with no tag, with an empty double-backtick payload with no tag. I dunno. It seems to lex, but the next phase would pattern-match against fixnum + the rest of the mess and fixnum wouldn't know how to handle it, so it would have a parse error.
Something like that. Or the trailing backticks would say "end of file found but expected 4-backticks".

--Delimiters revisited--

Literals have to start with opening delimiters. Things like ))foo(( should not work. For ``$description{ "foo" }`` in documentation, the $ just signals docs. It's not really special. Maybe there's a better convention.

--Operators--

I had problems trying to figure out the ``char: a`` form, but I think it's just a prefix operator named ``char:``. It should parse as ``char:`` ``a`` and in another pass they should be joined into a single token, operator-char: with a payload "a". Likewise color: hexcolor: pointer: alien: are prefix operators, the pair-rocket H{ 1 => 2 3 => 4 } is an infix operator, and something like ``a++`` could be a postfix operator, e.g. ``1 a++`` where it could increment it at compile-time if it's a literal, or dispatch at run-time otherwise. (Postfix operators are unnecessary?)

Prefix and infix operators fix the problem of "related text should parse to a single literal", and the char: prefix operator means that lower-case-colon words don't have to be baked into the lexer. Also, the assignment :> doesn't have to change, it's just a prefix operator! Something like:

    PREFIX-OPERATOR: \ char:
    INFIX-OPERATOR: \ =>

--Fry and make--

It seems that '[ _ , % ] syntax is fine for these, and any other ways to write it, like $[ obj% seq% ], are hard to type into the editor and look weird.

--Roots--

I haven't figured out the best repository/directory structure, but it should handle adding repositories with arbitrary URIs like ``@erg/factor``, handle versions of Factor libraries, etc.

--Final thoughts--

The parser should give better error messages with enough work and syntax error examples. There's really no reason for it to be worse.

Sorry for the mishmash of replies.

Cheers,
Doug
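
--Appendix: generating delimiters (sketch)--

The "generate the right delimiters with a tool" idea from the Arbitrary Payloads section can be sketched roughly as follows. This is only an illustration in Python, not Factor, and not part of the proposal: it uses Lua's long-bracket form ([[ ]], [=[ ]=], [==[ ]==], ...) as the variable delimiter, and the function names wrap_long_bracket and unwrap_long_bracket are made up for the example. It picks the smallest bracket level whose closing delimiter cannot be found early, including the case where the payload ends in "]" and would otherwise merge with the closer.

```python
def wrap_long_bracket(payload: str) -> str:
    """Wrap payload in the smallest Lua-style long bracket that needs no escaping.

    Hypothetical helper for illustration. Caveat: real Lua drops a newline that
    immediately follows the opening bracket; this sketch ignores that rule.
    """
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The first occurrence of the closer in payload+closer must be the
        # closer we appended, otherwise the literal would end too early.
        if (payload + closer).find(closer) == len(payload):
            break
        level += 1
    opener = "[" + "=" * level + "["
    return opener + payload + closer


def unwrap_long_bracket(literal: str) -> str:
    """Recover the payload from a literal produced by wrap_long_bracket."""
    if not literal.startswith("["):
        raise ValueError("not a long-bracket literal")
    level = 0
    while literal[1 + level] == "=":  # count '=' signs in the opener
        level += 1
    if literal[1 + level] != "[":
        raise ValueError("malformed opening bracket")
    start = level + 2
    closer = "]" + "=" * level + "]"
    end = literal.index(closer, start)  # first closer ends the literal
    return literal[start:end]


if __name__ == "__main__":
    for text in ["google.com", "a]]b", "ends with ]", "x]=]y]]"]:
        literal = wrap_long_bracket(text)
        print(literal, unwrap_long_bracket(literal) == text)
```

The same search works for any variable-length delimiter family (runs of backticks, runs of asterisks in the comment example, and so on); only the way the opener and closer are built from the level changes.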
------------------------------------------------------------------------------
_______________________________________________ Factor-talk mailing list Factor-talk@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/factor-talk