Re: [fonc] OT? Polish syntax

BGB Sun, 18 Mar 2012 23:30:15 -0700

On 3/18/2012 6:54 PM, Martin Baldan wrote:

BGB, please see my answer to shaun. In short:


_ I'm not looking for stack-based languages. I want a Lisp which got
rid of (most of the) the parens by using fixed arity and types,
without any loss of genericity, homoiconicity or other desirable
features. REBOL does just that, but it's not so good regarding
performance, the type system, etc.


fair enough...

but, hmm... one could always have 2 stacks: create a stack over the stack, in turn reversing the RPN into PN, and also gets some "meta" going on...


+ 2 * 3 4 => 14

commands are pushed left to right, execution then consists of popping off and executing said commands (the pop/execute loop continues until the stack is empty). execution then proceeds left-to-right.

ironically, I have a few specialized interpreters that actually sort of work this way: one such interpreter uses a similar technique to implement a naive batch-style command language. a similar technique was also used earlier in a text-to-speech engine of mine.
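
A minimal Python sketch of the push-then-pop loop described above. The function name `eval_prefix`, the operator table, and the separate data stack for operands are assumptions made for illustration; this is not the code of the interpreters mentioned.

```python
import operator

# Hypothetical operator table; every operator here has a fixed arity of 2.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_prefix(source):
    """Evaluate a fixed-arity prefix expression using two stacks."""
    # Command stack: tokens are pushed left to right, so the last token is on top.
    commands = []
    for token in source.split():
        commands.append(token)

    # Data stack: holds operands and intermediate results.
    data = []
    while commands:
        token = commands.pop()
        if token in OPS:
            # The left operand was pushed last, so it is popped first.
            left = data.pop()
            right = data.pop()
            data.append(OPS[token](left, right))
        else:
            data.append(float(token))

    if len(data) != 1:
        raise ValueError("malformed expression")
    return data[0]
```

With this operand order, `eval_prefix("- 10 3")` gives 7.0. No tree is built at any point; the only state is the two stacks.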


nifty features:

no need to buffer intermediate expressions during execution (no ASTs or bytecode);
no need for an explicit procedure call/return mechanism (the process is largely implicit, however one does need a mechanism to push the contents of a procedure, although re-parsing works fairly well);

easily handles recursion;
it also implicitly performs tail-call optimization;
fairly quick/easy to implement;
handles pause/resume easily (since the interpreter is non-recursive).
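
The call mechanism and the tail-call and pause/resume properties listed above can be sketched in Python as follows. Everything here (the class `BatchInterpreter`, the `echo` command, the `max_depth` counter) is hypothetical and only meant to show the shape of the idea: a call pushes the procedure body onto the command stack, and no return address is recorded.

```python
class BatchInterpreter:
    """Toy command interpreter driven by a single stack of pending commands."""

    def __init__(self, procedures):
        self.procedures = procedures  # name -> list of command strings
        self.pending = []             # command stack; the top is the next command
        self.output = []
        self.max_depth = 0            # deepest the command stack has ever been

    def load(self, commands):
        # Push in reverse so that the first command ends up on top.
        self.pending.extend(reversed(commands))

    def step(self):
        """Execute one command; return False when nothing is left to run."""
        if not self.pending:
            return False
        self.max_depth = max(self.max_depth, len(self.pending))
        name, *args = self.pending.pop().split()
        if name == "echo":
            self.output.append(" ".join(args))
        elif name in self.procedures:
            # A "call" just pushes the procedure body; returning is implicit.
            self.load(self.procedures[name])
        else:
            raise ValueError(f"unknown command: {name}")
        return True

    def run(self, max_steps=None):
        """Run until the stack is empty or max_steps commands have executed."""
        steps = 0
        while (max_steps is None or steps < max_steps) and self.step():
            steps += 1
        return steps


# A procedure whose last command calls itself: an endless tail-recursive loop.
interp = BatchInterpreter({"loop": ["echo tick", "loop"]})
interp.load(["loop"])
interp.run(max_steps=10)    # pause after 10 commands
interp.run(max_steps=990)   # resume exactly where it stopped
print(interp.max_depth)     # 2
```

Because the call to `loop` is popped before its body is pushed, a call in tail position never grows the stack, and because all state lives in `pending`, stopping between two `step` calls is enough to pause.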

possible downsides:

not particularly likely to be high-performance (although an implementation using objects or threaded code seems possible);

behavior can be potentially reasonably counter-intuitive;
...

_ I *hate* infix notation. It can only make sense where everything has
arity 3, like in RDF.


many people would probably disagree.
whether culture or innate, infix notations seem to be fairly popular.

actually, it can be noted that many of the world's languages are SVO (and many others are SOV), so there could be a pattern here.

a reasonable tradeoff IMO is using prefix notation for commands and infix notation for arithmetic.

_ Matching parens is a non-issue. Just use Paredit or similar ;)

I am currently mostly using Notepad2, which does have parenthesis matching via highlighting.

however, the issue isn't as much with just using an editor with parenthesis matching, but more an issue when quickly typing something interactively. one may have to make extra mental effort to get the counts of opening and closing parentheses right, potentially distracting from "the task at hand" (typing in a command or math expression or similar). it also doesn't help matters that the parentheses are IMO more effort to type than some other keys.

granted, C style syntax isn't perfect for interactive use either. IMO, probably the more notable issue in this case is having to type commas. one can fudge it though (say, by making commas and semicolons generally optional).

one of the better syntax designs for interactive use seems to be the traditional shell-command syntax. behind this is probably C-like syntax, followed by RPN, followed by S-Expressions.

although physically RPN is probably a little easier to type than C style syntax, a downside is that one may have to mentally rework the expressions prior to typing them. another downside is that of being difficult to read or decipher later.

something like REBOL could possibly work fairly well here, given it has some structural similarity to shell-command syntax.

_ Umm, "whitespace sensitive" sounds a bit dangerous. I have enough
with Python :p

small-scale whitespace sensitivity actually seems to work out a bit nicer than larger scale whitespace sensitivity IMO. large-scale constrains the overall formatting and may end up needing to be worked around. small-scale generally has a much smaller impact, and need not influence overall code formatting.

the main merit it has is that it can reduce the need for commas (and/or semicolons), since the parser can use whitespace as a separator (and space is an easier key to hit).

however, many people like to use whitespace in weird places in code, which would carry the drawback that with such a parser, such tendencies would lead to incorrect code parsing.


example:
foo (x)
x = 3
  +4
...
could likely lead to the code being parsed incorrectly in several places.

otherwise, one has to write instead:
foo(x)
x=3+4
or possibly also allowed:
foo(
  x)
x=3+
  4
which would be more obvious to the parser.


or, alternatively, whitespace sensitivity can allow things like:
"dosomething 2 -3 4*9-2"

to be parsed without being ambiguous (except maybe to human readers due to variable-width font evilness, where font designers seem to like to often "hide" the spaces, but one can assume that most "real" programmers, if given the choice, will read code with a fixed-width font...). otherwise, I have had generally good luck with these sorts of things.
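
One way such small-scale whitespace sensitivity could work, as a Python sketch. The rule used here is an assumption, not a description of any actual parser: a new expression starts at a token that has whitespace before it and follows a complete operand, provided the token is either an operand itself or a '+'/'-' with no whitespace after it. The names `split_expressions`, `ends_operand` and `starts_operand` are made up for the example.

```python
import re

# Optional leading whitespace, then an identifier, an integer, or one symbol.
TOKEN = re.compile(r"(\s*)([A-Za-z_]\w*|\d+|[-+*/()])")
UNARY = {"+", "-"}

def ends_operand(text):
    """True if an expression could end on this token."""
    return text == ")" or text[0].isalnum() or text[0] == "_"

def starts_operand(text):
    """True if an expression could begin on this token."""
    return text == "(" or text[0].isalnum() or text[0] == "_"

def split_expressions(line):
    """Split a line into expressions without commas, using local whitespace."""
    line = line.strip()
    tokens = []  # (text, had_whitespace_before)
    pos = 0
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}")
        tokens.append((match.group(2), bool(match.group(1))))
        pos = match.end()

    expressions = []
    for i, (text, space_before) in enumerate(tokens):
        space_after = i + 1 < len(tokens) and tokens[i + 1][1]
        after_operand = i > 0 and ends_operand(tokens[i - 1][0])
        if text in UNARY:
            # "a -b": the sign hugs its operand, so it is read as a prefix.
            begins = not space_after
        else:
            begins = starts_operand(text)
        if not expressions or (space_before and after_operand and begins):
            expressions.append(text)
        else:
            expressions[-1] += text
    return expressions

print(split_expressions("dosomething 2 -3 4*9-2"))
# ['dosomething', '2', '-3', '4*9-2']
```

Under the same rule, "a - b" and "a-b" each come out as one expression, "a -b" as two, and "foo (x+1)" splits into `foo` and `(x+1)` where "foo(x+1)" does not.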

I also used whitespace sensitivity, with some success, to implement multiple-operations-per-line in my assembler (x86 and x86-64, partial ARM support).


for example:
push ebp; mov ebp, esp; sub esp, 24
...
mov esp, ebp; pop ebp; ret

(depending on whitespace, ';' is either an opcode delimiter or indicates the start of a comment).
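
The exact whitespace rule is not stated above, so the following Python sketch simply assumes one: a ';' directly after a non-space character separates instructions, while a ';' at the start of the line or after whitespace begins a comment. The function name `split_asm_line` and this rule are illustrative guesses, not the assembler's actual behavior.

```python
def split_asm_line(line):
    """Return (instructions, comment) for one source line.

    Assumed rule: ';' glued to the preceding token is a separator;
    ';' preceded by whitespace (or at column 0) starts a comment.
    """
    instructions = []
    current = []

    def flush():
        # Store the instruction collected so far, skipping empty ones.
        text = "".join(current).strip()
        if text:
            instructions.append(text)
        current.clear()

    for i, ch in enumerate(line):
        if ch == ";":
            if i > 0 and not line[i - 1].isspace():
                flush()          # separator between instructions
                continue
            flush()              # comment: the rest of the line is ignored
            return instructions, line[i + 1:].strip()
        current.append(ch)

    flush()
    return instructions, None

print(split_asm_line("push ebp; mov ebp, esp; sub esp, 24"))
# (['push ebp', 'mov ebp, esp', 'sub esp, 24'], None)
```

With this rule, "mov esp, ebp; pop ebp; ret ; restore frame" yields three instructions plus the comment text "restore frame".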



or such...

Thanks for your input.

Best,

  -Martin


On Thu, Mar 15, 2012 at 6:54 PM, BGB<cr88...@gmail.com>  wrote:

On 3/15/2012 9:21 AM, Martin Baldan wrote:

I have a little off-topic question.
Why are there so few programming languages with true Polish syntax? I
mean, prefix notation, fixed arity, no parens (except, maybe, for
lists, sequences or similar). And of course, higher order functions.
The only example I can think of is REBOL, but it has other features I
don't like so much, or at least are not essential to the idea. Now
there are some open-source clones, such as Boron, and now Red, but
what about very different languages with the same concept?

I like pure Polish notation because it seems as conceptually elegant
as Lisp notation, but much closer to the way spoken language works.
Why is it that this simple idea is so often conflated with ugly or
superfluous features such as native support for infix notation, or a
complex type system?


because, maybe?...
harder to parse than Reverse-Polish;
less generic than S-Expressions;
less familiar than more common syntax styles;
...

for example:
RPN can be parsed very quickly/easily, and/or readily mapped to a stack,
giving its major merit. this gives it a use-case for things like textual
representations of bytecode formats and similar. languages along the lines
of PostScript or Forth can also make reasonable assembler substitutes, but
with higher portability. downside: typically hard to read.
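
To illustrate why RPN maps so directly onto a stack, here is a minimal Python sketch (the name `eval_rpn` and the four-operator table are illustrative assumptions). Each token is handled the moment it is read, with no lookahead and no tree.

```python
import operator

# Hypothetical operator table; all operators take two operands.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(source):
    """Evaluate a whitespace-separated RPN expression in a single pass."""
    stack = []
    for token in source.split():
        if token in OPS:
            # Operands are already on the stack; the right one is on top.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

print(eval_rpn("3 4 * 2 +"))  # 14.0
```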

S-Expressions, however, can represent a wide variety of structures. nearly
any tree-structured data can be expressed readily in S-Expressions, and all
they ask for in return is a few parentheses. among other things, this makes
them fairly good for compiler ASTs. downside: hard to match parens or type
correctly.

common syntax (such as C-style), while typically harder to parse, and
typically not all that flexible either, has all the usual stuff people
expect in a language: infix arithmetic, precedence levels, statements and
expressions, ... and the merit that it works fairly well for expressing most
common things people will care to try to express with them. some people
don't like semicolons and others don't like sensitivity to line-breaks or
indentation, and one generally needs commas to avoid ambiguity, but most
tend to agree that they would much rather be using this than either
S-Expressions or RPN.

(and nevermind some attempts to map programming languages to XML based
syntax designs...).

or, at least, this is how it seems to me.


ironically, IMO, it is much easier to type C-style syntax interactively
while avoiding typing errors than it is to type S-Expression syntax
interactively while avoiding typing errors (maybe experience, maybe not,
dunno). typically, the C-style syntax requires fewer total characters as
well.

I once designed a language syntax specially for the case of being typed
interactively (for terseness and taking advantage of the keyboard layout),
but it turned out to be fairly difficult to remember the syntax later.

some of my syntax designs have partly avoided the need for commas by making
the parser whitespace sensitive regarding expressions, for example "a -b"
will parse differently than "a-b" or "a - b". however, there are some common
formatting quirks which would lead to frequent misparses with such a style:
"foo (x+1);" will parse as 2 expressions, rather than as a function call.

a partial downside is that it can lead to visual ambiguity if code is read
using a variable-width font (as opposed to the "good and proper" route of
using fixed-width fonts for everything... yes, this world is filled with
evils like variable-width fonts and the inability to tell apart certain
characters, like the Il1 issue and similar...).

standard JavaScript also uses a similar trick for "implicit semicolon
insertion", with the drawback that one needs to use care when breaking
expressions otherwise the parser may do its magic in unintended ways.


the world likely goes as it does due to many such seemingly trivial
tradeoffs.

or such...


_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
