On 3/19/2012 5:24 AM, Martin Baldan wrote:
but, hmm... one could always have 2 stacks: create a stack over the stack,
in turn reversing the RPN into PN, and also get some "meta" going on...
Uh, I'm afraid one stack is one too many for me. But then again, I'm
not sure I get what you mean.

in traditional RPN (PostScript, Forth, ...), one directly executes commands from left to right.

in this case, one pushes commands left to right, and then pops and executes them in a loop. so, there is a stack for values, and a stack for "the future" (commands awaiting execution).

naturally enough, it flips the notation (since sub-expressions are executed first).

+ 2 * 3 4 =>  24
Wouldn't that be "+ 2 * 3 4 =>  14" in Polish notation? Typo?

or mental arithmetic fail, either way...
I vaguely remember writing this, and I think the mental arithmetic came out wrong.
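
fwiw, here is a minimal sketch of the two-stack scheme in C (the token handling and names are made up, just to illustrate the idea): commands are pushed left to right onto one stack, then popped and executed against a value stack, which ends up evaluating Polish notation (and gives 14 here).

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *toks[] = { "+", "2", "*", "3", "4" };
        const char *cmd[32];  int ncmd = 0;  /* "the future": commands awaiting execution */
        double val[32];       int nval = 0;  /* values */
        int i;

        /* push commands left to right */
        for (i = 0; i < 5; i++)
            cmd[ncmd++] = toks[i];

        /* pop/execute loop, runs until the command stack is empty */
        while (ncmd > 0)
        {
            const char *t = cmd[--ncmd];
            if (strchr("+-*/", t[0]) && !t[1])
            {
                /* popping flipped the order, so the operator's first
                   (leftmost) operand is on top of the value stack */
                double a = val[--nval], b = val[--nval], r = 0;
                switch (t[0]) {
                case '+': r = a + b; break;
                case '-': r = a - b; break;
                case '*': r = a * b; break;
                case '/': r = a / b; break;
                }
                val[nval++] = r;
            }
            else
                val[nval++] = atof(t);   /* operand: push its value */
        }

        printf("%g\n", val[0]);   /* prints 14 */
        return 0;
    }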


commands are pushed left to right; execution then consists of popping off and
executing said commands (the pop/execute loop continues until the stack is
empty), so the commands execute in the reverse of the order they were pushed.
Do you have to get busy with "rot", "swap", "drop", "over", etc.?
That's the problem I see with stack-based languages.

if things are designed well, these mostly go away.

mostly it is a matter of making every operation "do the right thing" and expect its arguments in a sensible order.

a problem, for example, in the design of PostScript, is that people tried to give the operations their "intuitive" ordering, but this leads to both added awkwardness and additional need for explicit stack operations.


say, for example, one can have a language with:
/<someName> <someValue> bind
or:
<someValue> /<someName> bind

though seemingly a trivial difference, one form is likely to need more swap/exch calls than the other.

likewise:
<array> <index> <value> setindex
vs:
<value> <array> <index> setindex
...


"dup" is a little harder, but generally I have found that dup tended to appear in places where a higher-level / compound operation was more sensible.

granted, such compound operations make up a large portion of my interpreter's bytecode ISA, but many of them help improve performance by "optimizing" common operations.

as an example, suppose one compiles an operation like:
j=i++;

one could emit, say:
load i; dup; push 1; binary add; store i; store j;

with all of the lookups and type-checks along the way.

also possible is the sequence:
lpostinc_fn 1; lstore_f 2;
(assume 1 and 2 are the lexical variable indices for i and j, both inferred to be fixnums).
or:
postinc_s i; store j;
(collapsed operation, but not knowing/giving exact types or locations).

now, what has happened? the first 5 operations collapse into a single operation, in the former case also specialized for a lexical variable and for fixnums (say, due to type-inference);
what is left is a simple store.
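
to make this concrete, a rough sketch of what the fused case could look like inside a switch-dispatched interpreter loop (the opcode names, encoding, and frame layout here are all made up for illustration, not my actual ISA):

    #include <stdio.h>

    enum {
        OP_LLOAD,       /* push frame[arg]                           */
        OP_DUP,         /* duplicate top of stack                    */
        OP_PUSH1,       /* push the constant 1                       */
        OP_ADD,         /* pop two, push sum                         */
        OP_LSTORE,      /* frame[arg] = pop                          */
        OP_LPOSTINC_F,  /* fused: push frame[arg], then frame[arg]++ */
        OP_HALT
    };

    typedef struct { int op, arg; } Insn;

    long run(Insn *pc, long *frame)
    {
        long stack[64]; int sp = 0;
        for (;;)
        {
            Insn in = *pc++;
            switch (in.op)
            {
            case OP_LLOAD:  stack[sp] = frame[in.arg]; sp++;  break;
            case OP_DUP:    stack[sp] = stack[sp-1];   sp++;  break;
            case OP_PUSH1:  stack[sp] = 1;             sp++;  break;
            case OP_ADD:    sp--; stack[sp-1] += stack[sp];   break;
            case OP_LSTORE: sp--; frame[in.arg] = stack[sp];  break;
            case OP_LPOSTINC_F:
                /* one dispatch instead of five, and no type-checks,
                   since type-inference already proved a fixnum here */
                stack[sp] = frame[in.arg]++; sp++;
                break;
            case OP_HALT:   return frame[2];
            }
        }
    }

    /* "j=i++;" both ways, with i in slot 1 and j in slot 2: */
    Insn naive[] = { {OP_LLOAD,1}, {OP_DUP,0}, {OP_PUSH1,0}, {OP_ADD,0},
                     {OP_LSTORE,1}, {OP_LSTORE,2}, {OP_HALT,0} };
    Insn fused[] = { {OP_LPOSTINC_F,1}, {OP_LSTORE,2}, {OP_HALT,0} };

    int main(void)
    {
        long frame[4] = { 0, 5, 0, 0 };
        printf("%ld\n", run(fused, frame));  /* prints 5; i (slot 1) is now 6 */
        return 0;
    }

the naive sequence costs six trips through the dispatch loop where the fused form costs two (not counting the halt), which is most of where the interpreter-level win comes from.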

as noted, most of this was due to interpreter-specific micro-optimizations, and a lot of it is ignored in the (newer/incomplete) JIT (which mostly decomposes the operations again, and uses more specialized variable-allocation and type-inference logic).

these sorts of optimizations are also to some extent language and use-case specific, but they do help somewhat with performance of a plain interpreter.


something similar could likely be applied to a stack language designed for use by humans, where lots of operations/words are dedicated to common constructions likely to be used by a user of the language.


_ I *hate* infix notation. It can only make sense where everything has
arity 3, like in RDF.

many people would probably disagree.
whether cultural or innate, infix notations seem to be fairly popular.
My beef with infix notation is that you get ambiguity, and then this
ambiguity is usually eliminated with arbitrary policies of operator
priority, and then you still have to use parens, even with fixed
arity. In contrast, with pure Polish notation, once you accept fixed
arity you get unambiguity for free and you get rid of parens for
everything (except, maybe, explicit lists).

For instance, in infix notation, when I see:

2 + 3 * 4

I have to remember that it means:

2 + (3*4)

But then I need the parens when I mean:

(2 + 3) * 4

In contrast, with Polish notation, the first case would be:

+ 2 * 3 4

And the second case would be:

* 4 + 2 3

Which is clearly much more elegant. No parens, no operator priority.

many people are not particularly concerned with elegance though, and tend to take for granted what the operator precedences are and where the parens go.

this goes even for the (arguably poorly organized) C precedence hierarchy:
many new languages don't change it because people expect it a certain way;
in my case, I don't change it, mostly for the sake of making at least some effort to conform to ECMA-262, which defines the hierarchy a certain way.

the advantage is that, assuming the precedences are sensible, the more commonly used operations have higher precedence, and so don't need explicit parentheses. on average, this tends to work out fairly well.

prefix also works, but has the drawback of being marginally more awkward for arithmetic, as well as generally requiring added whitespace:

"2+3*4" vs. "+ 2 * 3 4", where the latter contains 4 additional space characters.


actually, it can be noted that many of the world's languages are SVO (and
many others are SOV), so there could be a pattern here.
I've read a recent study which says the human brain seems to be wired
for SOV. The reason for this conclusion was that two groups of deaf
people had independently developed their own sign languages, and both
were SOV.

By the way, I'm playing with the idea of making a logical conlang with
a concise, highly-regular syntax. The most promising type of syntax
seems to be REBOL-like, that is, Polish notation. The funny thing is
that, at present, the way I'm trying to handle event description makes
it have the verb at the end, but it's not RPN. Here's why:

walk evt-1 == "evt-1 is a walking event"

subj John walk evt-1 == "evt-1 is a walking event with John as its subject"

ex-past walk == "there is a past walking event"

ex-past subj John walk == "there's a past walking event with John as
its subject"

My point is that the correspondence between SOV or SVO and the
underlying syntax may not be so straightforward. Also, the issue of
how to build a good model of spoken language is still open.

I once tried to imagine how a language could be structured so that it could be used both as a natural language and as a programming language (the design was partly Lisp-based). this fell apart.


I later considered the possibility of a "mechanically defined" English subset, but ran into significant problems with the semantic model. it seems it would likely be much easier to write up a parser for a simplified/regularized English grammar than to come up with a reasonable semantic model for how the language concepts are expressed (IIRC, I was trying to fit it onto a variant of Prototype-OO and lexical scoping or similar).

IIRC, the consideration was that it could have been used as a system of expression for game AIs, and to some extent for human/AI interactions (within a game world), but this idea never really went anywhere (basically, the goal was to allow interactions more advanced than merely attacking NPCs or engaging in fixed menus or dialogs, but more casual than the use of a scripting language).


I don't think I ever wrote the parser either, since this would have been fairly pointless without the semantic model.

IIRC, the plan had been to parse the statements into S-Expressions using a recursive-descent parser, hence why a simplified and more rigid grammar was defined. although ideally the statements would be readable/writable by "mere humans", it would not try to deal with the problem of free-form or ambiguous statements (any such "agents" would reject/ignore statements they don't understand, or maybe ask the user to rephrase).

it was also noted during this exercise that most discussion of "grammar" was mostly people nitpicking and defining seemingly arbitrary/pointless "rules of use" (based on pet peeves about "how the language should be written/spoken" and similar), rather than anything which would have been helpful in defining a formal grammar and parser for an English subset; so I think at the time I used "common sense rules" and wrote up something based on those.

I think it did place limitations on which combinations of "parts of speech" a given word was allowed to have (and, I think, made up a few new ones, mostly for words which didn't fit well). the parser would have been largely dictionary-driven (sort of like the declaration parsing in C and C++...).
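
roughly, it would have had a shape like this (a toy sketch; the lexicon, word classes, and output form here are all invented for illustration):

    #include <stdio.h>
    #include <string.h>

    /* toy dictionary-driven parser: each word's part of speech comes from
       a lexicon (much like how a C parser decides what a name means from
       its declaration), and a simple rule emits an S-Expression. */

    typedef struct { const char *word, *pos; } Entry;

    static Entry lexicon[] = {
        { "john", "noun" }, { "mary", "noun" }, { "ball", "noun" },
        { "sees", "verb" }, { "kicks", "verb" },
        { "the",  "det"  },
        { NULL, NULL }
    };

    static const char *pos_of(const char *w)
    {
        Entry *e;
        for (e = lexicon; e->word; e++)
            if (!strcmp(e->word, w)) return e->pos;
        return NULL;
    }

    static int is_pos(char *tok, const char *pos)
    {
        const char *q = tok ? pos_of(tok) : NULL;
        return q && !strcmp(q, pos);
    }

    /* noun-phrase := [det] noun */
    static int parse_np(char **toks, int *i, const char **out)
    {
        if (is_pos(toks[*i], "det")) (*i)++;
        if (!is_pos(toks[*i], "noun")) return 0;
        *out = toks[(*i)++];
        return 1;
    }

    int main(void)
    {
        char buf[] = "john kicks the ball";
        char *toks[16], *t; int n = 0, i = 0;
        const char *subj = NULL, *verb = NULL, *obj = NULL;

        for (t = strtok(buf, " "); t && n < 15; t = strtok(NULL, " "))
            toks[n++] = t;
        toks[n] = NULL;

        /* sentence := noun-phrase verb noun-phrase */
        if (parse_np(toks, &i, &subj) && is_pos(toks[i], "verb") &&
            (verb = toks[i++], parse_np(toks, &i, &obj)))
            printf("(%s %s %s)\n", verb, subj, obj);   /* (kicks john ball) */
        else
            printf("(parse-error)\n");   /* reject what isn't understood */
        return 0;
    }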

but, all this went nowhere...


a reasonable tradeoff IMO is using prefix notation for commands and infix
notation for arithmetic.
You can always use a library for infix instead of having it built into
the interpreter and making life more difficult for those who prefer
the more consistent prefix notation. I would say that's reasonable
enough. For instance:

http://folk.uio.no/jornv/infpre/infpre.html

it is possible.

personally, I tended not to think it was worth worrying about, since most languages for human use can use conventional syntax, and most of those with simpler or more regular syntax (such as S-Expressions) are mostly intended for internal use.


_ Matching parens is a non-issue. Just use Paredit or similar ;)

I am currently mostly using Notepad2, which does have parenthesis matching
via highlighting.

however, the issue isn't so much with just using an editor with parenthesis
matching, but more an issue when quickly typing something interactively: one
may have to make extra mental effort to get the counts of opening and
closing parentheses right, potentially distracting from "the task at hand".
Ah, but that's the whole point of Paredit. It *doesn't let you* have
unmatched parens. That's right, you just can't do it. You don't write
or delete parens, you create an empty sexpr, you delete it, you move
it around, you swallow the following sexpr into it, you barf it, you
fuse, splice, etc, always working with sexprs, never with parens.

fair enough.

something like REBOL could possibly work fairly well here, given it has some
structural similarity to shell-command syntax.
I would say REBOL is better, because it's just as terse if not more,
and it's more regular.

possibly.

something vaguely similar could make sense for a more advanced shell language.
as-is, my console/shell language is fairly naive, but it is generally sufficient for what I use it for (and it allows easily embedding script-language code for more advanced uses).


the main merit it has is that it can reduce the need for commas (and/or
semicolons), since the parser can use whitespace as a separator (and space
is an easier key to hit).
Okay, so it's because the space key is bigger. I see the point, but it
has more to do with keyboard layout than with visual or mental
considerations. I would happily exchange a little typing speed for
stronger visual cues, but other people may have other preferences.

visual cues are more important for reading, but are probably less relevant for interactive use, where one is more often typing commands for some other purpose, such as fiddling with or testing something, directly controlling something, calculating or showing something, ...


decided to leave out some stuff about integrating script code into game maps in my 3D engine, as I wrote it and then couldn't see how it related or was relevant.

or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
