Hello,
It should be possible to parse a string containing Factor code and be
reasonably sure that malicious things will not happen.
This is currently not the case. Words can be defined at parse
time. Specifically, parsing words can be defined, and then later
called in the same string.
I wanted to understand how parsing works so I started from 'eval' and
noticed this bit of code:
string-lines parse-fresh
Then:
: parse-fresh ( lines -- quot ) [ parse-lines ] with-file-vocabs ;
The documentation says:
Parses Factor source code in a sequence of lines. The initial
vocabulary search path is used (see with-file-vocabs).
Getting warmer... Here's 'with-file-vocabs'
: with-file-vocabs ( quot -- )
    [
        f in set
        { "syntax" } set-use
        bootstrap-syntax get [ use get push ] when* call
    ] with-scope ; inline
OK, so a search path containing only the 'syntax' vocabulary is
hardcoded there.
A lot of stuff comes in the 'syntax' vocabulary. You get words
for literal syntax of data (e.g. '{' and 'V{'),
words which make new words (e.g. ':' and 'GENERIC:'),
and words which control which vocabularies are available
(e.g. 'USE:' and 'IN:').
Go ahead and browse the list of words yourself:
"syntax" vocab words>> keys .
So how can one ensure that "malicious things will not happen"? You must
get a handle on that third category of words, i.e. you must be able to
control which vocabularies are available while parsing.
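To make the problem concrete, here is a rough sketch of the kind of
string I mean. It assumes the behaviour described above (a parsing word
defined in a string can be used later in that same string) and that
'print' lives in the 'io' vocabulary; the word name 'surprise' is made
up for illustration.

```factor
! Sketch only: 'surprise' is a hypothetical word.
! Nothing is ever called here; the quotation is dropped.
! If parsing words work as described, the message still prints,
! because 'surprise' runs while the string is being parsed.
"IN: scratchpad USE: io : surprise ( accum -- accum ) \"ran at parse time\" print ; parsing surprise"
string-lines parse-fresh drop
```

Swap 'print' for something that touches files or the network and you
can see why merely parsing untrusted input is risky.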
For my specific purposes, I simply wanted to parse strings containing
Factor data structures like arrays, vectors, tuples, etc. This should
definitely be doable in a safe manner.
First we need a word which parses lines using only a specific list of
vocabularies.
: parse-with-vocabs ( vocabs lines -- quot )
    [
        f in set
        swap set-use
        lexer-factory get call (parse-lines)
    ] with-scope ;
In my case I want to specify a vocabulary which only contains syntax
for literal data. Guess what?
We don't have one.
All we have is the huge catchall vocabulary 'syntax'.
So I'm seriously proposing that we take a more principled approach to
syntax words. Specifically, I think there should be a vocabulary named
'literal-syntax' which contains parsing words used for construction of
Factor data.
Given such a vocabulary, I can define this word:
: literal-parse ( lines -- quot )
    { "literal-syntax" } swap parse-with-vocabs ;
And I can now safely parse strings with Factor literal data...
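As a rough usage sketch: this assumes a 'literal-syntax' vocabulary
along the lines proposed above exists and contains '{' and 'V{'. The
helper name 'literal-eval' is made up for illustration; it is simply
'eval' with 'parse-fresh' swapped for 'literal-parse'.

```factor
! Hypothetical helper, built the same way 'eval' is built on 'parse-fresh'.
: literal-eval ( string -- ) string-lines literal-parse call ;

! Only literal syntax is used, so this should leave an array and a
! vector on the stack.
"{ 1 2 3 } V{ 4 5 6 }" literal-eval

! ':' and 'USE:' are not in the search path, so this should stop with
! a parse error instead of defining anything.
"USE: io : evil ( -- ) ;" literal-eval
```

The point of the design is that nothing needs to be filtered out of the
input. Words which define words or change the search path are simply
not visible to the parser.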
I think a similar approach can be taken for making "safe"
listeners. I.e. it's reasonable for one to expect the ability to start
a "sealed listener" whereby only certain vocabularies are
available.
As usual, this being Factor, I can pretty much roll my own solution.
A minimal 'literal-syntax': http://paste.factorcode.org/paste?id=152
'literal-parse': http://paste.factorcode.org/paste?id=153
Ed
_______________________________________________
Factor-talk mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/factor-talk