On 10-11-15 12:12 PM, Ian Bicking wrote:
> Hi all. I've been interested in Rust, and thought I'd try making an XML
> parser -- seems like a fairly simple task, and is the sort of thing Rust
> should do.
Good luck! It'll be a challenge given its current state.
> So if you'll indulge me...
Of course. I'm sorry the answers are unlikely to be terribly fun.
> 1. Is the best way to handle expected errors (like a parse error) to use a
> tag return type? I'm thinking like:
>
>   type xml = rec(...);
>   type xml_error = rec(str message, int position);
>   tag xmlerr {
>     xml;
>     xml_error;
>   }
>   fn parse_xml(str input) -> xmlerr {
>   }
This will certainly work, as will any similar structural type. For example
"tup(option[xml], option[xml_error])".
It depends in part on how you want your error checking to proceed. Do
you want to continue to process things post-error? Collect more than one
error? Return partial results? Try different strategies on subsystems?
We argue for a crash-only design for *unexpected* and/or unrecoverable
errors ("exceptions") but something like structured / disjoint-sum
returns (as you are proposing) for expected or recoverable conditions.
It's a matter of taste and domain-modeling to decide which you are
dealing with in any given case.
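For readers on a current toolchain, here is a minimal sketch of the disjoint-sum return described above. It is written in present-day Rust syntax, not the 2010 dialect quoted in this thread, and every name in it (Xml, XmlError, XmlResult, the toy parse_xml) is a hypothetical stand-in rather than anything from the original post.

```rust
// Hypothetical types for illustration only (present-day Rust syntax).
#[derive(Debug, PartialEq)]
struct Xml {
    name: String,
}

#[derive(Debug, PartialEq)]
struct XmlError {
    message: String,
    position: usize,
}

// The disjoint sum: a parse yields exactly one of these two cases.
#[derive(Debug, PartialEq)]
enum XmlResult {
    Parsed(Xml),
    Failed(XmlError),
}

// Toy parser (an assumption, not a real XML grammar): accepts only "<name/>".
fn parse_xml(input: &str) -> XmlResult {
    let inner = input
        .strip_prefix('<')
        .and_then(|rest| rest.strip_suffix("/>"));
    match inner {
        Some(name) if !name.is_empty() => XmlResult::Parsed(Xml {
            name: name.to_string(),
        }),
        _ => XmlResult::Failed(XmlError {
            message: "expected <name/>".to_string(),
            position: 0,
        }),
    }
}
```

A caller matches on the result and handles the expected failure as ordinary data; an unexpected condition would instead fail the task outright.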
> I'm confused about something with tags, but I can't quite figure out
> what... looking through uses of tag in docs and source I can't figure out
> what it should be.
Feel free to ask interactively on IRC, we're around during most workdays.
> 2. Is there a way to return a record without declaring it?
Yes. 'rec' is both a type-constructor and a value-constructor. You can
just say: "ret rec(message="...", pos=0);".
> 3. Is the difference between `for` and `for each` just iterating over a
> vector/string or an iterator?
'for' runs a single bounds-check at loop-entry, then a compiler-emitted
pointer-bumping loop over the vec-or-str. 'for each' calls an iterator
repeatedly. They have sufficiently different semantics that I figured
they should look different. There are also (or perhaps only "were",
historically) ambiguities about iteration on a call-expression: you'd
need to determine the type of the iteratee (iter or fn) before you could
decide whether the loop intends to iterate-by-calling or iterate over
the return-value from a call.
Possibly this distinction in loop-forms is a mistaken design choice; I'd
be willing to revisit it. We could even recycle pure 'for' loops as the
conventional C-style "for (init; cond; step)" form.
> 4. I see references to _vec.len[T](), which seems... complex. So would I
> really do _vec.len[xml](children) to get a length? What about string
> length? I'm only finding references to the byte length of strings, not the
> character length.
There's nothing that measures the character length yet. There's very
little Unicode functionality in the libraries.
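To make the byte-length versus character-length distinction concrete, here is a small sketch in present-day Rust (not the 2010 libraries discussed here, which had no such facility). The helper names byte_len and char_count are hypothetical; str::len and chars().count() are the standard-library calls they wrap.

```rust
// Number of bytes in the UTF-8 encoding of the string.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Number of Unicode scalar values ("characters" in the loose sense).
fn char_count(s: &str) -> usize {
    s.chars().count()
}
```

For "h\u{e9}llo" the two differ: U+00E9 takes two bytes in UTF-8, so the byte length is 6 while the character count is 5.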
And yes, at the moment the only way to determine the length of a vector
or string is to call the associated len function. The need to provide
the type parameter is temporary and should go away as type inference
improves. It's also possible that we may work out a way of providing
sugar for operators on primitive types such that "v.len" would work, but
there is nothing of the sort proposed yet, and I'd want it to avoid
perturbing the semantics much.
Practically speaking, it may make sense to wire the compiler to equip
the primitive types str and vec to permit indexing by a few utility
fields like 'len'. I'm just concerned that this will grow into a general
demand for utility *methods*, object-like, at which point the compiler
is doing work the libraries should do. So I'd prefer figuring out a
mapping between the desired syntax and "a call to the libraries".
> 5. There's lots of cases in parsing where an error or a success can be
> returned by a routine; I almost always just want to pass the error up when I
> encounter it, but the only way I can see to do that is to do a complete `alt
> type` condition on success or error. For instance:
>
>   let attrs = parse_attrs(input, pos);
>   alt type (attrs) {
>     case (xml_error err) {
>       ret err;
>     }
>   }
>   ... now I know it wasn't an error and can continue...?
>
> Anyway, just wondering if there's a quicker way.
Depends how you are dealing with the error. If you want to
unpack-and-repack it (i.e. have an attribute error that is different
from a general xml error) then you need to extract-and-repack, yes.
If you use a record-of-an-option or such, you can do:
"if (attrs.err != None) { ret attrs; }"
Which is shorter. Or you can wrap the check in a helper function. In
general it seems like you're asking "how shall I best simulate catchable
exceptions", which is not something we support.
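The record-of-an-option check above can be sketched as follows. This uses present-day Rust syntax rather than the 2010 dialect, and the Parsed record, its fields, and the toy attribute rule (every token must contain '=') are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
struct XmlError {
    message: String,
    position: usize,
}

// Record-of-an-option (hypothetical layout): `err` is set only on failure.
#[derive(Debug, Clone, PartialEq)]
struct Parsed {
    attrs: Vec<String>,
    pos: usize,
    err: Option<XmlError>,
}

// Toy attribute parser: whitespace-separated tokens, each must contain '='.
fn parse_attrs(input: &str, pos: usize) -> Parsed {
    let fail = |message: &str| Parsed {
        attrs: Vec::new(),
        pos,
        err: Some(XmlError {
            message: message.to_string(),
            position: pos,
        }),
    };
    let Some(rest) = input.get(pos..) else {
        return fail("position out of range");
    };
    let mut attrs = Vec::new();
    for token in rest.split_whitespace() {
        if !token.contains('=') {
            return fail("attribute without '='");
        }
        attrs.push(token.to_string());
    }
    Parsed { attrs, pos: input.len(), err: None }
}

fn parse_element(input: &str, pos: usize) -> Parsed {
    let mut parsed = parse_attrs(input, pos);
    // The one-line check: pass the error up unchanged.
    if parsed.err.is_some() {
        return parsed;
    }
    // From here on the result is known to be a success.
    parsed.attrs.sort();
    parsed
}
```

The early return keeps each call site to a single line, at the cost of every caller sharing the same record type.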
An idiom you might consider is passing a "result-reporting" channel
downward through your parser, running your parser in a
sys.rustrt.unsupervise()'d sub-task, and failing the task after any error
is transmitted out. I am hesitant to suggest that *now* because I think a
good quantity of the machinery to implement it is disabled, unstable or
otherwise not functional; you'll have a lot of stubbed toes and paper
cuts if you try.
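The shape of that channel idiom can be approximated today with standard-library threads and channels. This is only a sketch in present-day Rust: std::thread and std::sync::mpsc stand in for the tasks and channels of the 2010 runtime, sys.rustrt.unsupervise() has no direct counterpart here, and the function names and the toy rule (a bare '&' is an error) are hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

#[derive(Debug, PartialEq)]
struct XmlError {
    message: String,
    position: usize,
}

// Runs inside the worker: report the error on the channel, then fail.
fn parse_in_task(input: &str, errors: &Sender<XmlError>) -> usize {
    for (position, ch) in input.char_indices() {
        if ch == '&' {
            errors
                .send(XmlError {
                    message: "unescaped '&'".to_string(),
                    position,
                })
                .expect("receiver dropped");
            panic!("parse failed");
        }
    }
    input.len()
}

// The parent collects either the worker's result or the reported error.
fn run_parser(input: String) -> Result<usize, XmlError> {
    let (tx, rx) = channel();
    let worker = thread::spawn(move || parse_in_task(&input, &tx));
    match worker.join() {
        Ok(consumed) => Ok(consumed),
        // The worker failed; whatever it sent beforehand is still queued.
        Err(_) => Err(rx.recv().unwrap_or_else(|_| XmlError {
            message: "task failed without reporting".to_string(),
            position: 0,
        })),
    }
}
```

The parser body never threads error values back through its own return types; the failure unwinds the worker, and the parent reads the reason off the channel.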
> 6. Is _str.eq() really the right way to do string equality?
Not "the right way", no. But the current way. It's a temporary
workaround for the unfinished structural equality glue in the bootstrap
compiler. Eventually ==, !=, <, etc. will all work. At present they only
work on scalars, fixed-size structures and tags.
Unfortunately (or, depending on your perspective, fortunately) a great
many foibles in the use of the language at present are short-term
limitations due to the limited availability of time and labour.
We're focusing on bringing up the self-hosted compiler just now, so many
library and language-design issues are on the back burner.
-Graydon