On 10-11-15 12:12 PM, Ian Bicking wrote:
Hi all.  I've been interested in Rust, and thought I'd try making an XML
parser -- seems like a fairly simple task, and is the sort of thing Rust
should do.

Good luck! It'll be a challenge given Rust's current state.

So if you'll indulge me...

Of course. I'm sorry the answers are unlikely to be terribly fun.

1. Is the best way to handle expected errors (like a parse error) to use a
tag return type?  I'm thinking like:

   type xml = rec(...);
   type xml_error = rec(str message, int position);
   tag xmlerr {
     xml;
     xml_error;
   }

   fn parse_xml(str input) ->  xmlerr {
   }

This will certainly work. So will any similar structural type, for example "tup(option[xml], option[xml_error])".

It depends in part on how you want your error checking to proceed. Do you want to continue to process things post-error? Collect more than one error? Return partial results? Try different strategies on subsystems?

We argue for a crash-only design for *unexpected* and/or unrecoverable errors ("exceptions") but something like structured / disjoint-sum returns (as you are proposing) for expected or recoverable conditions. It's a matter of taste and domain-modeling to decide which you are dealing with in any given case.
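As a rough illustration of the disjoint-sum return described above, here is a minimal sketch in present-day Rust syntax rather than the syntax used in this thread. The type names Xml, XmlError and XmlErr mirror the ones in the question. The toy parsing rule and the variant names Parsed and Failed are assumptions made up for the example.

```rust
// Hypothetical types for illustration. The error fields follow the
// record sketched in the question; everything else is assumed.
#[derive(Debug, PartialEq)]
struct Xml {
    name: String,
}

#[derive(Debug, PartialEq)]
struct XmlError {
    message: String,
    position: usize,
}

// A disjoint sum: a value is exactly one of the two cases.
#[derive(Debug, PartialEq)]
enum XmlErr {
    Parsed(Xml),
    Failed(XmlError),
}

// Toy "parser": accepts only input of the form "<name/>".
fn parse_xml(input: &str) -> XmlErr {
    match input.strip_prefix('<').and_then(|rest| rest.strip_suffix("/>")) {
        Some(name) if !name.is_empty() => XmlErr::Parsed(Xml {
            name: name.to_string(),
        }),
        _ => XmlErr::Failed(XmlError {
            message: "expected <name/>".to_string(),
            position: 0,
        }),
    }
}
```

A caller would match on the returned value and handle each case explicitly, which is what makes this shape suit expected, recoverable conditions.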

?  I'm confused about something with tags, but I can't quite figure out
what... looking through uses of tag in docs and source I can't figure out
what it should be.

Feel free to ask interactively on IRC, we're around during most workdays.

2. Is there a way to return a record without declaring it?

Yes. 'rec' is both a type-constructor and a value-constructor. You can just say: "ret rec(message="...", pos=0);".

3. Is the difference between `for` and `for each` just iterating over a
vector/string or an iterator?

'for' runs a single bounds-check at loop-entry, then a compiler-emitted pointer-bumping loop over the vec-or-str. 'for each' calls an iterator repeatedly. They have sufficiently different semantics that I figured they should look different. There are also (or perhaps only "were", historically) ambiguities about iteration on a call-expression: you'd need to determine the type of the iteratee (iter or fn) before you could decide whether the loop intends to iterate-by-calling or iterate over the return-value from a call.

Possibly this distinction in loop-forms is a mistaken design choice; I'd be willing to revisit it. We could even recycle pure 'for' loops as the conventional C-style "for (init; cond; step)" form.

4. I see references to _vec.len[T](), which seems... complex.  So would I
really do _vec.len[xml](children) to get a length?  What about string
length?  I'm only finding references to the byte length of strings, not the
character length.

There's nothing that measures the character length yet. There's very little Unicode functionality in the libraries.

And yes, at the moment the only way to determine the length of a vector or string is to call the associated len function. The need to provide the type parameter is temporary and should go away as type inference improves. It's also possible that we may work out a way of providing sugar for operators on primitive types such that "v.len" would work, but there is nothing of the sort proposed yet, and I'd want it to avoid perturbing the semantics much.

Practically speaking, it may make sense to wire the compiler to equip the primitive types str and vec to permit indexing by a few utility fields like 'len'. I'm just concerned that this will grow into a general demand for utility *methods*, object-like, at which point the compiler is doing work the libraries should do. So I'd prefer figuring out a mapping between the desired syntax and "a call to the libraries".

5. There's lots of cases in parsing where an error or a success can be
returned by a routine; I almost always just want to pass the error up when I
encounter it, but the only way I can see to do that is to do a complete `alt
type` condition on success or error.  For instance:

   let attrs = parse_attrs(input, pos);
   alt type (attrs) {
     case (xml_error err) {
       ret err;
     }
   }
   ... now I know it wasn't an error and can continue...?

Anyway, just wondering if there's a quicker way.

Depends how you are dealing with the error. If you want to unpack-and-repack it (i.e. have an attribute error that is different from a general xml error) then you need to extract-and-repack, yes.

If you use a record-of-an-option or such, you can do:

"if (attrs.err != None) { ret attrs; }"

Which is shorter. Or you can wrap the check in a helper function. In general it seems like you're asking "how shall I best simulate catchable exceptions", which is not something we support.
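The record-of-options check above might look roughly like this in present-day Rust syntax, which differs from the syntax used in this thread. Attrs, parse_attrs and parse_element are hypothetical names, and the "key=value" rule is only a stand-in for real attribute parsing.

```rust
// Hypothetical "record of options": at most one of the two fields is
// expected to be filled in.
#[derive(Debug, PartialEq)]
struct Attrs {
    value: Option<Vec<(String, String)>>,
    err: Option<String>,
}

// Toy rule: a single "key=value" pair, or an error.
fn parse_attrs(input: &str) -> Attrs {
    match input.split_once('=') {
        Some((k, v)) => Attrs {
            value: Some(vec![(k.to_string(), v.to_string())]),
            err: None,
        },
        None => Attrs {
            value: None,
            err: Some(format!("no '=' in {:?}", input)),
        },
    }
}

fn parse_element(input: &str) -> Attrs {
    let attrs = parse_attrs(input);
    // The one-line check: pass the error straight up to the caller.
    if attrs.err.is_some() {
        return attrs;
    }
    // From here on the error case has been ruled out; a real parser
    // would carry on with the rest of the element.
    attrs
}
```

The early return only works without repacking because both functions share the same record type. If the outer function returned a different type, the error would have to be extracted and repacked, as noted above.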

An idiom you might consider is passing a "result-reporting" channel downward through your parser, running your parser in a sys.rustrt.unsupervise()'d sub-task, and failing the task after any error is transmitted out. I am hesitant to suggest that *now* because I think a good quantity of the machinery to implement it is disabled, unstable or otherwise not functional; you'll have a lot of stubbed toes and paper cuts if you try.
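To make the shape of that idiom concrete, here is a sketch using threads and channels from the present-day Rust standard library. It is not the task machinery discussed in this thread: std::thread stands in for the sub-task, a panic stands in for failing the task, and parse_in_task and run_parser are hypothetical names with a deliberately trivial "parser".

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical toy parser: if the input ends with a '<' that was never
// closed by '>', report it on the channel and then fail the task.
fn parse_in_task(input: String, report: mpsc::Sender<String>) {
    let mut open_at = None;
    for (i, c) in input.char_indices() {
        match c {
            '<' => open_at = Some(i),
            '>' => open_at = None,
            _ => {}
        }
    }
    if let Some(i) = open_at {
        report.send(format!("unclosed '<' at byte {}", i)).unwrap();
        panic!("parse failed");
    }
}

// Runs the parser in its own thread and returns whatever it reported.
fn run_parser(input: &str) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let owned = input.to_string();
    let handle = thread::spawn(move || parse_in_task(owned, tx));
    // The child's failure does not take the parent down; join() simply
    // returns Err if the child panicked, and we ignore that here.
    let _ = handle.join();
    // Drain the messages that were sent before the child finished.
    rx.try_iter().collect()
}
```

The parent learns about the error only through the channel, so the parser itself never has to thread an error value back up through every return.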

6. Is _str.eq() really the right way to do string equality?

Not "the right way", no. But the current way. It's a temporary workaround for the unfinished structural equality glue in the bootstrap compiler. Eventually ==, !=, <, etc. will all work. At present they only work on scalars, fixed-size structures and tags.

Unfortunately (or, depending on your perspective, fortunately) a great many foibles in the use of the language at present are short-term limitations due to the limited availability of time and labour.

We're focusing on bringing up the self-hosted compiler just now, so many library and language-design issues are on the back burner.

-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
