Re: [rust-dev] Cryptol, the language of cryptography
There is nothing hard about it, assuming you are using a decent language. Just add a Crypto type that wraps integers and booleans and doesn't allow any non-constant-time operations or implicit conversion to anything that is not Crypto (which of course means you can't index memory or branch based on it). If the optimizer can screw things up, implement the Crypto operations in assembly. ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
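A minimal sketch of such a wrapper type in modern Rust (the name `Secret` and all methods are illustrative, not an existing API):

```rust
// Hypothetical sketch of the suggested Crypto wrapper type.
#[derive(Clone, Copy)]
pub struct Secret(u64);

impl Secret {
    pub fn new(v: u64) -> Secret { Secret(v) }

    // Wrapping arithmetic compiles to plain ALU instructions,
    // which run in constant time on mainstream CPUs.
    pub fn wrapping_add(self, other: Secret) -> Secret {
        Secret(self.0.wrapping_add(other.0))
    }

    pub fn xor(self, other: Secret) -> Secret {
        Secret(self.0 ^ other.0)
    }

    // Constant-time equality: no early exit, no data-dependent branch.
    pub fn ct_eq(self, other: Secret) -> bool {
        let diff = self.0 ^ other.0;
        // (diff | -diff) has its top bit set iff diff != 0.
        ((diff | diff.wrapping_neg()) >> 63) == 0
    }
}

// Deliberately NOT implemented: PartialEq/PartialOrd (branchy comparisons),
// Index (no memory indexing by secret values), conversions back to u64.
```

Whether the compiler preserves the constant-time property of a sketch like this is exactly the "optimizer can screw things up" caveat from the post.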
Re: [rust-dev] Removing ~"foo"
> Something like this will work, yes. It'll probably look more like: > > Box::new(*x) > > This will be described in some of the RFCs that are coming up soon. Awesome! We should really get rid of the ~T syntax in favor of Foo<T> (where Foo = Box, Own, Heap, etc.), since it is deceptively simple for something that should in fact be rarely used.
Re: [rust-dev] Optimizing pattern bindings to not copy anything at runtime
Nice, but why isn't the LLVM optimizer removing the move? Is it a lack of proper alias analysis? That sounds like a separate issue worth pursuing. > The remaining, more difficult, issue is initialization of > aggregate data structures via constructor functions, which still > involves a bunch of moves, but I don't really see any way short of > macros to optimize that at the moment. Wouldn't #[inline(always)] on the aggregate constructor fix that? (It's not great, but it's strictly better than a macro.) At any rate, shouldn't LLVM's inlining heuristics catch that automatically as well? If not, that sounds like another separate issue.
Re: [rust-dev] Reminder: ~[T] is not going away
> At the moment, Rust is completely broken in this regard. The following > expression evaluates to None: > Some(~()) Ouch, this is a disaster. Is there a bug filed for this? Anyway, I don't get how your argument about size-to-free has anything to do with fixing it (although I agree that size-to-free is awesome). If you don't care about equality (i.e. the fact that &*~() != &*~(), but a == a where a = &*~()), just return the address of a single private static 1-byte item for any 0-sized allocation. If you DO care about equality, then you will need at least an integer allocation scheme in all cases on 32-bit platforms, and the real costs are the data structures to track it (at least a bit in a bitmask, probably at least 2 bits for an efficient implementation). If you can't use the 1-2GB of kernel address space, then you'll also need to allocate one byte of actual usable address space (but not committed memory). On 64-bit platforms, you generally have at least around 2^60-2^63 bytes of unusable address space, so you can just increment a pointer into it for each allocation, at zero cost. Of course, the quick and simple fix is to try calling malloc(0), and if it returns NULL, remember that and switch to using malloc(1) instead.
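The "don't care about equality" option can be sketched as a tiny allocator shim in modern Rust (illustrative only; real allocators and the standard library handle zero-sized allocations internally):

```rust
use std::alloc::{alloc, dealloc, Layout};

// Every zero-sized allocation shares the address of one private static
// byte: cheap, but all such "allocations" compare equal to each other.
static ZERO_SIZED_SENTINEL: u8 = 0;

unsafe fn my_alloc(size: usize) -> *mut u8 {
    if size == 0 {
        &ZERO_SIZED_SENTINEL as *const u8 as *mut u8
    } else {
        alloc(Layout::from_size_align(size, 1).unwrap())
    }
}

unsafe fn my_free(ptr: *mut u8, size: usize) {
    // Zero-sized "allocations" were never really allocated, so freeing
    // them is a no-op.
    if size != 0 {
        dealloc(ptr, Layout::from_size_align(size, 1).unwrap());
    }
}
```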
Re: [rust-dev] matching on a few bits in int
I think the best solution is to add uN and sN types, where N is not a power of two, which LLVM should already support. Then you can write your match like this: match (val >> 6) as u2 { ... } and it will work as desired. The biggest issue is that to make this work nicely you'd need some way to generalize over the bit length, and that's going to require generics with integer parameters and the work to add those.
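The proposed uN types never landed; a mask achieves the same exhaustive match in today's Rust (a sketch, with hypothetical labels):

```rust
// Match on the top two bits of a byte. The explicit mask makes the
// range obvious to the reader, though the compiler still can't use it
// to prove exhaustiveness, hence the `_` arm standing in for 0b11.
fn classify(val: u8) -> &'static str {
    match (val >> 6) & 0b11 {
        0b00 => "zero",
        0b01 => "one",
        0b10 => "two",
        _ => "three",
    }
}
```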
Re: [rust-dev] "Virtual fn" is a bad idea
> However, the extensibility of trait objects comes at the cost of fat > pointers, which can be a problem if you have a lot of pointers. This is fixable without introducing virtual functions, by adding a way to express "Struct plus vtable for impl Trait for Struct" and "thin pointer to Struct plus vtable for impl Trait for Struct" (or by using an indirect pointer, i.e. ~~Trait or Rc<~Trait>). > It also implies that downcasting isn't really possible, at least not > cheaply, because there is no notion of the "actual type" of a > particular object. I don't understand this. You can downcast trait pointers to struct pointers just fine, since you can put whatever information is needed in the trait impl vtable (such as a typeid). Sure, you can't "downcast" a struct to a struct, but that's the whole point of structs. The idea of a struct is that one is referring to something of EXACTLY that type, so downcasting makes no sense; when one wants a "variable" type, one uses a trait, and then one can (attempt to) downcast to concrete structs. One can of course introduce "virtual struct"s which are no longer exactly of their own type but are now virtual, but that just overlaps the role of traits. > (As an aside, I am personally happier to see virtual structs than to see > traits extending structs or traits extending a `HasPrefix` trait, as was > included in previous designs. Both of those approaches mean that the trait > is no longer "pure interface", and if you write "generic" code against that > trait, you're actually coding at least partially against a fixed > implementation, not an interface.) I think traits inheriting structs is the better design, because you can have multiple traits inheriting the same struct. For example, say you are modelling GUI widgets that need both input handling and drawing support. With traits inheriting structs, you can have several traits, one for input handling, one for drawing in memory, one for drawing with OpenGL, etc., while with virtual functions (and without multiple inheritance) you have to put everything together, and you have to either implement all functionality or add "not supported" error codes. In other words, a trait inheriting a struct neatly separates the trait part, where multiple inheritance is natural, from the struct part, where single inheritance is natural. Also, a trait inheriting a struct behaves, to users of the trait, exactly like a trait with a get_struct() accessor method, but with better performance. It's only different for implementors, where it mandates how get_struct() is implemented for the sake of performance, at the cost of making it impossible to implement the trait along with traits inheriting from incompatible structs. In general I think it's better to have multiple options at a low level like this, rather than multiple options at a high semantic level like virtual struct vs. trait, since the low-level choice is exposed less to other modules.
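The "trait with a get_struct() accessor" equivalent described above can be written in today's Rust; a sketch of the widget example (all names hypothetical):

```rust
// The shared struct part, where single inheritance is natural.
struct WidgetBase { x: i32, y: i32 }

// Multiple traits can share the same embedded struct: the part of the
// argument where multiple inheritance is natural.
trait InputHandling {
    fn base(&self) -> &WidgetBase; // the accessor the post describes
    fn on_click(&mut self);
}

trait Drawing {
    fn base(&self) -> &WidgetBase;
    fn draw(&self) -> String;
}

struct Button { base: WidgetBase, label: String }

impl InputHandling for Button {
    fn base(&self) -> &WidgetBase { &self.base }
    fn on_click(&mut self) { self.label.push('!'); }
}

impl Drawing for Button {
    fn base(&self) -> &WidgetBase { &self.base }
    fn draw(&self) -> String {
        format!("button '{}' at ({}, {})", self.label, self.base.x, self.base.y)
    }
}
```

The proposal in the post is essentially to have the compiler guarantee that `base()` is a field access, so that it costs nothing through a trait object.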
[rust-dev] "Virtual fn" is a bad idea
I see a proposal to add "virtual struct" and "virtual fn" in the workweek meeting notes, which appears to add an exact copy of Java's OO system to Rust. I think this should be carefully considered, and preferably not added at all (or, failing that, feature gated and discouraged). The core problem with "virtual functions" (shared by Java's classes, etc.) is that rather than exposing a single public API, they expose two: the API formed by public functions, and the API formed by virtual functions to be overridden by subclasses, and the second API is exposed in an inflexible and unclean way. A much better way of allowing users to override part of a struct's behavior is to define a trait with the overridable functionality and allow passing an implementation of the trait to the base class, while also providing a default implementation if desired. Another way is to have the "subclass" implement all the traits that the "base class" implements, include a field of the "base class" type, and then direct all non-overridden functionality to the "base class" (here syntax sugar can easily be added to eliminate the boilerplate, by automatically implementing all non-implemented trait functions as calls to the same function on the base class field). These approaches can be combined: the first approach allows changing the "inside" behavior of the base class, while the second allows putting extra behavior "around" the base class code. The fact that OO using virtual functions (as opposed to traits) is a bad design is one of the crucial steps forward in the design of languages like Go and current Rust compared to earlier OO languages, and Rust should not go backwards on this. Here is a list of issues with virtual functions:
1. Incentive for bad documentation. Usually there is no documentation for how virtual functions are supposed to be overridden, and it is awkward to add it, since it needs to be mixed with the documentation on how to use the struct.
2. Mishmashing multiple unrelated APIs. With traits, you can pass in multiple objects to implement separate sets of overridable functionality; with virtual structs you need to mishmash all those interfaces into a single set of virtual functions, all sharing data even when that is not appropriate.
3. No encapsulation. Private data for virtual function implementations is accessible to all other functions in the struct. This means, for instance, that if you have a virtual function called compute_foo() that is implemented by default by reading a "foo" field in the base class, then all other parts of the base class can access "foo" too. If anything else mistakenly accesses "foo" directly, which it can, then overriding compute_foo() will not work as expected. If compute_foo() were provided by an external trait implementation, then "foo" would be private and inaccessible, eliminating the problem.
4. Data for overridden implementations is left in a "zombie" state. In the above example, if you override compute_foo(), the foo field in the base class will no longer be used, yet it will still be present in the type and allocated in memory.
5. Inability to dispatch statically. With a trait implementation, you can pass the concrete type as a generic parameter, allowing static dispatch. If you instead call an overridable virtual function, you can't dispatch it statically at all (unless you add cumbersome syntax for that).
6. It adds a ton of unnecessary complexity to the language.
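The first alternative described above (a trait with the overridable functionality, passed into the base type) can be sketched in today's Rust; the names are illustrative:

```rust
// The overridable part, expressed as a trait instead of a virtual fn.
trait FooPolicy {
    fn compute_foo(&self) -> i32;
}

// Default behavior, analogous to the base class's default implementation.
// Its `foo` field is private to this type: nothing else can touch it,
// which is the encapsulation point (issue 3) made concrete.
struct DefaultFoo { foo: i32 }
impl FooPolicy for DefaultFoo {
    fn compute_foo(&self) -> i32 { self.foo }
}

// The "base class", generic over the policy, so dispatch is static
// (issue 5): no vtable call unless you explicitly use a trait object.
struct Base<P: FooPolicy> { policy: P }

impl<P: FooPolicy> Base<P> {
    fn run(&self) -> i32 {
        self.policy.compute_foo() * 2
    }
}

// An "override" is just another implementation, and it carries no
// zombie `foo` field (issue 4).
struct Doubler;
impl FooPolicy for Doubler {
    fn compute_foo(&self) -> i32 { 21 }
}
```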
Re: [rust-dev] Will smart-pointer deref operator allow making iter::ByRef more generic?
I don't think so, because the fact that the particular instance of T implements the Deref trait cannot have any effect on the decorator code, since Deref is not in the bounds for T. What would work instead is to change the language so that if a type Type implements Trait, and all Trait methods take &self or &mut self (as opposed to by-value self or ~self), then an implementation of Trait for &'a mut Type is automatically generated (with the obvious implementation). Likewise, if all Trait methods take &self, then an implementation of Trait for &'a Type is also automatically generated. Then what you want to do will just work, without the need for any wrapper or special syntax. One could then, as an additional step, automatically generate an implementation of Trait for MutDeref if Trait is implemented by &mut Type (possibly due to the above technique), but this would not be required for the example.
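The automatically generated implementation described above can be written by hand in today's Rust as a blanket impl (the standard library does this for its own traits on &T and &mut T); a sketch with a made-up trait:

```rust
trait Speak {
    fn speak(&self) -> String;
}

// Since every method takes &self, the trait can be implemented for &T
// with "the obvious implementation": forward through the reference.
impl<'a, T: Speak + ?Sized> Speak for &'a T {
    fn speak(&self) -> String { (**self).speak() }
}

struct Dog;
impl Speak for Dog {
    fn speak(&self) -> String { "woof".to_string() }
}

// Generic code now accepts both Dog and &Dog without any wrapper.
fn call<S: Speak>(s: S) -> String { s.speak() }
```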
Re: [rust-dev] Handling I/O errors
> it is guaranteed to happen on all readers I meant all finite readers, such as those for normal disk files.
Re: [rust-dev] Handling I/O errors
Are we sure that this is the correct design, as opposed to having read return 0 for EOF, or perhaps returning None with a Result<Option<uint>, IoError> return type? After all, EOF is unlike all other I/O errors, since it is guaranteed to happen on all readers, and the fact that it needs to be special-cased might be an indication that the design is wrong. Also, in addition to "raw" EOF, a partial read due to end of file causes the same issues, so it seems that code needs two match conditions to handle EOF, while it would only need a single one if EOF were represented by Ok(0).
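For what it's worth, the Ok(0)-for-EOF convention argued for here is what Rust's std::io::Read eventually adopted; a single match arm then covers EOF, as the post suggests:

```rust
use std::io::Read;

// Read a whole stream: one arm handles EOF, the other handles both
// full and partial reads.
fn read_all<R: Read>(mut r: R) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4096];
    loop {
        match r.read(&mut buf)? {
            0 => return Ok(out), // EOF is just Ok(0)
            n => out.extend_from_slice(&buf[..n]),
        }
    }
}
```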
Re: [rust-dev] Idea: Memcpyed stacks for green tasks
The ratio of native threads to stacks and of stacks to tasks can actually be used to characterize all the systems discussed. As (stacks/thread, tasks/stack) pairs:
(1, 1) => current Rust native tasks
(1, N) => Python greenlets
(N, 1) => current Rust green tasks
(N, M) => proposal in my original mail
Re: [rust-dev] Idea: Memcpyed stacks for green tasks
Interesting: my proposal does indeed appear to be a generalization of the greenlet approach. Specifically, while the greenlet approach seems to use only one large stack per native thread, I'm suggesting multiple large stacks that can be stolen by other threads, which mitigates the issues stemming from position-dependent stacks (they are not eliminated, but they have a low probability of happening). It's also indeed possible to fully eliminate those issues by autoboxing everything whose address is taken, but that would have a potentially large performance impact on non-blocking code, while merely copying the stack imposes a performance penalty only on code that blocks.
Re: [rust-dev] Idea: Memcpyed stacks for green tasks
> I assume this is incompatible with work stealing and task migration > between threads? It is compatible, assuming that the number of large stacks is sufficiently larger than the number of threads. Basically, each green task can only run on a specific large stack, but as long as you aren't so unlucky that every runnable green task you could steal lives on a large stack that already has a running green task, you can steal one. The probability of that obviously decreases the more large stacks you use, and large stacks consume only address space (if you wipe them with madvise(MADV_DONTNEED) or similar when unused), so one can reasonably have 256 8MB large stacks on 32-bit, for instance. So, if you had an HT 4-core machine with 8 system threads running green tasks and 256 large stacks, the probability that a given green task is not blocked but not executable (because its large stack is in use) would be 7/256; if K tasks are not blocked, the probability of having none executable becomes (7/256)^K (i.e. exponentially lower). On 64-bit, the probability can be made arbitrarily low, although you consume more kernel memory to store the metadata associated with the memory mappings for the large stacks. It would require rearchitecting the work stealing code to use a two-level data structure: first steal a large stack with at least one non-blocked task, and then run the task, rather than stealing the task directly.
[rust-dev] Idea: Memcpyed stacks for green tasks
Stack management for green tasks has been based in the past first on segmented stacks and then on standard large stacks. However, I just realized that there is a third alternative which might well be better than both. The idea is very simple: a green task runs on a large stack like now, but when it is descheduled, a memory buffer of the size of its used stack space is allocated, and its stack data is copied there; when it is rescheduled, the stack data is copied back to the original address. The copying can be performed either by memcpy (perhaps a specialized memcpy that assumes alignment and always uses SIMD instructions to copy with no checks), or perhaps by a compression algorithm designed for maximum speed, such as Snappy or maybe a new ad-hoc design. This gives the best possible memory efficiency, and thus the maximum possible number of tasks, even better than segmented stacks thanks to precise sizing and optional compression, while not adding any overhead to code that does not block, either in terms of code generation complexity or runtime cost. There is of course an additional runtime cost when blocking, but blocking already often involves both system calls and context switching, and workloads with lots of blocking green tasks are probably I/O bound anyway; furthermore, the stack data is probably in the CPU cache, so there shouldn't be much penalty. The only significant drawback is that two green tasks running on the same "large" stack (note that stacks cannot easily be switched, as that would invalidate borrowed pointers from the stack into the stack) cannot ever run concurrently; however, by making the number of large stacks a large enough multiple of the number of CPU cores, the probability of reduced parallelism due to this issue can be made as small as desired.
In general, one would use an adaptive approach, where this kind of copying only kicks in once the number of green tasks becomes large; when not copying, one would just optionally use madvise(MADV_DONTNEED) to trim unused whole pages of the stack. Also, if the data to be copied is large (several OS pages), one may use mremap() or similar facilities to move the address space mappings instead of the data itself. Some unsafe code that assumes that tasks can access each other's stacks will break (probably only the new Mutex implementation), but that can be fixed by putting the data in the task structure instead of on the stack. Overall, this should allow writing something like a parallel network server or web crawler in Rust in a natural blocking style, while being fully confident in its scalability, which is something that is not really possible at the moment. Once this is in place, the best practices for Rust programs would become: 1. Use native tasks for tasks whose number is bounded at compile time and small. 2. Use green tasks for tasks whose number is unbounded or large. One could even automate this to some extent by spawning by default a native task the first C times a given proc() code address is used to start a task (where C = number of CPU cores), and green tasks afterwards. What do you think?
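The core save/restore step of the proposal can be sketched in a few lines (pure Rust, simulating the stack as a byte buffer; a real implementation would operate on the actual machine stack and is far more delicate):

```rust
// On deschedule, copy the used region of a large stack to a heap
// buffer sized exactly to the live data; on reschedule, copy it back
// to the SAME address, since pointers into the stack must stay valid.
struct SavedStack {
    base_offset: usize, // where in the large stack the data came from
    data: Vec<u8>,
}

fn save(large_stack: &[u8], sp_offset: usize) -> SavedStack {
    // Stacks grow down: the live region is [sp_offset, len).
    SavedStack { base_offset: sp_offset, data: large_stack[sp_offset..].to_vec() }
}

fn restore(large_stack: &mut [u8], saved: &SavedStack) {
    large_stack[saved.base_offset..].copy_from_slice(&saved.data);
}
```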
Re: [rust-dev] Ephemeral byte arrays for cryptographic keys/plaintexts
At any rate, note that what you are trying to do only provides some mitigation and is far from a complete solution, because in practice you can't prevent leakage of all confidential data this way (what about hibernation while the key is in memory? what about plaintext decrypted with the key?). The only effective solution is to encrypt all storage, including swap, using full-disk encryption, as well as all internal network links using IPsec or similar, so that it doesn't matter if sensitive data is swapped, accidentally written to files, or communicated between servers.
Re: [rust-dev] Ephemeral byte arrays for cryptographic keys/plaintexts
This can easily be implemented in Rust as a struct doing exactly that. There's no need to modify the I/O layer, since you'd simply borrow an &[u8] from the type and pass it, resulting in the I/O layer writing directly into the locked, zeroed-on-destruction memory. As for crypto, it seems the plan is not to implement it in Rust, but to bind to libraries such as OpenSSL, libgcrypt, the Windows CryptoAPI, etc. I guess patches implementing this would be welcome.
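A minimal sketch of such a struct in modern Rust (illustrative; it covers only the zero-on-destruction part, not memory locking, and crates like `zeroize` do this more carefully):

```rust
// A buffer that wipes itself when dropped. I/O code can borrow the
// slice and write plaintext/keys directly into it.
pub struct SecretBuf {
    data: Vec<u8>,
}

impl SecretBuf {
    pub fn new(len: usize) -> SecretBuf {
        SecretBuf { data: vec![0u8; len] }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] { &mut self.data }
}

impl Drop for SecretBuf {
    fn drop(&mut self) {
        // Volatile writes so the compiler can't elide the zeroing as a
        // dead store.
        for b in self.data.iter_mut() {
            unsafe { std::ptr::write_volatile(b, 0) };
        }
    }
}
```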
[rust-dev] RFC: Generalized monadic notation
Some languages support a special "do notation" that allows monadic operations to be expressed more naturally. However, there is an even more powerful option, which I'd call "in notation" (I came up with it, but it's obvious, so I'm sure some language has something like it). The idea is that we add an "in" keyword and in-types, of the form "in T" where T is a monadic type. In-types are physically represented as the monadic type, but semantically behave like the type contained in the monad; they are constructed with the expression "in x". The "out" keyword does the opposite, converting an in-type to the wrapped type. Operations performed on in-types actually act on the value inside the monad, as specified below. Quick examples:
out(in Some(3) + 6) gives Some(9)
out(in Some(3) + in Some(6)) also gives Some(9)
out(in Some(3) + in None) gives None
out(in ~[1, 2, 3] + 10) gives ~[11, 12, 13]
out(in ~[1, 2, 3] + in ~[10, 20, 30]) gives ~[11, 21, 31, 12, 22, 32, 13, 23, 33]
Internally, when the compiler encounters any expression involving an in-type (expressions include control flow constructs), it proceeds like this (considering its operands from left to right): 1. Optionally, if the expression is correctly typed (e.g. calling a function that takes an in-type), it compiles it normally. 2. If the expression does not itself have an in-type value, it converts the expression into a closure and passes it to map(). 3. If the expression does itself have an in-type value (for example, in x + in y has an in-type when viewed as a function of in x, due to the presence of in y), it converts out(expression) into a closure and passes it to flat_map(). Non-local control flow like return or break is forbidden in expressions involving in-types (it could be implemented with a flag, but that causes other equivalence issues). The advantage of this is that no special do-notation syntax is required, and one can manipulate the values naturally. The disadvantage is that it adds a layer of "magic", making what the code actually does less obvious. This is particularly true with list-like monads, where simple things like (in x) + (in y) actually become a double loop with quadratic run-time; this can optionally be avoided by only allowing "in notation" with monads that call the closure once. Also, side effects will be performed a different number of times depending on the monad values, which may also not be obvious. Note that there are a lot of details that I omitted for the sake of brevity, which would need to be handled.
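To make the proposed transform concrete, here are three of the examples above hand-desugared into the map/flat_map calls the compiler would emit (modern Rust, with and_then playing the role of flat_map for Option):

```rust
// out(in Some(3) + 6): a single in-value, so the expression becomes
// a closure passed to map().
fn ex1() -> Option<i32> {
    Some(3).map(|x| x + 6)
}

// out(in Some(3) + in Some(6)): the leftmost in-value flat_maps,
// the inner one maps.
fn ex2() -> Option<i32> {
    Some(3).and_then(|x| Some(6).map(move |y| x + y))
}

// out(in [1,2,3] + in [10,20,30]): nested closures become nested
// loops, which is exactly the quadratic behavior the post warns about.
fn ex3() -> Vec<i32> {
    vec![1, 2, 3]
        .into_iter()
        .flat_map(|x| vec![10, 20, 30].into_iter().map(move |y| x + y))
        .collect()
}
```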
Re: [rust-dev] Safely writing iterators over idiomatic C structures
> It can't be defined that way without becoming much less useful for > more common cases. Any example of this? Iterators are most commonly used in for loops, and this change makes no difference there. Functions that transform iterators also work the same. Things like collect() don't work for all iterators, but it should be possible to make them work for iterators over types that don't have lifetimes. That is, introducing syntax like this: fn collect<'a, T: 'a, U: Iterator<T>>(iter: U) -> ~[T] {} where the 'a bound on T means that T with any lifetime 'x is convertible to T with lifetime 'a.
Re: [rust-dev] Safely writing iterators over idiomatic C structures
Maybe the language should be changed to allow Iterator to have a signature like this: pub trait Iterator<A> { fn next<'a>(&'a mut self) -> Option<&'a mut A>; } Then you could return the &mut by reborrowing, and would be able to advance the iterator without issue afterwards.
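What this signature asks for is what is now called a "lending iterator", where each item borrows from the iterator itself; with generic associated types (stable since Rust 1.65) it can be written directly. A sketch with an illustrative implementor:

```rust
// The item type carries the lifetime of the &mut self borrow, exactly
// as in the signature proposed above.
trait LendingIterator {
    type Item<'a> where Self: 'a;
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

// An iterator lending &mut views into its own buffer.
struct Elems {
    buf: Vec<i32>,
    pos: usize,
}

impl LendingIterator for Elems {
    type Item<'a> = &'a mut i32 where Self: 'a;
    fn next(&mut self) -> Option<Self::Item<'_>> {
        let i = self.pos;
        self.pos += 1;
        self.buf.get_mut(i)
    }
}
```

The trade-off the original reply alludes to is that items of such an iterator cannot outlive the next call to next(), so adapters like collect() don't apply unchanged.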
Re: [rust-dev] How to reply properly in digest mode?
Never use digest mode. Instead, use normal mode and, if necessary, add a filter in your e-mail website or application to separate all mailing list messages into a specific folder or label.
Re: [rust-dev] Persistent data structures
No, the problem you describe does not exist in my implementation, because it requires an &mut to the smart pointer. In particular, if the reference count is 1, then there is no other Rc or Arc pointing to the same data, and because we have an &mut there is also no other borrowed reference to the Rc or Arc we are manipulating. Hence, if the reference count is 1 and we have an &mut to the Rc or Arc, it's safe to return an &mut pointing to the contents and thus mutate them in place. If the reference count is more than 1, then it uses read-only access to clone the contents, giving a new allocation with reference count 1 that can then be mutated in place as above.
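The mechanism described here is essentially what std's Rc::make_mut does in today's Rust: mutate in place when the count is 1, clone otherwise. A small usage sketch:

```rust
use std::rc::Rc;

// Requires &mut Rc<...>: no other borrow of THIS handle can exist, and
// if the refcount is 1 no other handle exists either, so in-place
// mutation is safe; otherwise make_mut clones first (copy-on-write).
fn push_cow(v: &mut Rc<Vec<i32>>, x: i32) {
    Rc::make_mut(v).push(x);
}
```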
Re: [rust-dev] Persistent data structures
> Did COW improve performance? What's a good way to do performance > testing of Rust code? The reason I introduced COW when RC > 1 is that it allows persistent data structures to be mutated in place when there are no extra references, just like non-persistent data structures. Lots of languages with persistent data structures can't do that because they use garbage collection (and thus can't tell whether there are other references, due to the lack of a reference count), but it seems an essential feature to have if one is using reference counting in a language like Rust, which is supposed to be a systems language producing optimal code. > Should I try to get your patches to Rc and Arc merged? They look > generally useful, if folks think they fit the design of Rc. You also > have a patch that adds Own to std which is equivalent to ~T but has > methods that look like Rc's. You use this, and putting most of your > treemap.rs inside macro definitions, to get a sort of > reference-type-polymorphism in your treemap. Shall I continue with this > approach and try to get Own merged too? (Opinions from anybody are > welcome.) I did it because I realized that having two different balanced tree implementations would have imposed a significant maintenance burden, and the macro and Own approach would have avoided that by completely replacing the current treemap code with the new optionally-persistent version. Also, having both Rc and Arc versions seems quite useful anyway, and currently I think macros are the best way to have both, until Rust starts supporting higher-kinded types (which would allow passing Rc as a parameter), or at least recursive types (which would allow expressing Rc<T> instead of ~T and so on). I guess someone on the Rust core team will have to decide on that, and there's already some discussion on it at https://github.com/mozilla/rust/pull/9786
Re: [rust-dev] Persistent data structures
Hello, I already implemented a persistent tree-map called SnapMap: you can find the source code at https://github.com/mozilla/rust/pull/9816 I stopped working on it before making a serious effort to push it into the Rust codebase and don't have time to work further on it, so it would be awesome if you were interested in continuing that effort. It's pretty much finished, but also pretty much completely untested (however, it was derived from the existing treemap, so the number of bugs should not be as high as in something written from scratch and untested). The first thing that needs to be done is to run the existing tests inherited from treemap, fix any bugs that show up, and then write more tests for SnapMap-specific usage patterns and features. Then, one should look at the mutable/handle-based iterators in the code and decide whether they should be removed, kept as is, or improved, possibly after changing the iterator interface to return an object with the lifetime of the function call instead of the iterator. The rest of the code should be fine (except for bugs due to the lack of testing), but there might be places where unnecessary copy-on-write is performed. My code used an improved version of Rc that supports copy-on-write by cloning only when the reference count is > 1.
Re: [rust-dev] Removing some autoref magic
Have you considered making deref the default instead, and requiring moves to use an explicit "move" keyword? Basically, from this hypothetical syntax to the current one:
- x => &x
- mut x => &mut x
- move x => x
One could even make the & implicit in parameter types in function declarations unless the "move" keyword is used, so we would have (new => old syntax):
- fn(self, ...) => fn(&self, ...)
- fn(mut self, ...) => fn(&mut self, ...)
- fn(move self, ...) => fn(self, ...)
Regarding owned pointers, one could introduce this syntax: let *x = func_that_gives_owned_pointer() which makes x a variable of type T referring to the contents of the owned box, and is thus derefed to & when passed to a function as above, without having to write "*x" in the function call. One could then add a new "#x" or similar syntax to get back the owned pointer. Not sure if all this is a good idea, just throwing it out there. I actually think keeping autoref/autoderef is probably better.
Re: [rust-dev] C# async for Rust
Although, on second thought, one could just free the unused part of the user-mode stack whenever a thread blocks, either in user-mode code (i.e. using madvise MADV_DONTNEED or equivalent to discard everything below the stack pointer, rounded to the page size, perhaps minus one page) or automatically in a modified kernel, and thus greatly reduce the worst case. It's still going to have higher overhead than CPS continuations on the heap, because the stack will tend to have holes where dead variables live and for alignment to page boundaries, and you also keep around the kernel stack and kernel scheduler objects. And it doesn't work on 32-bit, because you cannot have more than around 2048 tasks with megabyte-sized task stacks (which is way too low for several usages), and unfortunately there are lots of 32-bit-only smartphones and tablets that should probably be supported.
Re: [rust-dev] C# async for Rust
> The issue with async/await is that while it maps very well to the AIO > primitives like IOCP and POSIX AIO, it doesn't map well to something > that's solid on Linux. It's just not how I/O is done on the platform. > It uses *non-blocking* I/O to scale up socket servers, with > notification of ready state rather than completion state. This doesn't > work for file system access though. Well, I/O would work exactly the same as the current libuv-based Rust tasks approach, except that one needs a stack for each CPU thread and not for each task, because task creation functions return when a task wants to block, and thus there is no stack to save and restore. On Linux, the epoll interface allows implementing this, and I think it's what libuv uses. The only problem is that Linux doesn't really support asynchronously resolving file paths to inodes (a.k.a. opening files), but that can be done on a dedicated thread pool, with the advantage that those threads don't do anything stack-intensive, so they don't have zombie stacks. The problem with blocking on all threads/the 1:1 threading model is that if you do a computation that requires 8MB of stack, and then block with a shallow call stack, the 8MB of stack are not freed; in a TCP server you waste 8MB per TCP connection, which means that on a system with 32GB of RAM you can only service 4096 TCP connections without swapping to disk, instead of millions. A dedicated thread pool for blocking I/O doesn't have this issue, because it can run with only 4KB of stack or so, since it doesn't do anything stack-intensive, and the number of its threads can be limited without user-visible effects.
Re: [rust-dev] C# async for Rust
> This is similar to IO Monad in Haskell. Adding that to previously pure > computational code is painful, but on the other hand it does emphasize > that computational code != IO code and minimizing mixing between the two > typically leads to better design overall. Yes, but you can have the compiler automatically transform normal code into the equivalent of Haskell's "do notation", which is what C# does with async. Basically, code like this:

async fn foo() -> uint {
    let a: int = compute_a();
    let b: int = compute_b();
    let c: char = await read_char();
    let r: uint = compute_r(a, b, c);
    return r;
}

becomes, after the compiler performs the CPS transform:

// assume that we have trait Continuation<T> { fn continue(~self, v: T); }
// assume that cps_read_char is provided by the runtime and tells a main loop
// to read a char and then schedule a new task running the passed continuation

struct foo_Continuation(~Continuation<uint>, int, int);

fn cps_foo(continuation: ~Continuation<uint>) {
    let a: int = compute_a();
    let b: int = compute_b();
    return cps_read_char(~foo_Continuation(continuation, a, b));
}

impl Continuation<char> for foo_Continuation {
    fn continue(~self, c: char) {
        let (continuation, a, b) = *self;
        let r = compute_r(a, b, c);
        continuation.continue(r);
    }
}

and then a future-based version can also be generated by the compiler based on the CPS transform results:

// assume that Continuation<T> is implemented on TaskFuture<T> and causes the
// TaskFuture to be completed with the value

fn future_foo() -> ~TaskFuture<uint> {
    let f = ~TaskFuture::new();
    cps_foo(f);
    return f;
}

Note that if foo() had taken a borrowed pointer with lifetime 'a, then the CPS version becomes unsafe, and TaskFuture would also need an 'a parameter. Also, one can use a borrowed pointer instead of an owned pointer for continuations and futures, but then they need to be waited on before their lifetime region ends.
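For comparison, the same CPS shape can be written in today's Rust with boxed closures playing the role of the Continuation trait; `cps_read_char` here is a stand-in that supplies a fixed character instead of consulting a real event loop:

```rust
use std::cell::Cell;
use std::rc::Rc;

// The continuation trait from the sketch, as a boxed FnOnce.
type Cont<T> = Box<dyn FnOnce(T)>;

// Stand-in for the runtime primitive: a real runtime would resume `k`
// from its event loop once a character arrives.
fn cps_read_char(k: Cont<char>) {
    k('q');
}

fn compute_a() -> i32 { 1 }
fn compute_b() -> i32 { 2 }
fn compute_r(a: i32, b: i32, c: char) -> usize {
    (a + b) as usize + c as usize
}

// The closure environment plays the role of foo_Continuation: it
// captures a and b across the blocking call.
fn cps_foo(k: Cont<usize>) {
    let a = compute_a();
    let b = compute_b();
    cps_read_char(Box::new(move |c| k(compute_r(a, b, c))));
}

fn demo() -> usize {
    let out = Rc::new(Cell::new(0));
    let out2 = out.clone();
    cps_foo(Box::new(move |r| out2.set(r)));
    out.get()
}

fn main() {
    // 'q' is 113, so compute_r(1, 2, 'q') = 116
    assert_eq!(demo(), 116);
}
```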
[rust-dev] C# async for Rust
I see several proposals for the future of Rust tasks, and I think one of the best approaches is being overlooked: something similar to async in C# (http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx). In C#, the "async" keyword can be applied to functions, and it causes the compiler to transform a function that returns T into a function that returns a Task<T> (C#'s name for a future of type T) representing the potentially asynchronous computation. Blocking is represented by using the "await" keyword on a future (typically returned by a call to another "async" function), and it causes the compiler to perform a continuation-passing style transformation, attaching the continuation to the future and returning another future representing the composed computation. I/O functions are designed to return futures, so in this system blocking causes all calling "async" functions to return, building up a chain of continuations on the heap, which is equivalent to the machine stack in a current-Rust task, but which is as small as needed and is only used for call chains that block. In Rust, this transformation is much more delicate, because the resulting return-value futures must have a lifetime that is the smallest among all the arguments to the function, if those arguments are needed by the continuation, and the argument types must be "shareable" between parallel forks (e.g. std::rc::Rc is not shareable because RC manipulation is non-atomic). However, it is possible to restrict the system to use non-delimited continuations instead of delimited continuations and futures, which would avoid this problem, since futures could no longer be used explicitly (at the cost of making flexible parallelism impossible). 
In this latter case, it would be equivalent to the current task system, except for requiring blocking functions to be marked "async"; the "await" keyword would not be required, since it would effectively become compulsory if no first-class futures are returned for async functions.

Advantages:
- Functions and interfaces that can perform I/O or block are clearly marked by a keyword
- Can have billions of blocked tasks (with hundreds of gigabytes of RAM), since the memory used by each blocked task is truly minimized because it's on the heap

Disadvantages:
- Requires a heap allocation for every call to an async function (to hold the continuation data)
- Non-"async" functions cannot call "async" functions, so interfaces must be explicitly marked as async or not
- Requires implementing the transform in the compiler

Microsoft switched to this paradigm in the latest version of C# and in the Windows RT API, and it might be an appropriate choice for Rust too.
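As it happens, modern Rust adopted an async/await transform along these lines; a minimal sketch of the mechanism, with a toy busy-poll executor standing in for a real runtime and `read_char` returning a fixed character:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A toy executor: polls the future in a loop with a no-op waker.
// Real executors sleep until woken; this is just enough for futures
// that are always ready.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker { raw() }
    fn nop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, nop, nop, nop);
    fn raw() -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }

    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    // safe: `fut` never moves after being pinned here
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

// Stand-in for the blocking I/O primitive in the message.
async fn read_char() -> char { 'x' }

// The message's foo(), in the syntax Rust eventually adopted.
async fn foo() -> usize {
    let a: usize = 1;
    let b: usize = 2;
    let c: char = read_char().await;
    a + b + c as usize
}

fn main() {
    // 'x' is 120, so foo() evaluates to 123
    assert_eq!(block_on(foo()), 123);
}
```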
Re: [rust-dev] Abandoning segmented stacks in Rust
The advantage of segmented stacks is that blocked tasks only take up as much memory as they actually need to store state, so that for instance a network server can use a task for each connection, and still only use, say, 64 bytes per connection if that's possible, instead of the number of stack pages that were allocated for previous computation (assuming an "extreme" version that allocates a stack segment on every call). However, there is another approach that can replace segmented stacks for this purpose, namely having the compiler automatically transform blocking functions to instead return a future (with limited lifetime). This means that segmented allocation only happens for functions that indirectly perform I/O, and only allocates the exact amount of memory needed to retain state that must persist across the blocking I/O operation, while other functions execute normally using traditional stacks. The simplest example of this feature is async/await in C# 5, and Scala has a delimited continuation-passing transformation that can be used to do the same thing. Has this been considered for Rust?
Re: [rust-dev] On Stack Safety
> It seems to me that trying to determine max stack size is incompatible with > dynamic linking. So even if you disallow recursion, any function that calls > a function outside of its own crate is not going to be able to trust its > calculated max stack size. The maximum stack size needs to be computed dynamically, in a "C++ global constructor" placed by rustc in every compiled crate library/executable, which would write the computed value into suitable global variables. The check only needs to be done by task creation routines, by function calls contained in a recursion cycle, by closures, and by stubs put in ~Trait/@Trait vtables (where Trait is public and thus implementable by other crates). Note that for the latter two the check cannot really be avoided by taking the max over all implementations dynamically, because a new dynamic library could be loaded between the check and the indirect call.
Re: [rust-dev] Should I/O use conditions?
What about the idea of making Result cause task failure if it is destroyed in the error case (as opposed to being destructured)? This just needs a simple language change: an attribute applied to Result declaring that it's OK to destructure it, which causes drop() not to be called. In addition, maybe some EH syntax sugar like the one I proposed in http://www.mail-archive.com/rust-dev@mozilla.org/msg04093.html
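A library-level approximation of this in today's Rust (names hypothetical): a wrapper whose drop() panics — i.e. fails the task — if it still holds an unhandled error, while destructuring it through `take` defuses the check:

```rust
// Result wrapper that fails on drop while in the error case, as the
// message proposes; `take` plays the role of destructuring.
struct Checked<T, E>(Option<Result<T, E>>);

impl<T, E> Checked<T, E> {
    fn new(r: Result<T, E>) -> Self { Checked(Some(r)) }
    // consuming the wrapper defuses the drop check
    fn take(mut self) -> Result<T, E> { self.0.take().unwrap() }
}

impl<T, E> Drop for Checked<T, E> {
    fn drop(&mut self) {
        if let Some(Err(_)) = &self.0 {
            panic!("error result dropped without being handled");
        }
    }
}

fn demo() -> bool {
    // taking the result is fine...
    let ok = Checked::<u32, String>::new(Ok(1)).take().is_ok();
    // ...while silently dropping an error panics
    let dropped = std::panic::catch_unwind(|| {
        drop(Checked::<u32, String>::new(Err("boom".into())));
    });
    ok && dropped.is_err()
}

fn main() { assert!(demo()); }
```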
rust-dev@mozilla.org
> Allowing one closure to take &mut while another takes &const would > create a data race if the two closures are executed in parallel. Closures executable in parallel would probably have kind bounds forbidding &const: http://smallcultfollowing.com/babysteps/blog/2013/06/11/data-parallelism-in-rust/ explicitly mentions this. BTW, how about keeping it, and calling it "&volatile" instead of "&const", since that's what C uses to name something that can be changed outside the program's control?
Re: [rust-dev] Structural enums for datasort refinements
I was talking about http://smallcultfollowing.com/babysteps/blog/2012/08/24/datasort-refinements/, which essentially introduces structural enums, but only for variants belonging to the same named enum. > * This strikes me as an extreme change to the language, but perhaps my gut is overly conservative. Well, to some extent: it does indeed cause a fundamental change in type inference, since now any finite set of types can be "unified" as a structural enum type. That said, I think this needs to be done anyway in something like Matsakis' proposal, since otherwise you can't pass an x declared as "let mut x = A; if(foo) {x = B;}" to something taking an Enum[A, B] if Enum also has a C case, so it only really changes how frequently the expanded type inference would be used. BTW, it should be allowed to call impl methods and trait methods, and to access fields, that are common to A, B and C on "A | B | C", effectively adding a new "closed" polymorphism system that can be used by just switching a type from Trait to A | B | C if the only possible Trait implementations in that code are A, B and C, with the rest of the code unchanged. > You have not addressed in your proposal how you would change the match syntax to deal with non-struct variants, such as ~str or int. Scala uses "x: int" to match an int (or a type pattern in general) and assign it to x, and "x @ pattern" to bind x to something matched by a pattern (since Scala distinguishes between "value" patterns and type patterns). In Rust, distinguishing between value and type patterns might not be necessary, so one can probably use the same character for both cases. > Do the tags on the variants need to encode the whole type, and not just which struct it is? They of course need to identify the whole type, and they will probably change numerically depending on generic instantiation if some variants can unify under some generic instantiations. 
> And what about the poor user who didn't think about the fact that they might alias each other, and > thought that all the clauses in the code for A | B | S were disjoint, but in fact they potentially overlap > due to the potential aliasing, and thus the order of the cases in the above is now crucial, yes? Well, the poor user's thought process turns out to be incorrect, since any enum including a naked type parameter is obviously not disjoint in all cases. > -- Another example: Can/should I now just throw seemingly unrelated structs into a match, in order to > anticipate that the parameters will be instantiated with that struct? Consider the following: Yes: the compiler can potentially emit an error or warning if the match will never match under any generic instantiation. This also allows a form of internal "partial specialization" of functions, where a function can have specialized behavior for some subsets of generic instantiations.
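In today's Rust, the closed-polymorphism part can be approximated by hand: a named enum standing in for "A | B", with From impls for the implicit widening and a forwarding method for the field common to both variants (all names hypothetical):

```rust
struct A { x: u32 }
struct B { x: u32 }

// "A | B" approximated as a named enum...
enum AOrB { A(A), B(B) }

// ...with the implicit A -> "A | B" widening spelled as From impls
impl From<A> for AOrB { fn from(a: A) -> Self { AOrB::A(a) } }
impl From<B> for AOrB { fn from(b: B) -> Self { AOrB::B(b) } }

impl AOrB {
    // access to the field common to A and B, forwarded by matching
    fn x(&self) -> u32 {
        match self { AOrB::A(a) => a.x, AOrB::B(b) => b.x }
    }
}

fn demo() -> (u32, u32) {
    let a: AOrB = A { x: 7 }.into();
    let b: AOrB = B { x: 9 }.into();
    (a.x(), b.x())
}

fn main() { assert_eq!(demo(), (7, 9)); }
```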
Re: [rust-dev] Structural enums for datasort refinements
The issues in OCaml seem to be due to the fact that in OCaml function parameter and return types are inferred, and thus "accidentally oversized" enum types can propagate through them. In Rust, they must be specified by the user, so those oversized enum types will cause an error when they are passed as a parameter or return value with a user-specified type that is "smaller" than the inferred type.
[rust-dev] Structural enums for datasort refinements
I was reading a proposal about adding "datasort refinements" to make enum variants first-class types, and it seems to me there is a simpler and more effective way of solving the problem. The idea is that if A, B and C are types, then "A | B | C" is a "structural" enum type that can be either A, B or C. In addition, A can be implicitly converted to "A | B", "A | B" can be implicitly converted to "A | B | C", "(A | B) | C" and "A | (B | C)" are equivalent to "A | B | C", and finally "C | B | A" is equivalent to "A | B | C" (to support the latter, the implementation needs to sort variants in some arbitrary total order before assigning tag numbers). Furthermore, a way to bind variables in an "or" pattern is introduced, to allow converting "A | B | C" to "A | B" in the case that it holds an A or a B. This way, one can rewrite Option as a type alias like this:

struct Some<T>(T);
struct None;
type Option<T> = None | Some<T>;

which is like the current Option<T>, but also makes None and Some<T> first-class types. The current enum syntax can remain as syntax sugar for the above code. The only issue I see is what to do for code such as "let mut x = Some(3); x = None;": with this proposal, Some and None are separate unrelated types, so we either have this code emit an error, or x must be given the type "Some<int> | None" automatically, which however can lead to obscure error messages if one mistakenly attempts to assign a string to it, causing the type to become "Some<int> | None | ~str" (i.e. the user might be told that a match is not exhaustive because it does not handle the "~str" case, rather than that they assigned a ~str to an Option-typed variable). 
It should be possible to allow this, and make the error-emitting code use heuristics to figure out whether it is more likely that the user assigned a value of the wrong type, or used an enum improperly (for example, by looking at whether the implicitly created enum type is ever written explicitly in the source, and whether the deduced structural enum type is being used in places that require a non-enum type). Alternatively, one could stipulate that only structs, or only structs marked "enum struct" or "case struct" or similar, can become part of an inferred structural enum, but this seems unappealing. Note that some structural enums can change representation depending on generic instantiation, since "T | int" becomes just "int" if T = int, while it is "~str | int" if T = ~str (and similarly for "Some<T> | Some<int>"), but this should not be a problem.
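The Option-as-alias shape can be approximated in today's Rust with a named enum and explicit widening via From; `Some2`, `None2` and `Opt` are made-up names, and the conversions are explicit `.into()` calls rather than the implicit ones proposed here:

```rust
// The variants stay first-class types, as in the proposal.
struct Some2<T>(T);
struct None2;

// "type Option<T> = None | Some<T>" approximated as a named enum.
enum Opt<T> { S(Some2<T>), N(None2) }

impl<T> From<Some2<T>> for Opt<T> { fn from(s: Some2<T>) -> Self { Opt::S(s) } }
impl<T> From<None2> for Opt<T> { fn from(n: None2) -> Self { Opt::N(n) } }

fn demo() -> bool {
    // the "let mut x = Some(3); x = None;" case: both sides widen to Opt<i32>
    let mut x: Opt<i32> = Some2(3).into();
    x = None2.into();
    matches!(x, Opt::N(_))
}

fn main() { assert!(demo()); }
```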
Re: [rust-dev] cycle time, compile/test performance
> - We essentially always do whole-program / link-time optimization > in C++ terminology. That is, we run a whole crate through LLVM at > once. Which _would_ be ok (I think!) if we weren't generating > quite so much code. It is an AOT/runtime trade but one we > consciously designed-in to the language. > time: 33.939 s LLVM passes Maybe this should be changed to optionally do codegen and LLVM passes in parallel, producing an LLVM or native module for each Rust file, and then linking the modules together into the compiled crate. Alternatively, there seems to be some work on running LLVM FunctionPasses in parallel at http://llvm.1065342.n5.nabble.com/LLVM-Dev-Discussion-Function-based-parallel-LLVM-backend-code-generation-td59384.html but it doesn't seem production-ready. Most large C/C++ projects rely on parallel make and distcc to get reasonable build times, and that seems like something Rust needs to support (either via make/distcc or internally) to be a viable replacement for large projects.
Re: [rust-dev] cycle time, compile/test performance
> > 2. Distribute compilations and tests across a cluster of machines (like > > distcc) > > Compilation is 99% serial (the only things that happen in parallel > are rustpkg and rustdoc etc at the end, and they are almost nothing), > though tests could be distributed (and Graydon is working on doing > that afaik). Are you sure? I'm not an expert on the rustc implementation, but I'd expect you could have a sequence of parallel phases and "collection" points: specifically, I'd guess you could parse all files in parallel, and once you have the AST for all files, do checking and typing in parallel, and then do codegen in parallel. For comparison, C/C++ can do everything except linking in parallel, except for parsing header files when not using precompiled headers (which is still parallel but repeated for each file); linking can be parallelized to some extent on a single machine with gold.
Re: [rust-dev] cycle time, compile/test performance
Have you considered the following "non-specific" quick fixes?

1. Build on a ramfs/ramdisk
2. Distribute compilations and tests across a cluster of machines (like distcc)
3. If non-parallelizable code is still the bottleneck, use the fastest CPU possible (i.e. an overclocked Core i7 4770K, overclocked Core i7 4960X/3960X/4930K/3930K, dual Xeon E5 2687W or quad Xeon E5 4650, depending on whether you need 4, 6, 16 or 32 cores)
4. Read metadata only once, in one of these ways:
4a. Pass all files to a single compiler invocation (per machine or core)
4b. Have a long-lived rustc "daemon" (per machine or core) that keeps crate metadata in memory and gets passed files to compile by fd
4c. Use CRIU suspend/restore (or the unexec from Emacs or whatever) to suspend a rustc process after metadata is read and restore that image for each file instead of spawning a new one
4d. Allocate metadata using a special allocator that allocates it from a block at a fixed memory address, then just dump the block into a file, and read metadata with a single mmap system call at that same fixed address (this is a security hole in general, so it needs to be optional and off by default)
Re: [rust-dev] Segmented stacks (was: IsRustSlimYet (IsRustFastYet v2))
I believe that instead of segmented stacks, the runtime should determine a tight upper bound on stack space for a task's function, and only allocate a fixed stack of that size, falling back to a large "C-sized" stack if a bound cannot be determined. Such a bound can always be computed if there is no recursion, dynamic dispatch, dynamic allocation on the stack, or foreign C functions. And in fact, it can probably be computed even for foreign C functions, as long as DWARF exception tables are available, or some constraints on the machine code are assumed (i.e. that negative offsets are not applied to pointers to the stack stored in memory). This would, for instance, allow transforming many simple internal iterators into external ones automatically via coroutines, without large memory overhead.
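The bound described here is a longest-path computation over an acyclic call graph; a sketch with hypothetical per-function frame sizes (recursion, dynamic dispatch and dynamic stack allocation would all make this inapplicable, as noted above):

```rust
use std::collections::HashMap;

// Worst-case stack use for `f`: its own frame plus the maximum over its
// callees. Assumes the call graph is acyclic (no recursion).
fn max_stack(
    f: &str,
    frames: &HashMap<&str, usize>,
    calls: &HashMap<&str, Vec<&str>>,
) -> usize {
    let deepest = calls
        .get(f)
        .into_iter()
        .flatten()
        .map(|c| max_stack(c, frames, calls))
        .max()
        .unwrap_or(0);
    frames[f] + deepest
}

fn demo() -> usize {
    // hypothetical frame sizes in bytes
    let frames = HashMap::from([("task", 64), ("parse", 256), ("emit", 128)]);
    let calls = HashMap::from([("task", vec!["parse", "emit"])]);
    max_stack("task", &frames, &calls)
}

fn main() {
    // deepest chain is task -> parse: 64 + 256
    assert_eq!(demo(), 320);
}
```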
Re: [rust-dev] On tunable Garbage Collection
For Rc<>, it should be enough to have the GC traverse raw pointers that have (or don't have) a special attribute (probably traversing by default is the best choice), as long as the raw pointers have the correct type. Obviously it only makes sense to traverse types if it is possible for them to point to GC pointers. For data held exclusively by external C libraries (assuming there is any), an unsafe GcVisitor trait that can be implemented to do the visit should work, along with a NonGc kind for code that doesn't want to implement GcVisitor (otherwise, a mark & sweep GC would kill live objects).
Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers
> > How about, however, something like (hypotetically) rewriting the Linux > > kernel in Rust? (with limited amounts of unsafe code) > > I think it's a bit unfair that in your example, you insist on wanting > fine-grained global RW locks, when in fact linux has been systematically > backing away from those in favour of RCU. Indeed, the docs on the RW > locks[1] say: >NOTE! We are working hard to remove reader-writer spinlocks in most >cases, so please don't add a new one without consensus. (Instead, >see Documentation/RCU/rcu.txt for complete information.) > > RCU, as you know, is very different, and .. I actually don't know if > it's easy to encode in rust's current rules or not. But I think it's > neither helped by the rules you were proposing either. Well, in many cases RCU is just used on data structures, so a special RcuList, RcuHashTable, etc. implemented with unsafe code would probably do the job. > I think kernels are dealing with way more complication than we can > encode into any set of rules: weird IO memory mappings and barrier > rules[2], per-CPU structures, interrupt safety, custom allocators and > lifetime patterns, all sorts of stuff that we can't handle "optimally". > I would expect a kernel written in rust to involve a lot of looking at > our type system for tools-that-might-help, but ultimately when facing a > performance-vs-safety trade, or a hardware-reality-vs-rust-types trade, > to go with performance and reality, and larger blocks of supporting > "unsafe" code they've reasoned carefully about. That's what they do now > and, somehow, they manage to code with a level of attention-to-detail > and minimalism wherein it seems to mostly work. Yes, although most of those specific issues can probably be isolated in snippets of unsafe code. 
However, it's enough to consider the kernel as an implementation of the POSIX file system calls over a RAM disk to see a broader issue: the natural and simplest way to express it in current Rust is a "filesystem task" accessing task-local data implemented with garbage-collected @ boxes, but that would have serialization, and thus performance, equivalent to a big kernel lock. Otherwise, one needs multiple tasks concurrently mutating shared objects, which requires either global GC or global reference counting; the latter needs both the ability to nest mutable pointers in ways that are statically acyclic due to their types, and probably some sort of mutable RcTree class that would allow reparenting with O(tree_height) acyclicity checks (like the Linux VFS does manually for rename on directories).

BTW, there is in fact a probably better way than what I originally proposed to do more general acyclic reference counting:

1. Add a DoesntStronglyPointTo kind (with a shorter name) that would allow declaring RcMut<T> as RcMut<T: DoesntStronglyPointTo<RcMut<T>>>, and would have the compiler check that there is no sequence of non-weak-reference pointers going from T to RcMut<T>
2. Enhance RcMut<T> to RcMut<T, U>, which would hold a (T, U) pair, and allow getting a &(T, U) pointer while reading, but only a (&T, &mut U) pointer pair when writing, making T immutable and U mutable
3. Add RcList and RcTree for those structures (and RWARC versions)

Then, one would put the cyclic pointers in the T part, which would be OK since they are immutable, and put everything acyclic in the U part, with the DoesntStronglyPointTo bound checking that it's indeed acyclic. This avoids the magic of having the compiler infer which pointers can be allowed to mutate; instead the user indicates them, and the compiler gives an error if the user gets it wrong.
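Point 2's read/write split can be sketched in today's Rust with RefCell: the T half is only ever handed out by shared reference after construction, while the U half stays mutable (names hypothetical; the DoesntStronglyPointTo check itself would need compiler support and is not modeled here):

```rust
use std::cell::{Ref, RefCell, RefMut};
use std::rc::Rc;

// The T half is frozen after construction (so cyclic pointers living
// there can never be rewired); the U half stays mutable.
struct RcPair<T, U>(Rc<(T, RefCell<U>)>);

impl<T, U> RcPair<T, U> {
    fn new(t: T, u: U) -> Self {
        RcPair(Rc::new((t, RefCell::new(u))))
    }
    // reading: shared access to both halves
    fn read(&self) -> (&T, Ref<'_, U>) {
        (&self.0 .0, self.0 .1.borrow())
    }
    // writing: T stays shared, only U is handed out mutably
    fn write(&self) -> (&T, RefMut<'_, U>) {
        (&self.0 .0, self.0 .1.borrow_mut())
    }
}

fn demo() -> (&'static str, u32) {
    let p = RcPair::new("frozen", 0u32);
    *p.write().1 += 5;
    let (t, u) = p.read();
    (*t, *u)
}

fn main() { assert_eq!(demo(), ("frozen", 5)); }
```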
Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers
> Restructuring your code to avoid cycles is problematic when you're > implementing a platform where the spec allows users to create ownership > cycles --- like, say, the Web platform. So if > Rust didn't support cyclic ownership, Servo would have to implement its own > GC and tracing code just like Gecko does. That's a situation we need to get > away from. Yes, my idea was essentially to extend the compiler with facilities allowing RcMut and RWARC to be implemented in such a way that they can be nested without creating memory leaks (subject to some restrictions, but fewer restrictions than the blanket inability to nest), with the goal of replacing C++ (or Java, perhaps) in high-performance multithreaded servers and kernels. That would be orthogonal to the issue of what to do with @-pointers and whether to have garbage collection (which is indeed very desirable, since some programs need it). > An interesting meta-point is that when you're implementing a platform, rather > than a specific application, you have much less freedom to structure the > implementation because you don't > know what any given instance of your system actually does. This affects > everything from the requirements on the programming language to the design of > graphics APIs. Yes, although note that while task-local GC is sufficient to implement a browser with JavaScript (since JavaScript is single-threaded), it would not be enough to implement a high-performance Java VM, which requires GC on memory simultaneously accessible by multiple threads. In general, it would be nice to have great support for all 4 models: sophisticated local and global reference counting, and local and global GC (as well as owned and borrowed pointers). It's probably reasonable, though, for Mozilla to mostly focus on the browser use case.
Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers
> Every > single reviewer I showed a "no cycles" variant of rust to told me > it was unacceptable and they would walk away from a language that > prohibited cycles in all cases. All of them. We tried limiting it > within subsets of the type system rather than pervasively: it still > irritated and confused everyone. Interesting, are these versions online anywhere? > - It doesn't do anything about fragmentation. Our longer-term plan for > the gc involves moving to mostly-copying[1], which compacts pages > other than the conservatively-scanned set. And is incremental. True, in fact fragmentation might be as bad as or worse than the dead objects living on under GC. > > However, it cannot support cycles (without destroying its advantages > > over GC), so it is a good idea to statically prevent them to avoid leaks. > > We tried this; various strategies for it. Ownership cycles even just > within subtrees of well-organized ownership trees are not a rare > occurrence. They are a result both of normal patterns within common data > structures, and normal levels of coupling between subsystems in large > applications. Especially web browsers. How about, however, something like (hypothetically) rewriting the Linux kernel in Rust? (with limited amounts of unsafe code) The issue there is that a kernel needs to provide syscalls to multiple concurrently running CPU threads that manipulate shared data structures like the filesystem. Having the filesystem structures be local to a single task would destroy performance on things like filesystem-bound loads on 32-core servers, while adding global GC would make it impossible to provide the soft and hard real-time guarantees needed for embedded systems. 
So using a MutexARC/RWARC global reference-counting scheme with fine-grained locks, like the Linux kernel in fact does (along with RCU, but let's ignore that for now), seems the only option, but it's currently not safely possible in Rust, due to the inability to nest those structures with safety checking. Now it's not totally clear whether full checking of reference cycles is indeed viable, but it seems the only way to achieve that, and it seems bothersome that one could not write a fast and safe real-time POSIX-implementing kernel in pretty much the only mainstream language that can hope to do so. Even excluding kernels, it seems network servers running on multi-socket machines and needing to concurrently mutate complex object graphs would have similar issues. Although for those, real-time might not be so essential, so adding optional concurrent global GC (like Java has) would be an alternative (but is also quite a bit of work).
Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers
> This would be a big step away from the advantages of Rust's current > trait system. Right now, if the definition of a generic function type > checks, it's valid for all possible types implementing the trait > bounds. There are no hidden or implicit requirements. Yes, but since Rust, like C++, instantiates generic functions at specialization time, this shouldn't be an issue. It just means that, as with C++ templates, compilation errors concerning the generic code can happen at instantiation time. In general, this is unavoidable with a mutable reference-counting scheme, because if you have something like this:

struct A<T> { v: T }

impl<T> A<T> {
    fn set_v(&mut self, v: T) {
        self.v = v;
    }
}

then you can allow calling set_v when T is u32, but not when T contains an rc pointer back to A<T>, because that could create a cycle for some values of v. The alternative is to just disallow programs that have cycles of rc pointers where at least one of them is mutable, which is less powerful.
[rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers
Reference counting is generally more desirable than garbage collection, since it is simple and deterministic, and avoids scanning the whole heap of the program, which causes pauses, destroys caches, prevents effective swapping, and requires tolerating a multiplicative increase in memory usage so that GC can be amortized to linear time in the number of dead objects in the general case. However, it cannot support cycles (without destroying its advantages over GC), so it is a good idea to statically prevent them to avoid leaks. Currently Rust has several types of reference-counted pointers, with different strategies to do that:

- The @ and @mut pointers, which seem to allow cycles and cause them to leak until task termination, although they are sort of supposed to eventually use a GC instead or be removed
- The MutexARC type, which allows cycles but requires unsafe code
- The Rc and ARC types, which avoid cycles by not allowing mutation
- The RcMut and RWARC types, which avoid cycles by not allowing rc pointers in the data

The issue is that none of this is optimal, since for instance it is not possible to have an RcMut<A> where A contains an RcMut<B> but B contains no pointers, even though it is perfectly safe to do so, and in general reference-counted pointers cannot be created for some types, like the A in this example. 
However, there seems to be a good solution: the following rule extends both the "no mutations" and "no rc pointers" cases while allowing the above, and allows creating rc pointers to any type at the expense of forbidding some mutations:

*** If there is a cycle of pointers (of any kind, including raw, owned and borrowed, but excluding raw pointers annotated as weak references) or enums containing at least one reference-counted pointer, forbid mutating all those pointers (even with an &mut) unless the object being assigned is the result of an expression that doesn't include pointers to previously-created objects (and thus cannot point to the pointer), or the value containing the pointer is on the stack or owned by a stack slot (and thus is not in an rc box or owned by one, and so cannot be pointed to by the assigned value) ***

Here a cycle means a cyclic sequence of structure fields such that each can point to an instance of the structure of the next field in the sequence. So for instance this case would be allowed:

struct A { b: Rc<B> }
struct B { a: Option<Rc<A>> }

but the A.b and B.a fields would be immutable even with &mut A or &mut B pointers, and would only be mutable if the A or B structure is on the stack or owned from the stack (and thus has no incoming rc pointers). On the other hand, this would be allowed with no restrictions, since the points-to relationship of the types is acyclic:

struct A { b: RcMut<B> }
struct B { c: Rc<C> }
struct C { x: u32 }

To implement this, one would need to extend the language with an annotation telling the compiler that the raw pointers inside Rc/RcMut/ARC/RWARC/etc. are to be treated as "reference-counted" pointers. 
The compiler can then perform the required static analysis: do an SCC decomposition of a graph containing types, enum members and pointer fields, with edges from types to the types, enum members and pointer fields they contain, from pointer fields to the pointed-to types, from traits to all possible implementing types, and from enums to their members; then forbid modification of the pointers in any SCC that contains at least one reference-counted pointer.

There are at least two complexities: public trait pointers need to be assumed to potentially point to anything (unless the whole program is visible), while generic types need to be analyzed in their specialized versions, which means that some methods would be valid to call only for some values of the generic type parameters, and that the analysis needs to be redone from scratch globally for every module compilation.

In addition to all this, weak versions of all rc pointers should be supported (to allow weak "parent" references, for instance), which would require a further annotation telling the compiler that the contained unsafe pointer is "weak" and thus does not participate in the pointer cycle analysis.

Also, a keyword could be provided to allow the forbidden mutations at the cost of a full run-time scan of the assigned value, to ensure it doesn't point to the object with the field being assigned, failing or returning an error if the check fails (this probably needs to be explicit, since the scan can of course be very expensive).

Some structures, like mutable strong doubly-linked lists, would not be allowed by this and would instead require an ad-hoc implementation with raw pointers (where the list node smart pointer would increase the reference count of both the node and the whole list, and splicing would be O(n)).

This infrastructure should be able to handle most programs without using garbage collection, except those that are interpreters of garbage-collected languages.
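The core of the SCC rule can be sketched as a toy standalone analysis. This is purely illustrative (the `Field` model, names, and the reachability-based SCC test are all my own simplification, not a real compiler pass): each pointer field is an edge between type nodes, and a field is frozen when it lies on a cycle whose strongly-connected component contains at least one reference-counted edge.

```rust
use std::collections::{HashMap, HashSet};

// Toy model: a pointer field is an edge from the containing type to the
// pointed-to type, flagged when it is a reference-counted pointer.
#[derive(Clone, Copy)]
struct Field {
    name: &'static str,
    from: &'static str,
    to: &'static str,
    is_rc: bool,
}

// Depth-first reachability over the type graph.
fn reaches(adj: &HashMap<&'static str, Vec<&'static str>>, from: &str, to: &str) -> bool {
    let mut stack = vec![from];
    let mut seen: HashSet<&str> = HashSet::new();
    while let Some(n) = stack.pop() {
        if n == to {
            return true;
        }
        if seen.insert(n) {
            if let Some(next) = adj.get(n) {
                stack.extend(next.iter().copied());
            }
        }
    }
    false
}

// A field lies in an SCC iff its target reaches back to its source; it is
// frozen iff that SCC also contains an rc edge.
fn frozen_fields(fields: &[Field]) -> Vec<&'static str> {
    let mut adj: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for f in fields {
        adj.entry(f.from).or_default().push(f.to);
    }
    let same_scc = |a: &str, b: &str| reaches(&adj, a, b) && reaches(&adj, b, a);
    fields
        .iter()
        .filter(|f| {
            reaches(&adj, f.to, f.from)
                && fields
                    .iter()
                    .any(|r| r.is_rc && reaches(&adj, r.to, r.from) && same_scc(f.from, r.from))
        })
        .map(|f| f.name)
        .collect()
}

fn main() {
    // struct A {b: Rc<B>}  struct B {a: Option<RcMut<A>>}  -- cyclic
    let cyclic = [
        Field { name: "A.b", from: "A", to: "B", is_rc: true },
        Field { name: "B.a", from: "B", to: "A", is_rc: true },
    ];
    // struct A {b: RcMut<B>}  struct B {c: Rc<C>}  -- acyclic
    let acyclic = [
        Field { name: "A.b", from: "A", to: "B", is_rc: true },
        Field { name: "B.c", from: "B", to: "C", is_rc: true },
    ];
    assert_eq!(frozen_fields(&cyclic), vec!["A.b", "B.a"]);
    assert!(frozen_fields(&acyclic).is_empty());
    println!("frozen in cyclic case: {:?}", frozen_fields(&cyclic));
}
```

A real implementation would of course run Tarjan's algorithm once instead of repeated reachability queries, and would add the trait and enum edges described above.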
Re: [rust-dev] The future of iterators in Rust
Scala has a similar design, with the following traits:

- TraversableOnce: can be internally iterated once (has a foreach() method that takes a closure)
- Traversable: can be internally iterated an unlimited number of times (has a foreach() method that takes a closure)
- Iterable: can be externally iterated (has an iterator() method that returns an Iterator trait)

The way it works is that Iterable extends Traversable, which extends TraversableOnce; the for loop just uses TraversableOnce, and Iterable has a default implementation of the TraversableOnce foreach() method in terms of iterator(). Also, the Iterator trait itself extends TraversableOnce, implementing foreach() by mutating itself.

It might be a good idea to investigate copying this design.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
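The last point (Iterator itself extending TraversableOnce) can be sketched in today's Rust with a hypothetical extension trait: every external iterator gets an internal-iteration foreach() that drives itself to completion.

```rust
// `ForEach` is a made-up name standing in for Scala's TraversableOnce role.
trait ForEach: Iterator + Sized {
    // Internal iteration: consume the iterator, calling f on each element.
    fn foreach<F: FnMut(Self::Item)>(mut self, mut f: F) {
        while let Some(x) = self.next() {
            f(x);
        }
    }
}

// Blanket impl: every external iterator is also internally iterable once,
// mirroring Scala's "Iterator extends TraversableOnce".
impl<I: Iterator> ForEach for I {}

fn main() {
    let mut sum = 0;
    (1..=4).foreach(|x| sum += x);
    assert_eq!(sum, 10);
    println!("sum = {}", sum);
}
```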
Re: [rust-dev] Adding exception handling as syntax sugar with declared exceptions
> Date: Thu, 16 May 2013 10:58:28 -0700
> From: gray...@mozilla.com
> To: bill_my...@outlook.com
> CC: rust-dev@mozilla.org
> Subject: Re: [rust-dev] Adding exception handling as syntax sugar with declared exceptions
>
> On 12/05/2013 8:00 PM, Bill Myers wrote:
> > This is a suggestion for adding an exception system to Rust that satisfies these requirements:
> > 1. Unwinding does NOT pass through code that is not either in a function that declares throwing exceptions or in a try block (instead, that triggers task failure)
> > 2. Callers can ignore the fact that a function throws an exception, resulting in task failure
> > 3. Callers can handle exceptions if desired, with the same syntax of C++, Java and C#
> >
> > (1) avoids the issue with C++ exception handling forcing to think about exception safety everywhere
> > (2) avoids the issue with Java's checked exceptions forcing to explicitly handle or redeclare them
> > (3) provides an easy-to-use exception mechanism
>
> Hi, sorry, I did mean to get back to this.
>
> I think this is ... unlikely to work out, or be something we incorporate into our repo. For a few reasons:
>
> - It's very late in the design cycle to be making changes of this scope. We're trying to stabilize.
>
> - It'll be a big performance hit on code paths that want to use such exceptions. Things that look like "just function calls" turn into quite extensive operations. We're already struggling with the costs of a return-bool solution for unwinding on platforms that have difficult "normal" EH (#4342).
>
> - Most seriously: it doesn't actually resolve (1) or (2), imo.
>
> (1) You'd still have to declare checked exceptions everywhere. But worse than in java, if you failed to declare them _anywhere_ you wouldn't get a compilation error, but a runtime failure. This is like making C++ default to 'throw ()', which tends to surprise everyone.
Well, the idea is that exceptions would generally either be caught in the immediate caller or allowed to escalate to task failure, with rare cases of propagation. The target use case is things like "str_to_int" functions that would throw format exceptions, to be caught by the immediate caller when it wants to handle them.

> (2) You still have to write in "worry about exceptions everywhere" style inside a try-block or function-that-rethrows. Only get to avoid it when you're insulating yourself by the implicit "throw ()" declaration.

Yes, but you can't always avoid this anyway, since some operations are truly irreversible. For instance, if you make three HTTP POST requests and the second fails, the first has already gone through and cannot be reverted automatically in any way, so the logic to handle "we did something and then failed, leaving things half-done" is fundamentally necessary in some cases regardless of language design.

A limited version of this might be implementable with macros that automatically generate a "try_" version of a function with a Result<> return type, along with a version that fails.

The only other approach that can improve on this is support for reversible execution. Basically, one would compile everything in two versions, one of which keeps an undo log of memory writes. When a try block is entered, the compiler switches to the undo-log-enabled code, and when an exception is thrown, the log is replayed to undo everything. Hardware transactional memory like Haswell TSX could also be tried first, if available. Of course, this cannot handle external function calls or modification of shared memory via unsafe pointers: that could be handled by a special "unsafe" construct that takes a closure to unwind the changes. Calling irreversible functions like I/O would need to cause a transaction abort / throw an exception.
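The "try_" macro idea could look roughly like this in today's Rust. This is a hypothetical sketch: the macro name, its input grammar, and the parse_u32 example are all invented, and panic/unwrap stands in for 2013 task failure.

```rust
// From one definition, generate a Result-returning try_ version and a
// failing (panicking) version for callers who ignore the error.
macro_rules! with_try_version {
    (fn $try_name:ident / $name:ident ($arg:ident : $ty:ty) -> $ok:ty | $err:ty $body:block) => {
        fn $try_name($arg: $ty) -> Result<$ok, $err> $body
        fn $name($arg: $ty) -> $ok {
            // The "undeclared exception" version: task failure on error.
            $try_name($arg).unwrap()
        }
    };
}

with_try_version! {
    fn try_parse_u32 / parse_u32 (s: &str) -> u32 | std::num::ParseIntError {
        s.parse::<u32>()
    }
}

fn main() {
    assert_eq!(parse_u32("42"), 42);
    assert!(try_parse_u32("not a number").is_err());
    println!("parsed: {}", parse_u32("42"));
}
```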
Also, some things like RWARC modifications would be impossible to handle without a full STM implementation (with MVCC and so on).

The biggest drawback is that it would double code size and compilation time. I'm not sure it's worthwhile, since it cannot work in general due to I/O, and simple cases can be handled with the proposed syntax sugar, or by hand with Result return types and so on.
[rust-dev] Adding exception handling as syntax sugar with declared exceptions
This is a suggestion for adding an exception system to Rust that satisfies these requirements:

1. Unwinding does NOT pass through code that is not either in a function that declares throwing exceptions or in a try block (instead, that triggers task failure)
2. Callers can ignore the fact that a function throws an exception, resulting in task failure
3. Callers can handle exceptions if desired, with the same syntax as C++, Java and C#

(1) avoids the issue of C++ exception handling forcing you to think about exception safety everywhere
(2) avoids the issue of Java's checked exceptions forcing you to explicitly handle or redeclare them
(3) provides an easy-to-use exception mechanism

The specification is based on adding some syntax sugar that the compiler transforms into the current Rust syntax, so it does not really alter runtime behavior (though it would also be possible to implement this with traditional unwinding if desired).

Functions can be declared to throw exceptions, and if the conceptual unwinding process would have to pass through a function call that does not declare throwing an exception of that type, task failure is invoked instead. To integrate with the condition system, one can raise a condition instead of throwing, and declare the condition type and handlers with "throws", allowing them to throw exceptions if desired (note that, of course, this will just result in task failure unless the function raising the condition is itself declared to throw).

It is of course possible to code in this style "by hand", but the syntax sugar allows doing it much more tersely, and allows converting functions that use task failure into functions that throw exceptions without changing the source code of callers that don't want to handle them.

* New syntax added:

- "throws E1, E2, ..." modifier on functions and closures
- try/catch block
- try() expression
- throw e statement

* New type declarations added:

enum Result1<T, E1> { Ok(T), Fail1(E1) }
enum Result2<T, E1, E2> { Ok(T), Fail1(E1), Fail2(E2) }
... and so on ...

* Transformations to implement the system:

Everywhere:

fn foo() -> T throws E1 [similar with closures]
=> fn foo() -> Result1<T, E1>

fn foo() -> T throws E1, E2 [similar with closures]
=> fn foo() -> Result2<T, E1, E2>

try(f(args)) [if f is declared with throws]
=> f(args)

f(args) [if f returns T and is declared with throws, not inside try()]
=> match f(args) {Ok(x) => x, Fail1(e) => throw e, Fail2(e) => throw e, ...}

In functions declared throws:

return v
=> return Ok(v)

try {try_body} catch(e1: E1) {catch1_body} catch(e2: E2) {catch2_body}
=> let mut e1_: Option<E1> = None;
   let mut e2_: Option<E2> = None;
   'try_: loop {try_body; break;}
   if e1_.is_some() { let e1 = e1_.unwrap(); catch1_body }
   else if e2_.is_some() { let e2 = e2_.unwrap(); catch2_body }

throw e, if there is any containing try block that has a catch block with argument eK of type EK such that e is convertible to EK (taking the innermost suitable try block and the first suitable catch block)
=> eK_ = Some(e); break 'try_

throw e, if there are no suitable try blocks but the function is declared throws and e is convertible to a throws type EK
=> return FailK(e)

throw e, otherwise, if e implements ToStr:
=> fail!(e.to_str())

throw e, otherwise:
=> fail!()
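To illustrate, here is what the proposed desugaring would produce for the str_to_int use case, written out by hand in today's Rust (Result and panic stand in for the proposal's Result1 and task failure; FormatError is a made-up exception type):

```rust
#[derive(Debug, PartialEq)]
struct FormatError;

// "fn str_to_int(s: &str) -> u32 throws FormatError" desugars to:
fn str_to_int(s: &str) -> Result<u32, FormatError> {
    s.parse().map_err(|_| FormatError)
}

fn main() {
    // "try { let n = str_to_int("17"); ... } catch (e: FormatError) { ... }"
    // desugars to a match on the Result:
    match str_to_int("17") {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("caught {:?}", e),
    }
    // A caller that neither declares throws nor catches: the conceptual
    // unwinding becomes task failure, i.e. unwrap/panic here.
    assert_eq!(str_to_int("17").unwrap(), 17);
    assert_eq!(str_to_int("oops"), Err(FormatError));
}
```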