Re: [rust-dev] Cryptol, the language of cryptography

2014-04-27 Thread Bill Myers
There is nothing hard about it, assuming you are using a decent language.

Just add a Crypto type that wraps integers and booleans, doesn't allow any 
non-constant-time operations, and doesn't implicitly convert to anything that 
is not Crypto (which of course means you can't index memory or branch based 
on it).

If the optimizer can screw things up, implement the Crypto operations in 
assembly.
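A minimal sketch of such a wrapper in today's Rust (the type name CtU32 and its methods are illustrative, not an existing API):

```rust
// Hypothetical constant-time wrapper: no PartialEq, no Deref, no indexing,
// so secret values can't flow into branches or memory addresses.
#[derive(Clone, Copy)]
pub struct CtU32(u32);

impl CtU32 {
    pub fn new(v: u32) -> CtU32 { CtU32(v) }

    // Constant-time equality: XOR is zero iff equal; collapse to 0 or 1
    // without branching on the secret value.
    pub fn ct_eq(self, other: CtU32) -> CtU32 {
        let x = self.0 ^ other.0;
        // For nonzero x, (x | x.wrapping_neg()) always has its top bit set.
        let nonzero = (x | x.wrapping_neg()) >> 31;
        CtU32(1 ^ nonzero)
    }

    // Constant-time select: returns `a` if `self` is 1, `b` if it is 0.
    pub fn select(self, a: CtU32, b: CtU32) -> CtU32 {
        let mask = self.0.wrapping_neg(); // 0xFFFF_FFFF or 0, no branch
        CtU32((a.0 & mask) | (b.0 & !mask))
    }
}
```

Arithmetic operators can be exposed the same way via the std::ops traits; anything branchy or indexing-related is simply never provided.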

  
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] Removing ~"foo"

2014-04-24 Thread Bill Myers
> Something like this will work, yes. It'll probably look more like:
> 
> Box::new(*x)
> 
> This will be described in some of the RFCs that are coming up soon.

Awesome!

We should really get rid of the ~T syntax in favor of Foo<T> (where Foo = Box, 
Own, Heap, etc.), since it is deceptively simple syntax for something that 
should in fact be rarely used.

  


Re: [rust-dev] Optimizing pattern bindings to not copy anything at runtime

2014-04-24 Thread Bill Myers
Nice, but why isn't the LLVM optimizer removing the move?
Is it lack of proper alias analysis?
Sounds like that is a separate issue worth pursuing.

> The remaining, more difficult, issue is initialization of 
> aggregate data structures via constructor functions, which still 
> involves a bunch of moves, but I don't really see any way short of 
> macros to optimize that at the moment.

Wouldn't #[inline(always)] on the aggregate constructor fix that? (it's not 
great, but it's strictly better than a macro)

At any rate, shouldn't LLVM inlining heuristics catch that automatically as 
well?
If not, that sounds like another separate issue.



Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Bill Myers

 
> At the moment, Rust is completely broken in this regard. The following
> expression evaluates to None:
> Some(~())

Ouch, this is a disaster.

Is there a bug filed for this?

Anyway, I don't get your argument about size to free having anything to do with 
fixing it (although I agree that size to free is awesome).

If you don't care about equality (i.e. the fact that &*~() != &*~(), but a == a 
where a = &*~()), just return the address of a single private static 1-byte 
item for any 0-sized allocation.

If you DO care about equality, then you will need at least an integer 
allocation scheme in all cases on 32-bit platforms, and the real costs are the 
data structures to track that (at least a bit in a bitmask, probably at least 2 
bits for an efficient implementation).
If you can't use the 1-2GB of kernel address space, then you'll also need to 
allocate one byte of actual usable address space (but not committed memory).

On 64-bit platforms, you generally have at least around 2^60-2^63 bytes of 
unusable address space, so you can just increment a pointer pointing there for 
each allocation, at zero cost.

Of course the quick and simple fix is to try to call malloc(0) and if it 
returns NULL, remember that and switch to using malloc(1) instead.
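A sketch of the "don't care about equality" strategy in today's Rust (alloc_maybe_zero is a hypothetical helper, not a std API):

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::NonNull;

// Zero-sized requests get a fixed, well-aligned address and never touch
// the allocator; all zero-sized "allocations" therefore share an address,
// which is fine exactly when identity is not relied upon.
unsafe fn alloc_maybe_zero(size: usize, align: usize) -> NonNull<u8> {
    if size == 0 {
        // The alignment itself is a valid non-null, aligned address.
        NonNull::new(align as *mut u8).unwrap()
    } else {
        let layout = Layout::from_size_align(size, align).unwrap();
        NonNull::new(alloc(layout)).expect("allocation failed")
    }
}
```

This is essentially the approach Rust's standard library later settled on: zero-sized allocations are represented by a dangling aligned pointer and never hit malloc at all.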



Re: [rust-dev] matching on a few bits in int

2014-03-29 Thread Bill Myers
I think the best solution is to add uN and sN types where N is not a power of 
two, which LLVM should already support.

Then you can write your match like this:
match (val >> 6) as u2
{
  ...
}

And it will work as desired.

The biggest issue is that to make this work nicely you'd need some way to 
generalize over the bit length, and that's going to require generics with 
integer parameters and the work to add those.
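Until such uN types exist, the usual workaround is to mask explicitly so the value range is obvious (a sketch; `classify` and its labels are made-up examples):

```rust
// Match on two bits of a byte: shift them down, then mask so only the
// values 0..=3 remain. The compiler still can't see the range is limited,
// hence the unreachable catch-all arm.
fn classify(val: u8) -> &'static str {
    match (val >> 6) & 0b11 {
        0b00 => "low",
        0b01 => "mid-low",
        0b10 => "mid-high",
        0b11 => "high",
        _ => unreachable!(), // needed: the compiler doesn't narrow the range
    }
}
```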



Re: [rust-dev] "Virtual fn" is a bad idea

2014-03-12 Thread Bill Myers
> However, the extensibility of trait objects comes at the cost of fat
> pointers, which can be a problem if you have a lot of pointers.

This is fixable without introducing virtual functions, by adding a way to 
express "Struct and vtable for impl Trait for Struct" and "thin pointer to 
Struct and vtable for impl Trait for Struct" (or by using an indirect pointer, 
i.e. ~~Trait or Rc<~Trait>).

> It also implies that downcasting isn't really possible, at least not
> cheaply, because there is no notion of the "actual type" of a
> particular object.

I don't understand this.

You can downcast trait pointers to struct pointers just fine, since you can put 
whatever information is needed in the trait impl vtable (such as a typeid).

Sure, you can't "downcast" a struct to a struct, but that's the whole point of 
structs.

The idea of a struct is that one is referring to something of EXACTLY that 
type, and so downcasting makes no sense; when one wants a "variable" type one 
uses a trait and then you can (attempt to) downcast to concrete structs.

One can of course introduce "virtual struct"s which are no longer exactly of 
that type but are now virtual, but that just overlaps the role of traits.
> (As an aside, I am personally happier to see virtual structs than to
> see traits extending structs or traits extending a `HasPrefix` trait,
> as was included in previous designs. Both of those approaches mean
> that the trait is no longer "pure interface", and if you write
> "generic" code against that trait, you're actually coding at least
> partially against a fixed implementation, not an interface.)

I think traits inheriting structs is a better design, because you can have 
multiple traits inheriting the same struct.

For example, let's say you are modelling GUI widgets which need both input 
handling and drawing support.

With traits inheriting structs, you can have several traits, one for input 
handling, one for drawing in memory, one for drawing with OpenGL, etc., while 
with virtual functions (and without multiple inheritance) you have to put 
everything together and either implement all the functionality or add "not 
supported" error codes.

In other words, a trait inheriting a struct neatly separates the trait part 
where multiple inheritance is natural from the struct part where single 
inheritance is natural.
Also, a trait inheriting a Struct behaves to users of the trait exactly like a 
trait with a get_struct() accessor method, but with better performance.

It's only different for implementors, where it mandates how get_struct() is 
implemented for the sake of performance, at the cost of making it impossible to 
implement it along with traits inheriting from incompatible structs.

In general I think it's better to have multiple options at a low level like 
this, rather than multiple options at a high-level semantic level like virtual 
struct vs. trait, since that exposes the choice less to other modules.



[rust-dev] "Virtual fn" is a bad idea

2014-03-11 Thread Bill Myers
I see a proposal to add "virtual struct" and "virtual fn" in the workweek 
meeting notes, which appears to add an exact copy of Java's OO system to Rust.

I think however that this should be carefully considered, and preferably not 
added at all (or failing that, feature gated and discouraged).

The core problem of "virtual functions" (shared by Java's classes, etc.) is 
that rather than exposing a single public API, they expose two: the API formed 
by public functions, and the API formed by virtual functions to be overridden 
by subclasses, and the second API is exposed in an inflexible and unclean way.

A much better way to let users override part of a struct's behavior is to 
define a trait containing the overridable functionality, and to allow passing 
an implementation of that trait into the base class, while also providing a 
default implementation if desired.

Another way is to have the "subclass" implement all the traits that the "base 
class" implements, include a field of the "base class" type, and then forward 
all non-overridden functionality to the "base class" (syntax sugar could 
easily be added to eliminate the boilerplate, by automatically implementing 
all non-implemented trait functions as calls to the same function on the base 
class field).

These approaches can be combined: the first approach lets you change the 
"inside" behavior of the base class, while the second lets you put extra 
behavior "around" the base class code.
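A sketch of the first approach in today's Rust, with all names (FooSource, Base, and so on) purely illustrative:

```rust
// The overridable behaviour is a trait passed into the "base class",
// with a default implementation provided.
trait FooSource {
    fn compute_foo(&self) -> u32;
}

struct DefaultFoo;
impl FooSource for DefaultFoo {
    fn compute_foo(&self) -> u32 { 42 }
}

struct Base<F: FooSource> {
    foo_source: F,
}

impl<F: FooSource> Base<F> {
    fn run(&self) -> u32 {
        // The base can only reach the value through the trait, so an
        // overridden compute_foo cannot be accidentally bypassed.
        self.foo_source.compute_foo() * 2
    }
}

// "Overriding" is just supplying a different implementation.
struct Custom;
impl FooSource for Custom {
    fn compute_foo(&self) -> u32 { 7 }
}
```

Here the override API is an ordinary, documented trait rather than a second shadow API of virtual methods.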

The recognition that OO using virtual functions (as opposed to traits) is a bad 
design is one of the crucial steps forward in the design of languages like Go 
and current Rust compared to earlier OO languages, and Rust should not go 
backwards on this.

Here is a list of issues with virtual functions:

1. Incentive for bad documentation

Usually there is no documentation for how virtual functions are supposed to be 
overridden, and it is awkward to add, since it needs to be mixed with the 
documentation on how to use the struct.

2. Mishmashing multiple unrelated APIs

With traits, you could pass in multiple objects to implement separate sets of 
overridable functionality; with virtual structs you need to mishmash all those 
interfaces into a single set of virtual functions, all sharing data even when 
not appropriate.

3. No encapsulation

Private data for virtual function implementations is accessible to all other 
functions in the struct.

This means for instance that if you have a virtual function called 
"compute_foo()" that is implemented by default by reading a "foo" field in the 
base class, then all other parts of the base class can access "foo" too.

If anything else mistakenly accesses "foo" directly, which it can, then 
overriding "compute_foo()" will not work as expected.

If compute_foo() were provided by an external trait implementation, then "foo" 
would be private and inaccessible, eliminating the problem.

4. Data for overridden implementations is left in a "zombie" state.

In the above example, if you override "compute_foo()", the foo variable in the 
base class will no longer be used, yet it will still be present in the type and 
allocated in memory.

5. Inability to statically dispatch

With a trait implementation, you can pass the concrete type as a generic 
parameter, allowing static dispatch.

If you instead call an overridable virtual function, then you can't dispatch 
that statically at all (unless you add cumbersome syntax for that).
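For comparison, a sketch of both dispatch forms in today's Rust (illustrative names):

```rust
trait Draw {
    fn draw(&self) -> String;
}

struct Circle;
impl Draw for Circle {
    fn draw(&self) -> String { "circle".to_string() }
}

// Static dispatch: monomorphized per concrete type, so calls can be inlined.
fn render_static<T: Draw>(x: &T) -> String {
    x.draw()
}

// Dynamic dispatch: one compiled body, calls go through the vtable.
fn render_dyn(x: &dyn Draw) -> String {
    x.draw()
}
```

With traits the caller chooses between the two; a virtual method call is always the dynamic form.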

6. Adds a ton of unnecessary complexity to the language 
  


Re: [rust-dev] Will smart-pointer deref operator allow making iter::ByRef more generic?

2014-02-03 Thread Bill Myers
I don't think so, because the fact that the particular instance of T implements 
the Deref trait cannot have any effect on the decorator code, since it's not in 
the bounds for T.

What instead would work is to change the language so that if type Type 
implements Trait and all Trait methods take &self or &mut self (as opposed to 
by value self or ~self), then an implementation of Trait for &'a mut Type is 
automatically generated (with the obvious implementation).

Likewise if all Trait methods take &self, then an implementation of Trait for 
&'a Type is also automatically generated.
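What those generated implementations would look like, written out by hand in today's Rust (Greet and Person are illustrative names):

```rust
trait Greet {
    fn greet(&self) -> String;
}

// What the proposal would generate automatically: since every method
// takes &self, the trait also holds for &T and &mut T by delegation.
impl<'a, T: Greet + ?Sized> Greet for &'a T {
    fn greet(&self) -> String { (**self).greet() }
}
impl<'a, T: Greet + ?Sized> Greet for &'a mut T {
    fn greet(&self) -> String { (**self).greet() }
}

struct Person;
impl Greet for Person {
    fn greet(&self) -> String { "hi".to_string() }
}
```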

Then what you want to do will just work without the need of any wrapper or 
special syntax.

One could then, as an additional step, automatically generate an implementation 
of Trait for MutDeref<Type> if Trait is implemented by &mut Type (possibly due 
to the above technique), but this would not be required for the example.
   


Re: [rust-dev] Handling I/O errors

2014-02-03 Thread Bill Myers
> it is guaranteed to happen on all readers

I meant all finite readers, such as those for normal disk files.
  


Re: [rust-dev] Handling I/O errors

2014-02-03 Thread Bill Myers
Are we sure that this is the correct design, as opposed to having read return 
Ok(0) for EOF, or perhaps returning None with a Result<Option<uint>, IoError> 
return type?

After all, EOF is unlike all other I/O errors, since it is guaranteed to happen 
on all readers, and the fact that it needs to be special cased might be an 
indication that the design is wrong.

Also, in addition to "raw" EOF, a partial read due to end of file causes the 
same issues, so it seems that code needs two match conditions to handle EOF, 
while it would only need a single one if EOF were represented by Ok(0).
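A sketch of how Ok(0)-as-EOF reads in practice (this is in fact the convention today's std::io::Read settled on): one match arm covers both raw EOF and the final partial read.

```rust
use std::io::Read;

// Read a whole stream into memory; the single `0 =>` arm is the only
// EOF handling needed, whether EOF arrives cleanly or after a short read.
fn read_all<R: Read>(mut r: R) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4096];
    loop {
        match r.read(&mut buf)? {
            0 => return Ok(out),                   // EOF: the one special case
            n => out.extend_from_slice(&buf[..n]), // partial reads just loop
        }
    }
}
```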
 


Re: [rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
The ratio of native threads to stacks and of stacks to tasks can actually be 
used to characterize all systems discussed.

(stacks/thread, tasks/stacks)
(1, 1) => current Rust native tasks
(1, N) => Python greenlets
(N, 1) => current Rust green tasks
(N, M) => proposal in my original mail


Re: [rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
Interesting: my proposal appears to be indeed a generalization of the greenlet 
approach.

Specifically, while the greenlet proposal seems to use only one large stack per 
native thread, I'm suggesting multiple large stacks that can be stolen by 
other threads, which mitigates the issues stemming from position-dependent 
stacks (they are not eliminated, but their probability of occurring is low).

It's also indeed possible to fully eliminate those issues by autoboxing 
everything whose address is taken, but that would have a potentially large 
performance impact on non-blocking code, while merely copying the stack only 
imposes a performance penalty to code that blocks.  
  


Re: [rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
> I assume this is incompatible with work stealing and task migration
> between threads?

It is compatible, assuming that the number of large stacks is sufficiently 
larger than the number of threads.

Basically, each green task can only run on a specific large stack, but as long 
as you aren't so unlucky that every runnable green task you'd steal lives on a 
large stack that already has a running green task, you can steal one.

The probability of that obviously decreases the more large stacks you use; 
unused large stacks consume only address space (if you wipe them with 
madvise(MADV_DONTNEED) or similar when unused), so one can reasonably have 256 
8MB large stacks on 32-bit, for instance.

So for instance, if you had an HT 4-core machine with 8 system threads running 
green tasks and 1024 large stacks, the probability of a green task being not 
blocked yet not executable (because its large stack is in use) is 7/1024; 
obviously if K tasks are not blocked, then the probability of having none 
executable becomes (7/1024)^K (i.e. exponentially lower).

On 64-bit, the probability can be made arbitrarily low, although you consume 
more kernel memory to store the metadata associated with the memory mappings 
for the large stacks.

It would require rearchitecting the work-stealing code to use a two-level data 
structure: first steal a large stack with at least one non-blocked task, then 
run the task, rather than stealing the task directly.


[rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
Stack management for green tasks has been based in the past first on segmented 
stacks and then on standard large stacks.

However, I just realized that there is a third alternative which might well be 
better than both of those.

The idea is very simple: a green task would run on a large stack like now, but 
when it is descheduled, a memory buffer of the size of its used stack space is 
allocated, and its stack data is copied there; when it is rescheduled, the 
stack data is copied back to the original address.

The copying can be performed either by memcpy (perhaps with a specialized 
memcpy that assumes alignment and always uses SIMD instruction to copy with no 
checks), or perhaps by using a compression algorithm designed for maximum 
speed, such as Snappy or maybe a new ad-hoc design.

This achieves the best possible memory efficiency and thus the maximum 
possible number of tasks, even better than segmented stacks thanks to precise 
sizing and optional compression, while adding no overhead to code that does 
not block, either in terms of code-generation complexity or runtime cost.

There is of course an additional runtime cost when blocking, but blocking 
already often involves both system calls and context switching, and workloads 
with lots of blocking green tasks are probably I/O bound anyway; furthermore, 
stack data is probably in the CPU cache, so there shouldn't be much penalty.

The only significant drawback is that two green tasks running from the 
same "large" stack (note that stacks cannot be easily switched, as that 
would invalidate borrowed pointers on the stack to the stack) cannot 
ever run concurrently; however, by making the number of large stacks a 
large enough multiple of the number of CPU cores, the probability of 
reduced parallelism due to this issue can be made as small as desired.

In general, one would use an adaptive approach, where this kind of copying 
would start to kick in only once the number of green tasks becomes large; when 
not copying, one would just optionally use madvise(MADV_DONTNEED) to trim 
unused whole pages of the stack.

Also, if the data to be copied is large (several OS pages), one may use 
mremap() or similar facilities to move the address space mappings instead of 
the data itself.

Some unsafe code that assumes that tasks can access each other's stacks will 
break (probably only the new Mutex implementation), but that can be fixed by 
putting the data in the task structure instead of the stack.

Overall, this should allow writing something like a parallel network server or 
web crawler in Rust in a natural blocking style, while being fully confident in 
its scalability, which is something that is not really possible at the moment.

Once this is in place, the best practices for Rust programs would become:
1. Use native tasks for tasks whose number is bounded at compile time and small
2. Use green tasks for tasks whose number is unbounded or large

One could even automate this to some extent by spawning by default a native 
task the first C times a given proc() code address is used to start a task 
(where C = number of CPU cores), and green tasks afterwards.

What do you think?


Re: [rust-dev] Ephemeral byte arrays for cryptographic keys/plaintexts

2014-01-10 Thread Bill Myers
At any rate, note that what you are trying to do only provides some mitigation 
and is far from a complete solution, because in practice you can't prevent 
leakage of all confidential data in this way (what about hibernation while the 
key is in memory? what about plaintext decrypted with the key?)

The only effective solution is to encrypt all storage including swap using 
full-disk encryption, as well as all internal network links using IPsec or 
similar, so that it doesn't matter if sensitive data is swapped, accidentally 
written to files or communicated between servers.   
  


Re: [rust-dev] Ephemeral byte arrays for cryptographic keys/plaintexts

2014-01-10 Thread Bill Myers
This can be easily implemented in Rust as a struct doing exactly that.

There's no need to modify the I/O layer, since you'd simply borrow an &[u8] 
from the type and pass it, resulting in the I/O layer directly writing into the 
locked zeroed-on-destruction memory.
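A minimal sketch of such a struct (illustrative only; a production version should also mlock() the pages and consider swap and core dumps):

```rust
use std::ptr;

// A heap buffer that is wiped when dropped. The I/O layer can borrow
// &mut [u8] from it and write plaintext or key material directly in.
pub struct Ephemeral {
    buf: Vec<u8>,
}

impl Ephemeral {
    pub fn new(len: usize) -> Ephemeral {
        Ephemeral { buf: vec![0u8; len] }
    }

    // Borrowing the slice lets readers/writers fill the buffer in place.
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.buf
    }
}

impl Drop for Ephemeral {
    fn drop(&mut self) {
        for b in self.buf.iter_mut() {
            // Volatile write so the zeroing isn't optimized away.
            unsafe { ptr::write_volatile(b, 0) };
        }
    }
}
```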

As for crypto, it seems the plan is to not implement it in Rust, but
 to bind to libraries such as OpenSSL, libgcrypt, Windows CryptoAPI, etc.

I guess patches would be welcome to implement this. 
  


[rust-dev] RFC: Generalized monadic notation

2013-12-24 Thread Bill Myers
Some languages support a special "do notation" that allows expressing monadic 
operations more naturally.

However, there is an even more powerful option, that I'd call "in notation" (I 
came up with it, but it's obvious, so I'm sure there's some language that has 
something like it).

The idea is that we add an "in" keyword, and in-types, which are the type "in 
T" where T is a monadic type.

In-types are physically represented as the monadic type, but semantically 
behave like the type contained in the monad; they are constructed with the 
expression "in x".

The "out" keyword does the opposite, converting an in-type to the wrapped type.

Operations performed on in-types actually act on the value inside the monad, 
as specified in more detail below.

Quick examples:
out(in Some(3) + 6) gives Some(9)
out(in Some(3) + in Some(6)) also gives Some(9)
out(in Some(3) + in None) gives None
out(in ~[1, 2, 3] + 10) gives ~[11, 12, 13]
out(in ~[1, 2, 3] + in ~[10, 20, 30]) gives ~[11, 21, 31, 12, 22, 32, 13, 23, 
33]

Internally, when the compiler encounters any expression including an in-type 
(expressions include control flow constructs), it proceeds like this 
(considering its operands from left to right):
1. Optionally, if the expression is correctly typed (e.g. calling a function 
that takes an in-type), it compiles it normally
2. If the expression does not have an in-type value itself, it converts the 
expression into a closure, and passes it to map()
3. If the expression does have an in-type value itself (for example in x + in y 
has an in-type when viewed as a function of in x, due to the presence of in y), 
it converts out(expression) into a closure, and passes it to flat_map()
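Written out by hand with today's Option combinators (and_then playing the role of flat_map), the examples above desugar roughly as:

```rust
// Hand-desugared versions of the Option examples; the compiler would
// generate these closures from the "in" expressions.
fn demo() -> (Option<i32>, Option<i32>, Option<i32>) {
    // out(in Some(3) + 6): one in-type operand, so a plain map()
    let a = Some(3).map(|x| x + 6);
    // out(in Some(3) + in Some(6)): second in-type operand forces flat_map()
    let b = Some(3).and_then(|x| Some(6).map(move |y| x + y));
    // out(in Some(3) + in None): None short-circuits the whole expression
    let c = Some(3).and_then(|x| None::<i32>.map(move |y| x + y));
    (a, b, c)
}
```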

Non-movable control flow like return or break is forbidden in expression 
involving in-types (they can be implemented with a flag, but that causes other 
equivalence issues).

The advantage of this is that special do-notation syntax is not required, and 
one can naturally manipulate the values.

The disadvantage is that it adds a layer of "magic", making what the code 
actually does less obvious.

This is particularly true with list-like monads, where simple things like (in 
x) + (in y) actually become a double loop with quadratic run-time; this can be 
optionally avoided by only allowing "in notation" to be used with monads that 
call the closure once.

Also, side effects will be performed a different amount of times depending on 
the monad values, which may also not be obvious.

Note that there are a lot of details that I omitted for the sake of brevity, 
and would need to be handled.


Re: [rust-dev] Safely writing iterators over idiomatic C structures

2013-12-06 Thread Bill Myers
> It can't be defined that way without becoming much less useful for
> more common cases.

Any example of this?

Iterators are most commonly used in for loops, and this change makes no 
difference in this case. Functions that transform iterators also work the same.

Things like collect() don't work for all iterators, but it should be possible 
to make them work for iterators on types that don't have lifetimes.

That is, introducing syntax like this:
fn collect<'a, T: 'a, U: Iterator<&'a T>>(iter: U) -> ~[&'a T] {}

where the 'a bound on T means that &'x T is convertible to &'a T for any 'x.
  


Re: [rust-dev] Safely writing iterators over idiomatic C structures

2013-12-06 Thread Bill Myers
Maybe the language should be changed to allow Iterator to have a signature 
like this:
pub trait Iterator<A> {
    fn next<'a>(&'a mut self) -> Option<&'a mut A>;
}

Then you could return the &mut by reborrowing and would be able to advance the 
iterator without issue afterwards. 


Re: [rust-dev] How to reply properly in digest mode?

2013-12-05 Thread Bill Myers
Never use digest mode.

Instead, use normal mode and if necessary add a filter in your e-mail website 
or application to separate all mailing list messages in a specific folder or 
label.



Re: [rust-dev] Persistent data structures

2013-12-05 Thread Bill Myers
No, the problem you describe does not exist in my implementation because it 
requires an &mut to the smart pointer.

In particular, if the reference count is 1, then there is no other Rc and Arc 
pointing to the same data, and because we have an &mut there is also no other 
borrowed reference to the Rc or Arc we are manipulating.

Hence, if the reference count is 1 and we have an &mut to the Rc or Arc, it's 
safe to return an &mut pointing to the contents and thus mutate its contents in 
place using it.

If the reference count is more than 1, then it uses read-only access to clone 
the contents, giving a new allocation with reference count 1 that can then be 
mutated in place as above.
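This is essentially the pattern today's standard library exposes as Rc::make_mut, sketched here:

```rust
use std::rc::Rc;

// Clone-on-write through an &mut to the smart pointer: mutates in place
// when the reference count is 1, clones the contents first otherwise.
fn bump(v: &mut Rc<Vec<i32>>) {
    Rc::make_mut(v).push(1);
}
```

Because the function demands &mut Rc<...>, no other borrow of this particular Rc can exist, which is exactly the invariant the argument above relies on.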



Re: [rust-dev] Persistent data structures

2013-12-04 Thread Bill Myers
> Did COW improve performance?  What's a good way to do performance 
> testing of Rust code?

The reason I introduced COW when RC > 1 is that it allows persistent data 
structures to be mutated in place if there aren't extra references, just like 
non-persistent data structures.

Lots of languages with persistent data structures can't do that because they 
use garbage collection (and thus they can't tell whether there are other 
references due to lack of a reference count), but it seems an essential feature 
to have if one is using reference counting in a language like Rust that is 
supposed to be a systems language producing optimal code.

> Should I try to get your patches to Rc and Arc merged?  They look 
> generally useful, if folks think they fit the design of Rc.  You also 
> have a patch that adds Own to std which is equivalent to ~T but has 
> methods that look like Rc's.  You use this, and putting most of your 
> treemap.rs inside macro definitions, to get a sort of 
> reference-type-polymorphism in your treemap.  Shall I continue with this 
> approach and try to get Own merged too?  (Opinions from anybody are 
> welcome.)

I did it because I realized that having two different balanced-tree 
implementations would have imposed a significant maintenance burden; the macro 
and Own approach would have avoided that by replacing the current treemap code 
completely with the new optionally-persistent version.

Also, having both Rc and Arc versions seems quite useful anyway, and currently 
I think macros are the best way to have both, until Rust starts supporting 
higher-kinded types (which would allow passing Rc itself as a parameter), or at 
least recursive types. I guess someone on the Rust core team will have to 
decide on that, and there's already some discussion at 
https://github.com/mozilla/rust/pull/9786



Re: [rust-dev] Persistent data structures

2013-12-04 Thread Bill Myers



Hello, I already implemented a persistent tree-map called SnapMap: you can find 
the source code at https://github.com/mozilla/rust/pull/9816

I stopped working on it before I made a serious effort to push it into the Rust 
codebase and don't have time to work further on it, so it would be awesome if 
you were interested in continuing that effort.

It's pretty much finished, but also pretty much completely untested (however, 
it was derived from the existing treemap, so the number of bugs should not be 
as high as in something written from scratch and untested).

The first thing that needs to be done is to run the existing tests inherited 
from treemap, fix any bug that shows up, and then write more tests to test 
SnapMap specific usage patterns and features.

Then, one should look at the mutable/handle-based iterators in the code and 
decide whether they should be removed, kept as is, or improved, possibly after 
changing the iterator interface to return an object with the lifetime of the 
function call instead of the iterator.

The rest of the code should be fine (except for bugs due to no testing), but 
there might be places where unnecessary copy-on-write is performed.

My code used an improved version of Rc that supports copy-on-write by cloning 
only when reference count > 1.




Re: [rust-dev] Removing some autoref magic

2013-11-20 Thread Bill Myers
Have you considered making deref the default instead and requiring moves to use 
an explicit "move" keyword?

Basically, from this hypothetical syntax to current one:
- x => &x
- mut x => &mut x
- move x => x

One could even make the & implicit in parameter types in function declarations 
unless the "move" keyword is used, so we would have (new => old syntax):
- fn(self, ...) => fn(&self, ...)
- fn(mut self, ...) => fn(&mut self, ...)
- fn(move self, ...) => fn(self, ...)

Regarding owned pointers, one could introduce this syntax:
let *x = func_that_gives_owned_pointer()

which makes x a variable of type T referring to the contents of the owned box, 
and thus derefed to & when passed to a function as above without having to 
write "*x" in the function call.

One could then add a new "#x" or similar syntax that would get back to the 
owned pointer.

Not sure if all this is a good idea, just throwing it out there. I actually 
think keeping autoref/autoderef is probably better.



Re: [rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers
Although, on second thought, one could just free the unused part of the user 
mode stack whenever a thread blocks, either in the user mode code (i.e. using 
madvise MADV_DONTNEED or equivalent to discard everything below the stack 
pointer modulo the page size, perhaps minus the page size) or automatically in 
a modified kernel, and thus greatly reduce the worst case.

It's still going to have higher overhead than CPS continuations on the heap, 
because the stack will tend to have holes where dead variables live and 
padding for page-boundary alignment, and you also keep around the kernel 
stack and kernel scheduler objects.

And it doesn't work on 32-bit, because you cannot have more than around 2048 
tasks with megabyte-sized task stacks (which is way too low for several 
usages), and unfortunately there are lots of 32-bit-only smartphones and 
tablets that should probably be supported.



Re: [rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers
> The issue with async/await is that while it maps very well to the AIO
> primitives like IOCP and POSIX AIO, it doesn't map well to something
> that's solid on Linux. It's just not how I/O is done on the platform.
> It uses *non-blocking* I/O to scale up socket servers, with
> notification of ready state rather than completion state. This doesn't
> work for file system access though.

Well, I/O would work exactly the same as the current libuv-based Rust tasks 
approach, except that one needs a stack for each CPU thread and not for each 
task, because task creation functions return when a task wants to block, and 
thus there is no stack to save and restore.

On Linux, the epoll interface makes it possible to implement this, and I think 
it's what libuv uses.

The only problem is that Linux doesn't really support asynchronously resolving 
file paths to inodes (aka opening files), but that can be done on a dedicated 
thread pool, with the advantage that the threads don't do anything, so they 
don't have zombie stacks.

The problem with blocking on all threads/the 1:1 thread model is that if you do 
a computation that requires 8MB of stack, and then block with a shallow call 
stack, the 8MB of stack are not freed, so you waste 8MB per TCP connection in a 
TCP server example, which means that on a system with 32GB RAM you can only 
service 4096 TCP connections without swapping to disk instead of millions.

A dedicated thread pool for blocking I/O doesn't have this issue 
because it can run with only 4KB of stack or so, since it doesn't do 
anything stack-intensive, and the number of its threads can be limited 
without user-visible effects.



Re: [rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers



> This is similar to IO Monad in Haskell. Adding that to previously pure
> computational code is painful, but on the other hand it does emphasis
> that computational code != IO code and minimizing mixing between two
> typically leads to better design overall.

Yes, but you can have the compiler automatically transform normal code into the 
equivalent of Haskell's "do notation", which is what C# does with async.

Basically code like this:

async fn foo() -> uint {
let a: int = compute_a();
let b: int = compute_b();
let c: char = await read_char();
let r: uint = compute_r(a, b, c);
return r;
}

becomes after the compiler performs CPS transform:

// assume that we have trait Continuation<T> { fn continue(~self, v: T); }
// assume that cps_read_char is provided by the runtime and tells a main loop
// to read a char and then schedule a new task running the passed continuation

struct foo_Continuation(~Continuation<uint>, int, int);

fn cps_foo(continuation: ~Continuation<uint>) {
    let a: int = compute_a();
    let b: int = compute_b();
    return cps_read_char(~foo_Continuation(continuation, a, b));
}

impl Continuation<char> for foo_Continuation
{
    fn continue(~self, c: char) {
        let (continuation, a, b) = *self;
        let r = compute_r(a, b, c);
        continuation.continue(r);
    }
}

and then a future-based version can also be generated by the compiler based on 
the CPS transform results:

// assume that Continuation<uint> is implemented on TaskFuture<uint> and causes
// the TaskFuture to be completed with the value

fn future_foo() -> ~TaskFuture<uint> {
    let f = ~TaskFuture::new();
    cps_foo(f);
    return f;
}

Note that if foo() had taken a borrowed pointer with lifetime 'a, then the CPS 
version becomes unsafe, and TaskFuture would also need an 'a parameter.

Also, one can use a borrowed pointer instead of an owned pointer for 
continuations and futures, but then they need to be waited on before its 
lifetime region ends.




[rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers
I see several proposals for the future of Rust tasks, and I think one of the 
best approaches is being overlooked, and that is something similar to async in 
C# (http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx).

In C#, the "async" keyword can be applied to functions and it causes the 
compiler to transform a function that returns T into a function that returns a 
Task<T> (which is C#'s name for a future of type T) representing the 
potentially asynchronous computation.

Blocking is represented by using the "await" keyword on a future (typically 
returned by a call to another "async" function), and it causes the compiler to 
perform a Continuation-Passing Style transformation and attach the 
continuation to the future, returning another future representing the composed 
computation.

I/O functions are designed to return futures, so in this system blocking causes 
all calling "async" functions to return, building up a chain of continuations 
on the heap, which is equivalent to the machine stack in a current-Rust task, 
but which is as small as needed, and is only used for call chains that block.

In Rust, this transformation is much more delicate, because the resulting 
return value futures must have a lifetime that is the smallest among all the 
arguments to the function, if those arguments are needed by the continuation, 
and the argument types must be "shareable" between parallel forks (e.g. 
std::rc::Rc is not shareable because RC manipulation is non-atomic).

However, it is possible to restrict the system to use non-delimited 
continuations instead of the delimited continuations and futures, which would 
avoid this problem, since futures cannot be used explicitly anymore (at the 
cost of making flexible parallelism impossible).

In this latter case, it would be equivalent to the current task system, except 
for requiring blocking functions to be marked "async"; the "await" keyword 
would not be required, since it would effectively become compulsory if there 
are no first-class futures returned for async functions.

Advantages:
- Functions and interfaces that can perform I/O or can block are clearly marked 
by a keyword
- Can have billions of blocked tasks (with hundreds of gigabytes of RAM) since 
the memory used by each blocked task is truly minimized because it's on the heap

Disadvantages:
- Requires a heap allocation for every call to an async function (to 
hold the continuation data)
- Non-"async" functions cannot call "async" functions, so interfaces must be 
explicitly marked as async or not
- Requires implementing the transform in the compiler

Microsoft switched to this paradigm in the latest version of C# and in the 
Windows RT API, and it might be an appropriate choice for Rust too.



Re: [rust-dev] Abandoning segmented stacks in Rust

2013-11-04 Thread Bill Myers
The advantage of segmented stacks is that blocked tasks only take up as much 
memory as they actually need to store state, so that for instance a network 
server can use a task for each connection, and still only use, say, 64 bytes 
per connection if that's possible instead of the number of stack pages that got 
allocated for previous computation (assuming an "extreme" version that 
allocates a stack segment on every call).

However, there is another approach that can replace segmented stacks for that 
purpose, namely having the compiler automatically transform blocking functions 
to instead return a future (with limited lifetime).

This means that segmented allocation only happens for functions that indirectly 
perform I/O, and only allocates the exact amount of memory needed to retain 
state that must persist across the blocking I/O operation, while other 
functions execute normally using traditional stacks.

The simplest example of this feature is async/await in C# 5, and Scala has a 
delimited continuation passing transformation that can be used to do the same 
thing.

Has this been considered for Rust?



Re: [rust-dev] On Stack Safety

2013-10-21 Thread Bill Myers

 > It seems to me that trying to determine max stack size is incompatible with 
 > dynamic linking. So even if you disallow recursion, any function that calls 
 > a function outside of its own crate is not going to be able to trust its 
 > calculated max stack size.

The maximum stack size needs to be computed dynamically, in a "C++ global 
constructor" that rustc would place in every compiled crate library/executable 
to write the computed value into suitable global variables.

The check only needs to be done by task creation routines, by function calls 
contained in a recursion cycle, by closures and by stubs put in ~Trait/@Trait 
vtables (where Trait is public and thus implementable by other crates).

Note that the latter two cannot really be avoided by taking the max over all 
implementations dynamically because a new dynamic library could be loaded 
between the check and the indirect call.



Re: [rust-dev] Should I/O use conditions?

2013-10-16 Thread Bill Myers
What about the idea of making Result cause task failure if it is destroyed in 
the error case? (as opposed to destructuring it)

This just needs a simple language change to add an attribute that would be 
applied on Result to declare that it's OK to destructure it and cause drop() to 
not be called.

In addition, maybe some EH syntax sugar like the one I proposed in 
http://www.mail-archive.com/rust-dev@mozilla.org/msg04093.html



rust-dev@mozilla.org

2013-09-19 Thread Bill Myers
> Allowing one closure to take &mut while another takes &const would
> create a data race if the two closures are executed in parallel.

Closures executable in parallel would probably have kind bounds forbidding 
&const: 
http://smallcultfollowing.com/babysteps/blog/2013/06/11/data-parallelism-in-rust/
 explicitly mentions this.

BTW, how about keeping it, and calling it "&volatile" instead of "&const", 
since that's what C uses to name something that can be changed outside the 
program's control?



[rust-dev] RE: Structural enums for datasort refinements

2013-08-28 Thread Bill Myers
I was talking about 
http://smallcultfollowing.com/babysteps/blog/2012/08/24/datasort-refinements/, 
which essentially introduces structural enums but only for variants belonging 
to the same named enum.


> * This strikes me as an extreme change to the language, but
> perhaps my gut is overly conservative.

Well, to some extent: it does indeed cause a fundamental change in type 
inference, since now any finite set of types can be "unified" as a structural 
enum type.

That said, I think this needs to be done anyway in something like Matsakis' 
proposal, since otherwise you can't pass an x declared as "let mut x = A; 
if(foo) {x = B;}" to something taking an Enum[A, B] if Enum also has a C case, 
so it only really changes how frequently the expanded type inference would be 
used.

BTW, it should be allowed to call impl methods, trait methods and access fields 
that are common to A, B, C on "A | B | C", effectively adding a new "closed" 
polymorphism system that can be used by just switching a type from Trait to A 
| B | C if the only possible Trait implementations in that code are A, B, C, 
with the rest of the code being unchanged.

> You have not addressed in your proposal how you would change the
> match syntax to deal with non-struct variants, such as ~str or
> int.

Scala uses "x: int" to match an int (or a type pattern in general) and assign 
it to x, and "x @ pattern" to bind x to something matched by a pattern (since 
Scala distinguishes between "value" patterns and type patterns).

In Rust, distinguishing between value and type patterns might not be necessary, 
so one can probably use the same character for both cases.

> Do the tags on the variants need to
> encode the whole type, and not just which struct it is?

They of course need to identify the whole type, and they probably will change 
numerically depending on generic instantiation, if some variants can unify 
under some generic instantiations.

> And what about the poor user who didn't think about the fact that they
> might alias each other, and thought that all the clauses in the
> code for A | B | S were disjoint, but in fact they potentially overlap
> due to the potential aliasing, and thus the order of the cases in the
> above is now crucial, yes?

Well, the poor user's thought process turns out to be incorrect, since any enum 
including a naked type parameter is obviously not disjoint in all cases.


> -- Another example: Can/should I now just throw seemingly
> unrelated struct's into a match, in order to anticipate that the
> parameters will be instantiated with that struct? Consider the
> following:

Yes: the compiler can potentially emit an error or warning if the match will 
never match under any generic instantiation.

This also allows a form of internal "partial specialization" of functions, 
where a function can have specialized behavior for some subsets of generic 
instantiations.



Re: [rust-dev] Structural enums for datasort refinements

2013-08-28 Thread Bill Myers
The issues in O'Caml seem to be due to the fact that in O'Caml function 
parameter and return types are inferred, and thus "accidentally oversized" enum 
types can propagate through them.

In Rust, they must be specified by the user, so those oversized enum types will 
cause an error when passed as a parameter or return value with a 
user-specified type that is "smaller" than the inferred type.



[rust-dev] Structural enums for datasort refinements

2013-08-27 Thread Bill Myers
I was reading a proposal about adding "datasort refinements" to make enum 
variants first-class types, and it seems to me there is a simpler and more 
effective way of solving the problem.

The idea is that if A, B and C are types, then "A | B | C" is a "structural" 
enum type that can be either A, B or C.

In addition, A can be implicitly converted to "A | B", "A | B" can be 
implicitly converted to "A | B | C", and also "(A | B) | C" and "A | (B | C)" 
are equivalent to "A | B | C", and finally "C | B | A" is equivalent to "A | B 
| C" (to support the latter, the implementation needs to sort variants in some 
arbitrary total order before assigning tag numbers).

Furthermore, a way to bind variables to an "or" pattern is introduced to allow 
converting "A | B | C" to "A | B" in the case that it holds an A or a B.

This way, one can rewrite Option as a type alias like this:

struct Some<T>(T);
struct None;

type Option<T> = None | Some<T>;

Which is like the current Option, but also makes None and Some first-class 
types.

The current enum syntax can remain as syntax sugar for the above code.

The only issue I see is what to do for code such as "let mut x = Some(3); x = 
None;": with this proposal, Some and None are separate unrelated types, so we 
either have this code emit an error, or x must be given the type "Some<int> | 
None" automatically, which however can lead to obscure error messages if one 
mistakenly attempts to assign a string to it, causing the type to become 
"Some<int> | None | ~str" (i.e. the user might be told that a match is not 
exhaustive because it does not handle the "~str" case, rather than that they 
assigned a ~str to an Option-typed variable).

It should be possible to allow this, and make the error-emitting code use 
heuristics to figure out whether it is more likely that the user assigned a 
value of the wrong type, or used an enum improperly (for example, by looking at 
whether the implicitly created enum type is ever written explicitly in the 
source, and whether the deduced structural enum type is being used in places 
that require a non-enum type).

Alternatively, one can stipulate that only structs, or only structs marked 
"enum struct" or "case struct" or similar, can become part of an inferred 
structural enum, but this seems unappealing.

Note that some structural enums can change representation depending on generic 
instantiation, since "T | int" becomes just "int" if T = int, while it is "~str 
| int" if T = ~str (and similarly for "Some<T> | Some<int>"), but this should 
not be a problem.



Re: [rust-dev] cycle time, compile/test performance

2013-08-23 Thread Bill Myers
>   - We essentially always do whole-program / link-time optimization
> in C++ terminology. That is, we run a whole crate through LLVM at
> once. Which _would_ be ok (I think!) if we weren't generating
> quite so much code. It is an AOT/runtime trade but one we
> consciously designed-in to the language.

> time: 33.939 s  LLVM passes

Maybe this should be changed to optionally do codegen and LLVM passes in 
parallel, producing an LLVM or native module for each Rust file, and then 
linking the modules together into the compiled crate.

Alternatively, there seems to be some work on running LLVM FunctionPasses in 
parallel at 
http://llvm.1065342.n5.nabble.com/LLVM-Dev-Discussion-Function-based-parallel-LLVM-backend-code-generation-td59384.html
 but it doesn't seem production-ready.

Most large C/C++ projects rely on parallel make and distcc to have reasonable 
build times, and it seems something that Rust needs to support (either via 
make/distcc or internally) to be a viable replacement for large projects.



Re: [rust-dev] cycle time, compile/test performance

2013-08-23 Thread Bill Myers
> > 2. Distribute compilations and tests across a cluster of machines (like
> > distcc)
> >
> 
> Compilation is 99%  serial (the only things that happen in parallel
> are rustpkg and rustdoc etc at the end, and they are almost nothing),
> though tests could be distributed (and Graydon is working on doing
> that afaik).

Are you sure?

I'm not an expert on rustc implementation, but I'd expect you could have a 
sequence of parallel phases and "collection" points: specifically I'd guess you 
could parse all files in parallel, and once you have the AST for all files, do 
checking and typing in parallel, and then do codegen in parallel.

For comparison, C/C++ can do everything except linking in parallel, except for 
parsing header files if not using precompiled headers (which is still parallel 
but repeated for each file); linking can be parallelized to some extent on a 
single machine with gold.



Re: [rust-dev] cycle time, compile/test performance

2013-08-21 Thread Bill Myers
Have you considered the following "non-specific" quick fixes?

1. Build on a ramfs/ramdisk

2. Distribute compilations and tests across a cluster of machines (like distcc)

3. If non-parallelizable code is still the bottleneck, use the fastest CPU 
possible (i.e. an overclocked Core i7 4770K, an overclocked Core i7 
4960X/3960X/4930K/3930K, a dual Xeon E5 2687W or a quad Xeon E5 4650, 
depending on whether you need 4, 6, 16 or 32 cores)

4. Read metadata only once, in one of these ways:
4a. Pass all files to a single compiler invocation (per machine or core)
4b. Have a long-lived rustc "daemon" (per machine or core) that keeps crate 
metadata in memory and gets passed files to compile by fd
4c. Use CRIU suspend/restore (or the unexec from Emacs or whatever) to suspend 
a rustc process after metadata is read and restore that image for each file 
instead of spawning a new one
4d. Allocate metadata using a special allocator that allocates it from a block 
at a fixed memory address, then just dump the block into a file, and read 
metadata with a single mmap system call at that same fixed address (this is a 
security hole in general, so it needs to be optional and off by default)



Re: [rust-dev] Segmented stacks (was: IsRustSlimYet (IsRustFastYet v2))

2013-07-05 Thread Bill Myers
I believe that instead of segmented stacks, the runtime should determine a 
tight upper bound on the stack space needed by a task's function, and only 
allocate a fixed stack of that size, falling back to a large "C-sized" stack 
if a bound cannot be determined.

Such a bound can always be computed if there is no recursion, dynamic dispatch, 
dynamic allocation on the stack, or foreign C functions.

And in fact, it can probably be computed even for foreign C functions, as long 
as DWARF exception tables are available, or some constraints on machine code 
are assumed (i.e. that negative offsets are not applied to pointers to the 
stack stored in memory).

This would allow for instance to transform many simple internal iterators into 
external ones automatically via coroutines, without large memory overhead.


Re: [rust-dev] On tunable Garbage Collection

2013-06-18 Thread Bill Myers
For Rc<>, it should be enough to have the GC traverse raw pointers that 
have/don't have a special attribute (probably traversing by default is the 
best choice), as long as the raw pointers have the correct type.

Obviously it only makes sense to traverse types if it is possible for them to 
point to GC pointers.

For data held exclusively by external C libraries (assuming that there is 
any), an unsafe GcVisitor trait that can be implemented to do the visit 
should work, along with a NonGc kind for code that doesn't want to 
implement GcVisitor (otherwise, mark&sweep GC would kill live objects).



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-18 Thread Bill Myers
> > How about, however, something like (hypotetically) rewriting the Linux
> > kernel in Rust? (with limited amounts of unsafe code)
> 
> I think it's a bit unfair that in your example, you insist on wanting
> fine-grained global RW locks, when in fact linux has been systematically
> backing away from those in favour of RCU. Indeed, the docs on the RW
> locks[1] say:

>NOTE! We are working hard to remove reader-writer spinlocks in most
>cases, so please don't add a new one without consensus.  (Instead,
>see Documentation/RCU/rcu.txt for complete information.)
> 
> RCU, as you know, is very different, and .. I actually don't know if
> it's easy to encode in rust's current rules or not. But I think it's
> neither helped by the rules you were proposing either.

Well, in many cases RCU is just used on data structures, so a special RcuList, 
RcuHashTable, etc. implemented with unsafe code would probably do the job.

> I think kernels are dealing with way more complication than we can
> encode into any set of rules: weird IO memory mappings and barrier
> rules[2], per-CPU structures, interrupt safety, custom allocators and
> lifetime patterns, all sorts of stuff that we can't handle "optimally".
> I would expect a kernel written in rust to involve a lot of looking at
> our type system for tools-that-might-help, but ultimately when facing a
> performance-vs-safety trade, or a hardware-reality-vs-rust-types trade,
> to go with performance and reality, and larger blocks of supporting
> "unsafe" code they've reasoned carefully about. That's what they do now
> and, somehow, they manage to code with a level of attention-to-detail
> and minimalism wherein it seems to mostly work.

Yes, although most of those specific issues can probably be isolated in 
snippets of unsafe code.

However, it's enough to consider the kernel as an implementation of the POSIX 
file system calls backed by a RAM disk to see a broader issue: the natural and 
simplest way to express it in current Rust is a "filesystem task" accessing 
task-local data implemented with garbage-collected @ boxes, but that would have 
serialization, and thus performance, equivalent to a big kernel lock.

Otherwise, one needs multiple tasks concurrently mutating shared objects, which 
requires either global GC or global reference counting that needs both the 
ability to nest mutable pointers in ways that are statically acyclic due to 
their types, and probably some sort of mutable RcTree class that would allow 
reparenting with O(tree_height) checks of acyclicity (like Linux VFS does 
manually for rename on directories).

BTW, there is in fact a probably better way than what I originally proposed to 
do more general acyclic reference counting:
1. Add a DoesntStronglyPointTo<T> kind (with a shorter name) that would allow 
declaring RcMut<T> as RcMut<T: DoesntStronglyPointTo<RcMut<T>>>, and would 
have the compiler check that there is no sequence of non-weak-reference 
pointers going from T to RcMut<T>
2. Enhance RcMut<T> to RcMut<T, U: DoesntStronglyPointTo<T>>, which would hold 
a (T, U) pair, allow getting a &(T, U) pointer while reading, but only a (&T, 
&mut U) pointer pair when writing, making T immutable and U mutable
3. Add RcList<T>, RcTree<T> for those structures (and RWARC versions)

This avoids the magic of having the compiler infer which pointers can be 
allowed to mutate, instead having the user indicate them and the compiler give 
an error if the user gets it wrong.



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-07 Thread Bill Myers
> Restructuring your code to avoid cycles is problematic when you're 
> implementing a platform where the spec allows users to create ownership 
> cycles --- like, say, the Web platform. So if 
> Rust didn't support cyclic ownership, Servo would have to implement its own 
> GC and tracing code just like Gecko does. That's a situation we need to get 
> away from.

Yes, my idea was to essentially extend the compiler with facilities allowing 
one to implement RcMut and RWARC in such a way that they can be nested without 
creating memory leaks (subject to some restrictions, but fewer restrictions 
than the blanket inability to nest), with the goal of replacing C++ (or Java, 
perhaps) in high-performance multithreaded servers and kernels.

That would be orthogonal to the issue of what to do with @-pointers and whether 
to have garbage collection (which is indeed very desirable since some programs 
need it).


> An interesting meta-point is that when you're implementing a platform, rather 
> than a specific application, you have much less freedom to structure the 
> implementation because you don't 
> know what any given instance of your system actually does. This affects 
> everything from the requirements on the programming language to the design of 
> graphics APIs.

Yes, although note that while task-local GC is sufficient to implement a 
browser with JavaScript (since JavaScript is single threaded), it would not be 
enough to implement a high-performance Java VM, which requires GC on memory 
simultaneously accessible by multiple threads.

In general, it would be nice to have great support for all 4 models: 
sophisticated local and global reference counting, and local and global GC (as 
well as owned and borrowed pointers).

It's probably reasonable though for Mozilla to mostly focus on the browser use 
case.



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-07 Thread Bill Myers
> Every
> single reviewer I showed a "no cycles" variant of rust to told me
> it was unacceptable and they would walk away from a language that
> prohibited cycles in all cases. All of them. We tried limiting it
> within subsets of the type system rather than pervasively: it still
> irritated and confused everyone.

Interesting, are these versions online anywhere?

>   - It doesn't do anything about fragmentation. Our longer-term plan for
> the gc involves moving to mostly-copying[1], which compacts pages
> other than the conservatively-scanned set. And is incremental.

True, in fact fragmentation might be as bad as or worse than the dead objects 
living on in GC.

> > However, it cannot support cycles (without destroying its advantages
> > over GC), so it is a good idea to statically prevent them to avoid leaks.
> 
> We tried this; various strategies for it. Ownership cycles even just
> within subtrees of well-organized ownership trees are not a rare
> occurrence. They are a result both of normal patterns within common data
> structures, and normal levels of coupling between subsystems in large
> applictions. Especially web browsers.

How about, however, something like (hypothetically) rewriting the Linux kernel 
in Rust? (with limited amounts of unsafe code)

The issue there is that a kernel needs to provide syscalls to multiple CPU 
concurrently running threads which manipulate shared data structures like the 
filesystem.

Having the filesystem structures be local to a single task would destroy 
performance on things like filesystem-bound loads on 32-core servers, while 
adding global GC would make it impossible to provide soft and hard real time 
guarantees needed for embedded systems.

So using a MutexARC/RWARC global reference counting scheme with fine grained 
locks like the Linux kernel in fact does (along with RCU, but let's ignore that 
for now) seems the only option, but it's currently not safely possible in the 
current Rust, due to the inability to nest those structures safely with 
checking.

Now it's not totally clear whether full checking of reference cycles is indeed 
viable, but it seems the only way to achieve that, and it seems bothersome that 
one could not write a fast and safe real-time POSIX-implementing kernel in a 
language that is pretty much the only mainstream language that can hope to do 
so.

Even excluding kernels, it seems network servers running on multi-socket 
machines and needing to concurrently mutate complex object graphs would have 
similar issues.

Although for those real-time might not be so essential, so adding optional 
concurrent global GC (like Java has) would be an alternative (but is also quite 
a bit of work).



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-07 Thread Bill Myers

> This would be a big step away from the advantages of Rust's current
> trait system. Right now, if the definition of a generic function type
> checks, it's valid for all possible types implementing the trait
> bounds. There are no hidden or implicit requirements.

Yes, but since Rust, like C++, instantiates generic functions at specialization 
time, this shouldn't be an issue. It just means that, like with C++ templates, 
compilation errors concerning the template code can happen at instantiation 
time.

In general, this is unavoidable with a mutable reference counting scheme, 
because if you have something like this:
struct A<T>
{
    v: T
}

impl<T> A<T>
{
    fn set_v(&mut self, v: T)
    {
        self.v = v;
    }
}

Then you can allow calling set_v when T is u32, but not when T contains an rc 
pointer back to A, because that could create a cycle for some values of v.

The alternative is to just disallow programs that have cycles of rc pointers 
where at least one of them is mutable, which is less powerful.



[rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-06 Thread Bill Myers
Reference counting is generally more desirable than garbage collection, since 
it is simple and deterministic, and avoids scanning the whole heap of the 
program, which causes pauses, destroys caches, prevents effective swapping, and 
requires tolerating memory usage inflated by a multiplicative factor so that GC 
can be amortized to linear cost in the number of dead objects in the general 
case.

However, it cannot support cycles (without destroying its advantages over GC), 
so it is a good idea to statically prevent them to avoid leaks.

Currently Rust has several types of reference-counted pointers, with different 
strategies to do that:
- The @ and @mut pointers, which allow cycles and leak them until task 
termination (although they are supposed to eventually be backed by a GC, or be 
removed)
- The MutexARC type, which allows cycles but requires unsafe code
- The Rc and ARC types, which avoid cycles by not allowing mutation
- The RcMut and RWARC types, which avoid cycles by not allowing rc pointers in 
the data

The issue is that none of this is optimal, since for instance it is not 
possible to have an RcMut<A> where A contains an RcMut<B> but B contains no 
pointers, even though it is perfectly safe to do so, and in general 
reference-counted pointers cannot be created for some types like the A in this 
example.

However, there seems to be a good solution: the following rule extends both the 
"no mutations" and "no rc pointers" cases while allowing the above, and allows 
creating rc pointers to any type at the expense of forbidding some mutations:
*** If there is a cycle of pointers (of any kind, including raw, owned and 
borrowed, excluding raw pointers annotated as weak references) or enums 
containing at least one reference-counted pointer, forbid mutating all those 
pointers (even with an &mut) unless the object being assigned is the result of 
an expression that doesn't include pointers to previously-created objects (and 
thus cannot point to the pointer), or the value containing the pointer is on 
the stack or owned by a stack slot (and thus is not in an rc box or owned by 
it, and so cannot be pointed to by the assigned value) ***

Here a cycle means a cyclic sequence of structure fields such that each can 
point to an instance of the structure of the next field in the sequence.

So for instance this case would be allowed:
struct A {b: Rc<B>}
struct B {a: Option<Rc<A>>}

but the A.b and B.a fields would be immutable even with &mut A or &mut B 
pointers, and would only be mutable if the A or B structure is on the stack or 
owned from the stack (and thus has no incoming rc pointers).

On the other hand, this would be allowed with no restrictions since the 
points-to relationship of the types is acyclic:
struct A {b: RcMut<B>}
struct B {c: Rc<C>}
struct C {x: u32}

To implement this, one would need to extend the language with an annotation to 
tell the compiler that the raw pointers inside of Rc/RcMut/ARC/RWARC/etc. are 
to be treated as "reference-counted" pointers.

The compiler can then perform the required static analysis (do an SCC 
decomposition of a graph containing types, enum members and pointer fields, 
with edges from types to contained types, enum members and pointer fields, from 
pointer fields to the pointed-to type, from traits to all possible implementing 
types, and from enums to enum members, and then forbid modification of pointers 
in any SCC with at least one reference-counted pointer).

There are at least two complexities: public trait pointers need to be assumed 
to be able to point to anything (unless one can see the whole program), while
generic types will need to be analyzed in their specialized versions, which 
means that some methods would be valid to call only for some values of generic 
type parameters, and that the analysis needs to be done from scratch globally 
for every module compilation.

In addition to all this, weak versions of all rc pointers should be supported 
(to allow weak "parent" references, for instance), which would require a 
further annotation telling the compiler that the contained unsafe pointer is 
"weak" and thus does not participate in the pointer cycle analysis.

Also, a keyword can be provided to allow the forbidden mutations at the cost of 
doing a full run-time scan of the assigned value to ensure it doesn't point to 
the object containing the field being assigned, failing or returning an error 
if the check fails (this probably needs to be explicit, since the scan can of 
course be very expensive).

Some structures like mutable strong doubly-linked lists would not be allowed by 
this and would instead require an ad-hoc implementation with raw pointers 
(where the list node smart pointer will increase the reference count on both 
the node and the whole list, and splicing will be O(n)).
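As a point of comparison, a sketch in modern Rust syntax (Rc, Weak, and 
RefCell are today's names, which postdate this post) of a doubly-linked node 
whose back edge is a weak reference, so the prev/next pair never forms a 
strong cycle and never leaks:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Node with a strong forward edge and a weak backward edge.
struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,
    prev: Option<Weak<RefCell<Node>>>,
}

fn demo() -> i32 {
    let first = Rc::new(RefCell::new(Node { value: 1, next: None, prev: None }));
    let second = Rc::new(RefCell::new(Node { value: 2, next: None, prev: None }));
    first.borrow_mut().next = Some(second.clone());
    second.borrow_mut().prev = Some(Rc::downgrade(&first));

    // Following the weak back edge requires an upgrade, which fails once
    // the strong owner is gone -- the "weak parent reference" pattern.
    let back = second.borrow().prev.as_ref().unwrap().upgrade().unwrap();
    let v = back.borrow().value;
    v
}

fn main() {
    assert_eq!(demo(), 1);
}
```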

This infrastructure should be able to handle most programs without using 
garbage collection, except those that are interpreters of garbage-collected 
languages.

Re: [rust-dev] The future of iterators in Rust

2013-06-06 Thread Bill Myers
Scala has a similar design, with the following traits:
- TraversableOnce: can be internally iterated once (has a foreach() method that 
takes a closure)
- Traversable: can be internally iterated unlimited times (has a foreach() 
method that takes a closure)
- Iterable: can be externally iterated (has an iterator() method that returns 
an Iterator trait)

The way it works is that Iterable extends Traversable, which extends 
TraversableOnce, and the for loop just uses TraversableOnce, and Iterable has a 
default implementation of the TraversableOnce foreach() function using the 
iterator() function.

Also, the Iterator trait itself extends TraversableOnce, implementing foreach() 
by mutating itself.

It might be a good idea to investigate copying this design.
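A rough sketch of that layering rendered as modern Rust traits (names and 
signatures here are invented for illustration, not Scala's actual API): 
internal iteration via foreach(), external iteration via an iterator, with the 
former implementable in terms of the latter as in Scala's Iterable.

```rust
// Internal iteration: the collection drives the caller's closure.
trait Traversable {
    type Item;
    fn foreach(&self, f: &mut dyn FnMut(&Self::Item));
}

// External iteration: the caller drives the iterator.
trait Iterable: Traversable {
    fn iter(&self) -> Box<dyn Iterator<Item = &Self::Item> + '_>;
}

struct Wrapper(Vec<i32>);

impl Traversable for Wrapper {
    type Item = i32;
    // Mirrors Scala's default: foreach implemented via the iterator.
    fn foreach(&self, f: &mut dyn FnMut(&i32)) {
        for x in self.iter() {
            f(x);
        }
    }
}

impl Iterable for Wrapper {
    fn iter(&self) -> Box<dyn Iterator<Item = &i32> + '_> {
        Box::new(self.0.iter())
    }
}

fn sum(w: &Wrapper) -> i32 {
    let mut total = 0;
    w.foreach(&mut |x| total += *x);
    total
}

fn main() {
    assert_eq!(sum(&Wrapper(vec![1, 2, 3])), 6);
}
```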



Re: [rust-dev] Adding exception handling as syntax sugar with declared exceptions

2013-06-06 Thread Bill Myers


> Date: Thu, 16 May 2013 10:58:28 -0700
> From: gray...@mozilla.com
> To: bill_my...@outlook.com
> CC: rust-dev@mozilla.org
> Subject: Re: [rust-dev] Adding exception handling as syntax sugar with 
> declared exceptions
> 
> On 12/05/2013 8:00 PM, Bill Myers wrote:
> > This is a suggestion for adding an exception system to Rust that
> > satisfies these requirements:
> > 1. Unwinding does NOT pass through code that is not either in a function
> > that declares throwing exceptions or in a try block (instead, that
> > triggers task failure)
> > 2. Callers can ignore the fact that a function throws an exception,
> > resulting in task failure
> > 3. Callers can handle exceptions if desired, with the same syntax of
> > C++, Java and C#
> >
> > (1) avoids the issue with C++ exception handling forcing to think about
> > exception safety everywhere
> > (2) avoids the issue with Java's checked exceptions forcing to
> > explicitly handle or redeclare them
> > (3) provides an easy-to-use exception mechanism
> 
> Hi, sorry, I did mean to get back to this.
> 
> I think this is ... unlikely to work out, or be something we incorporate 
> into our repo. For a few reasons:
> 
>- It's very late in the design cycle to be making changes of
>  this scope. We're trying to stabilize.
> 
>- It'll be a big performance hit on code paths that want to use
>  such exceptions. Things that look like "just function calls"
>  turn into quite extensive operations. We're already struggling
>  with the costs of a return-bool solution for unwinding on
>  platforms that have difficult "normal" EH (#4342).
> 
>- Most seriously: it doesn't actually resolve (1) or (2), imo.
> 
>  (1) You'd still have to declare checked exceptions everywhere. But
>  worse than in java, if you failed to declare them _anywhere_
>  you wouldn't get a compilation error, but a runtime failure.
>  This is like making C++ default to 'throw ()', which tends
>  to surprise everyone.

Well, the idea is that exceptions would generally either be caught in the 
immediate caller or allowed to escalate to task failure, with rare cases of 
propagation.
The target use case are things like "str_to_int" functions that would throw 
format exceptions that would be caught by the immediate caller in case they 
want to handle it.

>  (2) You still have to write in "worry about exceptions everywhere"
>  style inside a try-block or function-that-rethrows. Only
>  get to avoid it when you're insulating yourself by the implicit
>  "throw ()" declaration.

Yes, but you can't always avoid this anyway, since in some cases operations are 
truly irreversible.

For instance, if you want to make three HTTP POST requests and the second 
fails, then the first will have gone through and cannot be reverted 
automatically in any way, so the logic to handle "we did something and then 
failed, leaving us in a half-done state" is fundamentally necessary in some 
cases regardless of language design.

A limited version of this might be implementable by macros that could 
automatically generate a "try_" version of a function with Result<> return type 
along with a version that fails.
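A hand-written sketch of the convention such a macro could generate (the names 
here are illustrative): a Result-returning "try_" core plus a non-try_ wrapper 
that turns the error into task failure (a panic, in today's terms).

```rust
// The "try_" version: returns the error for callers that want to handle it.
fn try_parse_port(s: &str) -> Result<u16, String> {
    s.parse::<u16>()
        .map_err(|e| format!("bad port {:?}: {}", s, e))
}

// The plain version: converts the error into task failure.
fn parse_port(s: &str) -> u16 {
    try_parse_port(s).unwrap_or_else(|e| panic!("{}", e))
}

fn main() {
    assert_eq!(parse_port("8080"), 8080);
    assert!(try_parse_port("not-a-port").is_err());
}
```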


The only other approach that can improve on this is to have support for 
reversible execution.

Basically, one would compile everything in two versions, where one of the 
versions keeps an undo log of memory writes. When a try block is entered, the 
compiler switches to the undo-log-enabled code, and when an exception is 
thrown, the log is played to undo everything. Hardware transactional memory 
like Haswell TSX can also be tried first if available.

Of course, this wouldn't be able to handle external function calls and 
modifications of shared memory via unsafe pointers: this could be handled by 
having a special "unsafe" construct that passes a closure to unwind the changes.

Calling irreversible functions like I/O would need to cause a transaction 
abort/throw an exception. Also, some things like RWARC modifications would be 
impossible to handle without a full STM implementation (with MVCC and so on).

The biggest drawback of this is that it will double code size and compilation 
time.

I'm not sure if it's worthwhile since it cannot work in general due to I/O, and 
simple cases can be handled with the syntax sugar proposed or doing the same by 
hand with Result return types and so on.




[rust-dev] Adding exception handling as syntax sugar with declared exceptions

2013-05-12 Thread Bill Myers
This is a suggestion for adding an exception system to Rust that satisfies 
these requirements:
1. Unwinding does NOT pass through code that is not either in a function that 
declares throwing exceptions or in a try block (instead, that triggers task 
failure)
2. Callers can ignore the fact that a function throws an exception, resulting 
in task failure
3. Callers can handle exceptions if desired, with the same syntax of C++, Java 
and C#

(1) avoids the issue with C++ exception handling forcing to think about 
exception safety everywhere
(2) avoids the issue with Java's checked exceptions forcing to explicitly 
handle or redeclare them
(3) provides an easy-to-use exception mechanism

The specification is based on adding some syntax sugar, which is then 
transformed by the compiler into the current Rust syntax, and thus does not 
really alter runtime behavior (but it is also possible to implement this with 
traditional unwinding if desired).

Functions can be declared to throw exceptions, and if the conceptual unwinding 
process would have to pass through a function call that does not declare 
throwing an exception of that type, task failure is invoked instead.

To integrate with the condition system, one can raise a condition instead of 
throwing, and declare the condition type and handlers with "throws", allowing 
them to throw exceptions if desired (note that of course this will just result 
in task failure unless the function raising the condition is declared to throw).

It is of course possible to code in this style "by hand", but the syntax sugar 
allows doing it in a much terser way, and allows converting functions that use 
task failure into functions that throw exceptions without changing the source 
code of callers that don't want to handle them.

* New syntax added:

- "throws E1, E2, ..." modifier on functions and closures
- try/catch block
- try() expression
- throw e statement

* New type declarations added:
enum Result1<T, E1>
{
  Ok(T),
  Fail1(E1)
}

enum Result2<T, E1, E2>
{
  Ok(T),
  Fail1(E1),
  Fail2(E2)
}

... and so on...

* Transformations to implement the system:

Everywhere:
fn foo() -> T throws E1 [similar with closures] => fn foo() -> Result1<T, E1>
fn foo() -> T throws E1, E2 [similar with closures] => fn foo() -> Result2<T, E1, E2>
try(f(args)) [if f declared with throws] => f(args)
f(args)   [if f returns T and is declared with throws, not inside try()] => 
match f(args) {Ok(x) => x, Fail1(e) => throw e, Fail2(e) => throw e, ...}

In functions declared throws:
return v => return Ok(v)

try {try_body} catch(e1: E1) {catch1_body} catch(e2: E2) {catch2_body}
=>
let mut e1_: Option<E1> = None;
let mut e2_: Option<E2> = None;
'try_: loop {try_body; break;}
if e1_.is_some() { let e1 = e1_.unwrap(); catch1_body }
else if e2_.is_some() { let e2 = e2_.unwrap(); catch2_body }

throw e, if there is any containing try block that has a catch block with 
argument eK with type EK such that e is convertible to EK (taking the innermost 
suitable try block and the first suitable catch block)
=> eK_ = Some(e); break 'try_

throw e, if there are no suitable try blocks but the function is declared 
throws and e is convertible to throws type EK
=> return FailK(e)

throw e, otherwise, if e implements ToStr:
=> fail!(e.to_str())

throw e, otherwise:
=> fail!()
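A hand-worked instance of the proposed desugaring (the `throws` syntax never 
existed, and ParseErr and parse here are invented for illustration): a 
hypothetical `fn parse(s: &str) -> u32 throws ParseErr` after the 
transformation, plus a caller outside any try block, where an uncaught throw 
becomes task failure.

```rust
// The generated Result1-style enum for a single declared exception type.
enum Result1<T, E1> {
    Ok(T),
    Fail1(E1),
}

struct ParseErr;

// `fn parse(s: &str) -> u32 throws ParseErr` after the transformation:
fn parse(s: &str) -> Result1<u32, ParseErr> {
    match s.parse::<u32>() {
        Ok(n) => Result1::Ok(n),            // `return v` => `return Ok(v)`
        Err(_) => Result1::Fail1(ParseErr), // `throw e`  => `return Fail1(e)`
    }
}

// A caller outside any try block: the uncaught throw becomes task failure.
fn caller(s: &str) -> u32 {
    match parse(s) {
        Result1::Ok(x) => x,
        Result1::Fail1(_) => panic!("uncaught exception"),
    }
}

fn main() {
    assert_eq!(caller("42"), 42);
}
```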
