Re: [rust-dev] Optimization removes checks
Hello Kamlesh,

This mailing list is more or less dead; please consider asking your questions on https://users.rust-lang.org/.

Regards

On Thu, Oct 3, 2019 at 5:37 AM kamlesh kumar wrote:

> Why does optimization remove overflow checks? Consider the test case below:
>
> $ cat test.rs
> fn fibonacci(n: u32) -> u32 {
>     let mut f: u32 = 0;
>     let mut s: u32 = 1;
>     let mut next: u32 = f + s;
>     for _ in 1..n {
>         f = s;
>         s = next;
>         next = f + s;
>     }
>     next
> }
>
> fn main() {
>     println!("{}", fibonacci(100));
> }
>
> $ rustc test.rs -C opt-level=1
> $ ./test
> 2425370821
>
> $ rustc test.rs -C opt-level=0
> $ ./test
> thread 'main' panicked at 'attempt to add with overflow', p11.rs:11:7
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
>
> ./Kamlesh

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Bare-metal Rust linking with C static library
Hello Eric,

Please note that the rust-dev list is (for better or worse) abandoned. You may ask questions on IRC (https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust), the users forum (https://users.rust-lang.org/) or StackOverflow. You may also ask on Reddit (https://reddit.com/r/rust), although it is used more for announcements than for questions. Note that all of these community links are accessible directly from http://www.rust-lang.org/

Good luck with your project!

On Sat, May 30, 2015 at 8:49 PM, Eric Stutzenberger dynamicstabil...@gmail.com wrote:

I'm working on building out a Rust interface to the nRF51x series parts. I have a bare-metal system working quite well. The nRF51x has a Bluetooth stack (called SoftDevice). This stack requires the use of supervisor calls to request the stack to perform certain functions. My plan is to write a C library wrapper around these service calls, compile it with arm-none-eabi-gcc and then link it into my bare-metal Rust system. A large chunk of the work I have done thus far is based off of the STM32 example work done by Jorge Aparicio (https://github.com/japaric). Since I have the basics up and running, I am now trying to get a C static library, built with arm-none-eabi-gcc and archived with arm-none-eabi-ar, to link properly with my Rust code.
I have the following (very basic) .c file:

    uint32_t sum(uint32_t a, uint32_t b) {
        return a + b;
    }

I am compiling and archiving with the following commands:

    arm-none-eabi-gcc -Wall -mcpu=cortex-m0 -mthumb -fPIC --specs=nosys.specs -shared test.c -o test.o
    arm-none-eabi-ar -rs libtest.a test.o

In my Rust file:

    #[link(name = "test", kind = "static")]
    extern {
        pub fn sum(a: u32, b: u32) -> u32;
    }

I then invoke it as a test:

    pub fn main() {
        let test_sum = unsafe { sum(2, 3) };
    }

I am using a Makefile to drive the Rust compiler with some specific arguments, such as my target:

    # rustc target
    TARGET = thumbv6m-none-eabi
    # toolchain prefix
    TRIPLE = arm-none-eabi

    APP_DIR = src/app
    OUT_DIR = target/$(TARGET)/release
    DEPS_DIR = $(OUT_DIR)/deps

    BINS = $(OUT_DIR)/%.hex
    HEXS = $(OUT_DIR)/%.hex
    ELFS = $(OUT_DIR)/%.elf
    OBJECTS = $(OUT_DIR)/intermediate/%.o
    SOURCES = $(APP_DIR)/%.rs
    APPS = $(patsubst $(SOURCES),$(BINS),$(wildcard $(APP_DIR)/*.rs))

    RUSTC_FLAGS := -C lto -g $(RUSTC_FLAGS)

    # don't delete my elf files!
    .SECONDARY:

    all: rlibs $(APPS)

    clean:
            cargo clean

    # TODO $(APPS) should get recompiled when the `rlibs` change
    $(OBJECTS): $(SOURCES)
            mkdir -p $(dir $@)
            rustc \
                    $(RUSTC_FLAGS) \
                    --crate-type staticlib \
                    --emit obj \
                    --target $(TARGET) \
                    -L $(DEPS_DIR) \
                    -L ../sd110_lib \
                    --verbose \
                    -o $@ \
                    -ltest \
                    $<

    $(ELFS): $(OBJECTS)
            $(TRIPLE)-ld \
                    --gc-sections \
                    -T layout.ld \
                    -o $@ \
                    $<
            #size $@

    $(BINS): $(ELFS)
            $(TRIPLE)-objcopy \
                    -O ihex \
                    $< \
                    $@

    rlibs:
            cargo build --target $(TARGET) --verbose --release

The Cargo.toml is as follows:

    [package]
    name = "bmd200eval"
    version = "0.1.0"
    authors = ["Eric Stutzenberger eric.stutzenber...@rigado.com"]

    [dependencies.nrf51822]
    path = "../nrf51822.rs"

When I run make, I get the following output:

    .
    .
    .
    mkdir -p target/thumbv6m-none-eabi/release/intermediate/
    rustc \
        -C lto -g \
        --crate-type staticlib \
        --emit obj \
        --target thumbv6m-none-eabi \
        -L target/thumbv6m-none-eabi/release/deps \
        -L ../sd110_lib \
        --verbose \
        -o target/thumbv6m-none-eabi/release/intermediate/blink.o \
        -ltest \
        src/app/blink.rs
    src/app/blink.rs:42:9: 42:17 warning: unused variable: `test_sum`, #[warn(unused_variables)] on by default
    src/app/blink.rs:42     let test_sum = unsafe { sum(2, 3) };
                                ^~~~
    arm-none-eabi-ld \
        --gc-sections \
        -T layout.ld \
        -o target/thumbv6m-none-eabi/release/blink.elf \
        target/thumbv6m-none-eabi/release/intermediate/blink.o
    target/thumbv6m-none-eabi/release/intermediate/blink.o: In function `blink::main':
    git/rust-nrf/bmd200eval.rs/src/app/blink.rs:42: undefined reference to `sum'

I have found numerous references to linking Rust with C and calling C from Rust, but I haven't found a specific answer as to why this will not link. As you can see in the Makefile, I have tried to force rustc's hand in finding and linking against the library, but this doesn't seem to make a difference. Is there an issue with how I am building the library? Since rustc is generating a staticlib in this case, is there some different method that needs to be used?

Note that I am avoiding the Clang compiler for the moment due to the following: https://devzone.nordicsemi.com/question/29628/using-clang-and-the-s110-issues-with-supervisor-calls-to-the-softdevice/

Essentially, the gist of the above is that Clang is not quite producing the correct supervisor assembly code for calling into the Bluetooth stack,
Re: [rust-dev] is rust an 'oopl'?
into the situation where you want to have an instance of a type stored in more than one place? Well, you have two options. If the type supports Clone, you can call the clone method and produce a duplicate. The exact way clone works is very specific to the type: it might create a completely separate value, or the two might still be linked. Do not worry about this at the moment, as it will become evident as you learn Rust.

Just keep in mind that for non-copyable types, or types you do not want to copy, you can create a smart pointer to manage them:

    let pointer = Rc::new(myothervar);
    let secondhome = pointer.clone();
    myfunction(secondhome);

Also note that you will find the smart pointer clunky at first, and you may be confused about how to write libraries or design a good API for your application with it. So I would like to leave you with one more concept. You will find passing Rc<MyType> around cumbersome. To remedy this, you can learn the pattern of making MyType wrap the Rc, so the Rc is internal to it; your API then passes around MyType instead.

Okay, sorry for such a long mail. I just hope these little tips can help you instead of making you quit, leaving a bitter taste for Rust!

On Sun, Jan 11, 2015 at 7:17 AM, Mayuresh Kathe mayur...@kathe.in wrote:

hello matthieu, thanks for responding. you mentioned that rust supports some object-oriented concepts. may i know which? also, deviating a bit off-topic, would a decent grasp of functional programming be a pre-requisite to learning rust? thanks, ~mayuresh

On 2015-01-11 17:21, Matthieu Monrocq wrote:

Hello Mayuresh, The problem with your question is dual: - OO itself is a fairly overloaded term, and it is unclear what definition you use for it: Alan Kay's original? The presence of inheritance? ...
- Just because a language supports OO concepts does not mean that it ONLY supports OO concepts; many languages are multi-paradigm and can be used for procedural programming, object-oriented programming (in a loose sense, given the loose definition in practice), generic programming, functional programming, ...

Rust happens to be a multi-paradigm language. It supports some, but not all, object-oriented concepts, but it also thrives with free functions and generic functions, and supports functional programming expressiveness (though not purity concepts). I would also note that I have seen C code striving to achieve some OO concepts (opaque pointers for encapsulation, virtual dispatch through manually written virtual tables, ...), so even in C you cannot necessarily avoid the OO paradigm, depending on the libraries you use.

Is Rust a good language for you? Maybe! The only way for you to know is to give it a spin.

Have a nice day.

-- Matthieu

On Sun, Jan 11, 2015 at 2:59 AM, Mayuresh Kathe mayur...@kathe.in wrote:

hello, i am an absolute newbie to rust. is rust an object-oriented programming language? i ask because i detest 'oo', and am looking for something better than c. thanks, ~mayuresh
Re: [rust-dev] A question about implementation of str
str is simply a pair (length, pointer). The reason the length is passed as an argument even for a literal is that str does not ONLY work for literals (whose complete type is &'static str) but for any slice of characters, such as those produced by String::as_slice(), in which case the lifetime is different (the slice only lives as long as the particular String instance) and the length is not necessarily known at compile time.

On Wed, Dec 3, 2014 at 6:34 PM, C K Kashyap ckkash...@gmail.com wrote:

Hi, I am stuck in my kernel development where I find that I am not able to iterate over a str. The code is here - https://github.com/ckkashyap/unix/blob/master/kernel/uart.rs - in the function uart_putc. I find that the for-loop loops the right number of times, but it does not print the right character. To me it appears to be a linking problem with my kernel. However, to debug this issue I wanted to get a better understanding of what happens when we iterate over a str. I was surprised to see that the length of the string literal, which is determined at compile time, is being sent as an argument. I'd appreciate any insights into how I can debug this. Regards, Kashyap
Re: [rust-dev] Overflow when benchmarking
Hello,

To be clear: there is no such thing as stack/heap in C and C++; there are automatic variables and dynamically allocated variables, the former having their lifetime known statically and the latter not. Whether a particular compiler chooses to use the stack or the heap for either is its free choice, as long as it maintains the as-if rule. In this case, I have never heard of a compiler automatically moving an automatic variable to the heap; however, LLVM routinely uses the stack for dynamically allocated variables if it can prove their lifetime (probably restricted to fixed-size variables below a certain threshold).

Regarding Variable Length Arrays (C99): they are not valid in C++, and yes, they are traditionally implemented using alloca, for better or worse.

-- Matthieu

On Fri, Nov 28, 2014 at 4:40 AM, Manish Goregaokar manishsm...@gmail.com wrote:

C/C++ has a lot of features which seem tantalizing at first but end up being against the point of a systems language. Putting large arrays on the heap (not sure if C++ does this, but it sounds like something C++ would do) is one -- there are plenty of cases where you explicitly want stack-based arrays in systems programming. Another is the alloca-like behavior of dynamically sized stack-based arrays (I just learned about this recently). You always want to be clear about what the compiler is doing. Such optimizations can easily be implemented as a library :)

-Manish Goregaokar

On Thu, Nov 27, 2014 at 10:20 PM, Diggory Hardy li...@dhardy.name wrote:

Shouldn't the compiler automatically put large arrays on the heap? I thought this was a common thing to do beyond a certain memory size.

On Thursday 27 November 2014 04:28:03 Steven Fackler wrote:

The `nums` array is allocated on the stack and is 8 MB (assuming you're on a 64-bit platform).

On Wed Nov 26 2014 at 8:23:08 PM Ben Wilson benwilson...@gmail.com wrote:

Hey folks, I've started writing some Rust code lately and have run into weird behavior when benchmarking.
When running https://gist.github.com/benwilson512/56f84d4625f11feb

    #[bench]
    fn test_overflow(b: &mut Bencher) {
        let nums = [0i, ..100];
        b.iter(|| {
            let mut x = 0i;
            for i in range(0, nums.len()) {
                x = nums[i];
            }
        });
    }

I get "task 'main' has overflowed its stack" pretty much immediately when running cargo bench. Ordinarily I'd expect to see that error when doing recursion, but I can't quite figure out why it's showing up here. What am I missing? Thanks!

- Ben
Re: [rust-dev] Why there's this asymmetry in defining a generic type/function/method and calling it?
On Wed, Nov 19, 2014 at 2:42 PM, Daniel Trstenjak daniel.trsten...@gmail.com wrote:

Hi Paul,

On Tue, Nov 18, 2014 at 03:31:17PM -0500, Paul Stansifer wrote:

It's not so much the speed of the parser that is the matter, but the fragility of the grammar. The less lookahead that's required, the more likely it is that parser error messages will make sense, and the less likely that a future change to Rust's syntax will introduce an ambiguity.

Ok, that's absolutely reasonable. I'm wondering if the two could be made distinct by enforcing some properties which are already compile warnings: that types should always start with an upper-case letter and functions/methods with a lower-case one.

Note, the syntax also applies to functions; i.e. if you have `fn pow<T: Num>(n: T, e: uint) -> T`, then to qualify `T` you can use `pow::<int>(123, 4)`. Therefore using case would not solve the issue (not completely, at least).

    let foo = HashMap<Foo, Bar>::new();

But then 'HashMap' could still e.g. be an enum value instead of a type; currently you certainly also need some kind of context to distinguish cases like e.g. 'some(x)' and 'Some(x)'. Somehow I think it's a very good idea to enforce these properties, regardless of the issue here. If you've read code where everything starts with a lower case or upper case (even variables!), then you can really see the value of using case to distinguish types/functions/methods.

Greetings, Daniel
Re: [rust-dev] On the use of unsafe
It's completely unnecessary, actually. If a method requires an XSS-safe string, then it should take an XssSafeString parameter, which would implement Deref<String> and would be built from a String by a method performing the necessary escaping. If a method requires a SQL-safe string... ah no, don't do that; use bind parameters and you are guaranteed to be safe from SQL injection.

In each case, the attributes so defined can be perfectly replaced with appropriate types... so why not use types?

On Mon, Sep 22, 2014 at 4:50 AM, Manish Goregaokar manishsm...@gmail.com wrote:

That's not how Rust defines `unsafe`. It's open to misuse, and the compiler will happily point out that it's not being used correctly via the unnecessary-unsafe lint. If that's the case, do you think there's some worth in allowing the programmer to define arbitrary generic safety types? E.g. have an `#[unsafe(strings)]` attribute that can be placed on methods that break String guarantees (and placed on blocks where we wish to allow such calls), or `#[unsafe(sql)]` for SQL methods that are injection-prone. If something like this slide https://www.youtube.com/watch?feature=player_detailpage&v=jVoFws7rp88#t=1664 was ever implemented, methods that allow unsafe (XSS-prone) vulnerabilities could have `#[unsafe(xss)]`.

Rust does a bunch of compile-time checking to achieve memory safety. It also provides a syntax extension/lint system that allows programmers to define further compile-time checks, which opens up the gate for many more possible safety guarantees (instead of relying on separate static analysis tools), and not just memory safety. Perhaps we should start recognizing and leveraging that ability more :)

-Manish Goregaokar
Re: [rust-dev] Rust BigInt
On Fri, Sep 19, 2014 at 6:13 AM, Daniel Micay danielmi...@gmail.com wrote:

On 19/09/14 12:09 AM, Lee Wei Yen wrote:

Hi all! I've just started learning to use Rust now, and so far it's been everything I wanted in a language. I saw from the docs that the num::bigint::BigInt type has been deprecated - does it have a replacement? -- Lee Wei Yen

It was moved to https://github.com/rust-lang/num

There's also https://github.com/thestinger/rust-gmp which binds to GMP. GMP has better time complexity for the operations, significantly faster constant factors (10-20x for some operations) and more functionality. It also doesn't have lots of showstopper bugs, since it's a mature library.

A disclaimer for the unwary: GMP is a GPL library, so using it implies complying with the GPL license.
Re: [rust-dev] Dynamic format template
While not possible today, there is actually nothing preventing you to create a safe alternative (or even improving format so it works in this way). In a sense, a formatting function has two set of inputs: - the format itself, from which you extract a set of constraints (expected type-signature) - the arguments to format, which can be seen as a single tuple (provided type-signature) And as long as you can ensure at compile time that you never attempt to apply an expected type-signature to an incompatible provided type-signature, then you are safe. I would suppose that as far as having runtime formats go, you would need to introduce an intermediary step: the expected type-signature. You could have a Format object, generic over the expected type-signature, and a new constructor method taking a str and returning an OptionFormat Now, you have two phases: - the new constructor checks, at runtime, that the specified format matches the expected type-signature - the compiler checks, at compile-time, that the provided type-signature (arguments) match the expected type-signature (or it can be coerced to) It might require variadic generics and subtle massaging of the type system, however I do think it would be possible. It might not be the best way to attack the issue though. On Mon, Aug 25, 2014 at 1:33 AM, Kevin Ballard ke...@sb.org wrote: It’s technically possible, but horribly unsafe. The only thing that makes it safe to do normally is the syntax extension that implements `format!()` ensures all the types match. If you really think you need this, you can look at the implementation of core::fmt. But it’s certainly not appropriate for localization, or template engines. -Kevin Ballard On Aug 24, 2014, at 2:48 PM, Vadim Chugunov vadi...@gmail.com wrote: Hi, Is there any way to make Rust's fmt module to consume format template specified at runtime? This might be useful for localization of format!'ed strings, or, if one wants to use format! as a rudimentary template engine. 
Re: [rust-dev] Integer overflow, round -2147483648
I am not a fan of having wrap-around and non-wrap-around types, because whether you use wrap-around arithmetic or not is, in the end, an implementation detail, and having to switch types left and right whenever going from one mode to the other is going to be a lot of boilerplate.

Instead, why not take the same road as Swift: map +, -, * and / to non-wrap-around operators, and declare new (more verbose) operators for the rare cases where performance matters or wrap-around is the right semantics?

Even though Rust is a performance-conscious language (since it aims at displacing C and C++), the 80/20 rule still applies, and most Rust code should not require absolute speed; so let's make it convenient to write safe code and prevent newcomers from shooting themselves in the foot by providing safety by default, and for those who have profiled their applications or are writing hashing algorithms *also* provide the necessary escape hatches. This way we can have our cake and eat it too... or am I missing something?

-- Matthieu

On Sun, Jun 22, 2014 at 5:45 AM, comex com...@gmail.com wrote:

On Sat, Jun 21, 2014 at 7:10 PM, Daniel Micay danielmi...@gmail.com wrote:

Er... since when? Many single-byte opcodes in x86-64 corresponding to deprecated x86 instructions are currently undefined. http://ref.x86asm.net/coder64.html

I don't see enough gaps here for the necessary instructions.

You can see a significant number of invalid one-byte entries: 06, 07, 0e, 1e, 1f, etc. The simplest addition would just be to resurrect INTO and make it efficient - assuming signed 64- and 32-bit integers are good enough for most use cases. Alternatively, it could be two one-byte instructions, to add an unsigned version (perhaps a waste of precious slots), or a two-byte instruction which could perhaps allow trapping on any condition. Am I missing something?
Re: [rust-dev] Rust's documentation is about to drastically improve
On Wed, Jun 18, 2014 at 6:22 PM, Steve Klabnik st...@steveklabnik.com wrote:

"In case of trivial entities" -- The problem with this is that what's trivial to you isn't trivial to someone else.

"think about the amount of update this may make necessary in case Rust language syntax changes." -- Literally my job. ;) Luckily, the syntax has been pretty stable lately, and most changes have just been mechanical.

If you could, it would be awesome to invest in a check that the provided examples compile with the current release of the compiler (possibly as part of the documentation generation). This not only guarantees that the examples are up to date, but also helps in locating outdated examples. On the other hand, this may require more boilerplate to get self-contained examples (that can actually be compiled), so YMMV.

-- Matthieu
Re: [rust-dev] 7 high priority Rust libraries that need to be written
Could there be a risk in using JSR 310 as a basis, given the recent judgement of the Federal Circuit Court that APIs are copyrightable (in the Google vs Oracle fight over the Java API)?

-- Matthieu

On Sat, Jun 7, 2014 at 6:01 PM, Bardur Arantsson s...@scientician.net wrote:

On 2014-06-05 01:01, Brian Anderson wrote:

# Date/Time (https://github.com/mozilla/rust/issues/14657) Our time crate is very minimal, and the API looks dated. This is a hard problem and JodaTime seems to be well regarded, so let's just copy it.

JSR 310 has already been mentioned in the thread, but I didn't see anyone mention that it was accepted into the (relatively) recently finalized JDK 8: http://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html The important thing to note is basically that it was simplified quite a lot relative to JodaTime, in particular by removing non-Gregorian chronologies.

Regards,
Re: [rust-dev] Patterns that'll never match
On Sun, Jun 1, 2014 at 1:04 PM, Tommi rusty.ga...@icloud.com wrote:

On 2014-06-01, at 13:48, Gábor Lehel glaebho...@gmail.com wrote:

It would be possible in theory to teach the compiler about e.g. the comparison operators on built-in integral types, which don't involve any user code. It would only be appropriate as a warning rather than an error, due to the inherent incompleteness of the analysis and the arbitrariness of what to include in it. No opinion about whether it would be worth doing.

Perhaps this kind of thing would be better suited to a separate tool that could (contrary to a compiler) run this and other kinds of heuristics without having to worry about blowing up compilation times.

This is typically the domain of either static analysis or runtime instrumentation (branch coverage tools) in the arbitrary case, indeed.

-- Matthieu
Re: [rust-dev] A better type system
FYI: I wrote an RFC for separating `mut` and `only` some time ago: https://github.com/rust-lang/rfcs/pull/78#

I invite interested readers to check it out and read the comments (notably those by thestinger, aka Daniel Micay on this list). For now, my understanding is that proposals on this topic are suspended until the dev team manages to clear its plate of several big projects (such as DST), especially as thestinger had a proposal to change the way lambda captures are modeled so they no longer require `uniq` (only accessible to the compiler).

-- Matthieu

On Sun, Jun 1, 2014 at 2:32 AM, Patrick Walton pwal...@mozilla.com wrote:

Yes, you could eliminate (c) by prohibiting taking references to the inside of sum types (really, any existential type). This is what Cyclone did. For (e) I'm thinking of sum types in which the two variants have different sizes (although maybe that doesn't work). We'd basically have to bring back the old &mut as a separate type of pointer to make it work. Note that Niko was considering a system like this in older blog posts, pre-INHTWAMA. (Search for "restrict pointers" on his blog.)

Patrick

On May 31, 2014 5:26:39 PM PDT, Cameron Zwarich zwar...@mozilla.com wrote:

FWIW, I think you could eliminate (c) by prohibiting mutation of sum types. What case are you thinking of for (e)? For (d), this would probably have to be distinguished from the current &mut somehow, to allow for truly unique access paths to sum types or shared data, so you could preserve any aliasing optimizations for the current &mut. Of course, more functions might take the less restrictive version, eliminating the optimization that way. Not that I think that this is a great idea; I'm just wondering whether there are any caveats that have escaped my mental model of the borrow checker.

Cameron

On May 31, 2014, at 5:01 PM, Patrick Walton pwal...@mozilla.com wrote:

I assume what you're trying to say is that we should allow multiple mutable references to pointer-free data.
(Note that, as Huon pointed out, this is not the same thing as the Copy bound.) That is potentially plausible, but (a) it adds more complexity to the borrow checker; (b) it's a fairly narrow use case, since it'd only be safe for pointer-free data; (c) it admits casts like 3u8 -> bool, casts to out-of-range enum values, denormal floats, and the like, all of which would have various annoying consequences; (d) it complicates or defeats optimizations based on pointer aliasing of &mut; (e) it allows uninitialized data to be read, introducing undefined behavior into the language. I don't think it's worth it.

Patrick

On May 31, 2014 4:42:10 PM PDT, Tommi rusty.ga...@icloud.com wrote:

On 2014-06-01, at 1:02, Patrick Walton pcwal...@mozilla.com wrote:

    fn my_transmute<T: Clone, U>(value: T, other: U) -> U {
        let mut x = Left(other);
        let y = match x {
            Left(ref mut y) => y,
            Right(_) => fail!()
        };
        *x = Right(value);
        (*y).clone()
    }

If `U` implements `Copy`, then I don't see a (memory-safety) issue here. And if `U` doesn't implement `Copy`, then it's the same situation as in the earlier example given by Matthieu, where there was an assignment to an `Option<Box<str>>` variable while a different reference pointing to that variable existed. The compiler shouldn't allow that assignment, just as in your example the compiler shouldn't allow the assignment `x = Right(value);` (after a separate reference pointing to the contents of `x` has been created) if `U` is not a `Copy` type. But, like I said in an earlier post, even though I don't see this (transmuting a `Copy` type in safe code) as a memory-safety issue, it is a code-correctness issue. So it's a compromise between preventing logic bugs (in safe code) and the convenience of more liberal mutation.

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [rust-dev] A better type system
Iterator invalidation is a sweet example, one which strikes at the heart of every C++ developer (those who never ran into it, please raise your hands). However, it is just an example: any time you have aliasing + mutability, you may have either memory issues or logical bugs.

Another example of a memory issue:

    fn foo(left: &Option<Box<str>>, right: &mut Option<Box<str>>) {
        let ptr: &str = &*left.unwrap();
        *right = None;
        match ptr.len() {
            // Watch out! If left and right alias, then ptr is now a dangling reference!
            // ...
        }
    }

The issue can actually occur in other ways: replace Box<str> by

    enum Point { Integral(int, int), Floating(f64, f64) }

and you could manage to write integrals into floats or vice versa, which is memory corruption, not a segmentation fault.

The Rust type system allows, at the moment, ensuring that you never have both aliasing and mutability: mostly at compile time, and at run time through a couple of unsafe hatches (Cell, RefCell, Mutex, ...). I admit it is jarring, and constraining. However, the guarantee you get in exchange (memory safety + thread safety) is extremely important.

"I'm writing this from a phone and I haven't thought of this issue very thoroughly." Well, think a bit more. If you manage to produce a more refined type system, I'd love to hear about it. In the mean time, though, I advise caution in criticizing the existing one: it has the incredible advantage of working.

On Sat, May 31, 2014 at 7:54 PM, Alex Crichton acrich...@mozilla.com wrote:

"Sorry for the brevity, I'm writing this from a phone and I haven't thought of this issue very thoroughly."

You appear to dislike one of the most fundamental features of Rust, so I would encourage you to think through ideas such as this before hastily posting to the mailing list. The current iteration of Rust has had a great deal of thought and design poured into it, as well as having at least thousands of man-hours of effort put behind it.
Casually stating, with little prior thought, that large chunks of this effort are flatly wrong is disrespectful to those who have put so much time and effort into the project. We always welcome and encourage thoughtful reconsideration of the design decisions of Rust, but it must be performed in a constructive and well-thought-out manner. There have been many times in the past where the design decisions of Rust were reversed or redone, but these were always accompanied by a large amount of research to fuel the changes. If you have concrete suggestions, we have an RFC process in place for proposing changes to the language while gathering feedback at the same time.
Re: [rust-dev] A few random questions
On Fri, May 30, 2014 at 2:01 AM, Oleg Eterevsky o...@eterevsky.com wrote: Since browsers were brought up, here is the Google C++ style guide on exceptions: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Exceptions As someone who works for Google, I can attest that exceptions are encouraged in the Google style guides for Python and Java, and the main reason they are forbidden in C++ is memory safety. Google has a large amount of pre-exceptions C++ code, and it would break in unexpected places if exceptions were allowed. Yes, which is a common issue. Exception usage requires exception-safe code. But then, exception-safe code is also code resilient in the face of introducing other return paths, so it's just overall better, whether in the presence of exceptions or not... Go is a different story. It deliberately refuses to support exceptions even though it has GC and hence has no problems with exception memory safety whatsoever. The lack of exceptions might be one of the main reasons (if not the main reason) why Go is not so popular even within Google. Personally, I've found exceptions too unwieldy. As I mentioned, the issue after catching an exception is "now, how do I recover?". Note that Rust and Go do have exceptions (and unwinding), it's just that you have to create a dedicated task instead of a try/catch block. Indeed, it's more verbose (which is mostly a matter of libraries/macros) and it's also less efficient (which could be addressed, though at compiler level); however it's just plain safer: now that shared state/updates to the external world are explicit, you can much more easily evaluate what it takes to recover. On Thu, May 29, 2014 at 4:39 PM, comex com...@gmail.com wrote: On Thu, May 29, 2014 at 7:10 PM, Oleg Eterevsky o...@eterevsky.com wrote: The projects in C++ that forbid exceptions are doing so not because of some prejudice, but because exceptions in C++ are unsafe. In the Java standard library, exceptions are ubiquitous.
If you mean checked exceptions, I hear that they're quite unpopular, although I don't use Java. Since browsers were brought up, here is the Google C++ style guide on exceptions: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Exceptions It bans them due to a variety of downsides which would only be partially addressed by checked-exception-like safety systems. I think Google Java code does use exceptions, but that's language culture for you. As a related data point, Go eschews exceptions entirely due to prejudice: http://golang.org/doc/faq#exceptions Not that I agree with most of Go's design decisions... still, I think these examples are enough to demonstrate that there are legitimate reasons to prefer a language designed without exceptions. I think it may be good for you to get more experience with Rust, although as I mentioned, I also lack experience.
Re: [rust-dev] How to find Unicode string length in rustlang
Except that in C++ std::basic_string::size and std::basic_string::length are synonymous (both return the number of CharT units, which in std::string is also the number of bytes). Thus I am unsure whether this would end up helping C++ developers. Might help others though. On Fri, May 30, 2014 at 2:12 PM, Nathan Myers n...@cantrip.org wrote: A good name would be size(). That would avoid any confusion over various length definitions, and just indicate how much address space it occupies. Nathan Myers On May 29, 2014 8:11:47 PM Palmer Cox palmer...@gmail.com wrote: Thinking about it more, units() is a bad name. I think a renaming could make sense, but only if something better than len() can be found. -Palmer Cox On Thu, May 29, 2014 at 10:55 PM, Palmer Cox palmer...@gmail.com wrote: What about renaming len() to units()? I don't see len() as a problem, but maybe as a potential source of confusion. I also strongly believe that no one reads documentation if they *think* they understand what the code is doing. Different people will see len(), assume that it does whatever they want to do at the moment, and for a significant portion of the strings they encounter it will seem like their interpretation, whatever it is, is correct. So, why not rename len() to something like units()? It's more explicit about the value it's actually producing than len(), and it's not all that much longer to type. As stated, exactly what a string is varies greatly between languages, so I don't think that lacking a function named len() is bad. Granted, I would expect that many people expect a string to have a method named len() (or length()), and when they don't find one, they will go to the documentation and find units(). I think this is a good thing, since the documentation can then explain exactly what it does. I much prefer len() to byte_len(), though.
byte_len() seems like a bit much to type, and it seems like all the other methods on strings would then have to be renamed with the byte_ prefix, which seems unpleasant. -Palmer Cox On Thu, May 29, 2014 at 3:39 AM, Masklinn maskl...@masklinn.net wrote: On 2014-05-29, at 08:37, Aravinda VK hallimanearav...@gmail.com wrote: I think returning the length of the string in bytes is just fine. Since I didn't know about the availability of char_len in Rust, this caused my confusion. Python 2.7 returns the length of a string in bytes, Python 3 returns the number of codepoints. Nope, it depends on the string type *and* on compilation options.

* Python 2's `str` and Python 3's `bytes` are byte sequences; their len() returns their byte counts.
* Python 2's `unicode` and Python 3's `str` before 3.3 return a code unit count, which may be UCS2 or UCS4 (depending on whether the interpreter was compiled with `--enable-unicode=ucs2`, the default, or `--enable-unicode=ucs4`). Only the latter case is a true code point count.
* Python 3.3's `str` switched to the Flexible String Representation; the build-time option disappeared and len() always returns the number of codepoints.

Note that in no case do len() operations take normalisation or visual composition into account. JS returns number of codepoints. JS returns the number of UCS2 code units, which is twice the number of code points for those in astral planes.
Re: [rust-dev] EnumSet, CLike and enums
I advise you to check the tests accompanying EnumSet (in the source code): http://static.rust-lang.org/doc/master/src/collections/home/rustbuild/src/rust-buildbot/slave/nightly-linux/build/src/libcollections/enum_set.rs.html#144-158 They show a simple implementation:

    impl CLike for Foo {
        fn to_uint(&self) -> uint { *self as uint }
        fn from_uint(v: uint) -> Foo { unsafe { mem::transmute(v) } }
    }

which uses transmute to avoid that manual maintenance. Note though that in general, if you wanted to add new enum values while keeping them sorted alphabetically and still be backward-compatible, you would need to handle the values manually. On Fri, May 30, 2014 at 8:41 PM, Igor Bukanov i...@mir2.org wrote: Is it possible to somehow automatically derive collections::enum_set::CLike for an enum? The idea of writing

    impl CLike for MyEnum {
        fn to_uint(&self) -> uint { *self as uint }
        fn from_uint(n: uint) -> MyEnum {
            match n {
                0 => EnumConst1,
                ...
                _ => fail!("{} does not match any enum case", n)
            }
        }
    }

just to get a type-safe bit set EnumSet<MyEnum> is rather discouraging. On a related note, I see that EnumSet never checks that the CLike::to_uint result stays below the word size. Is it a bug?
Re: [rust-dev] cannot borrow `st` as mutable more than once at a time
Does this mean that the desugaring of the for loop is incorrect? Or at least, could be improved. On Thu, May 29, 2014 at 8:22 PM, Vladimir Matveev dpx.infin...@gmail.com wrote: Hi, Christophe, Won't wrapping the first `for` loop into curly braces help? I suspect this happens because of `for` loop desugaring, which kind of leaves the iterator created by `execute_query()` in scope (not really, but only for the borrow checker). 2014-05-29 19:38 GMT+04:00 Christophe Pedretti christophe.pedre...@gmail.com: Hello all, I know that this issue is already covered by issues #6393 and #9113, but actually I have no solution. My code is a library for accessing databases. In my example, the database represented by db contains a table t with columns i:integer, f:float, t:text, b:blob. Everything works fine except the following code used to test my library:

    match db.prepare_statement("SELECT i,f,t,b FROM t where t like ?;") {
        Ok(mut st) => {
            st.set_string(1, "%o%");
            for i in st.execute_query() {
                match i {
                    Ok(s) => println!("{}:{}:{}:{}", s.get_long(0), s.get_double(1), s.get_string(2), s.get_blob(3)),
                    Err(e) => match e.detail {
                        Some(s) => println!("{}", s),
                        None => ()
                    }
                }
            }
            st.set_string(1, "%e%");       // <- PROBLEM HERE
            for i in st.execute_query() {  // <- PROBLEM HERE
                // ...
            }
        },
        Err(e) => match e.detail {
            None => (),
            Some(s) => println!("{}", s)
        }
    }

The compilation error says:

    test-db.rs:71:8: 71:10 error: cannot borrow `st` as mutable more than once at a time
    test-db.rs:71   st.set_string(1, "%e%");
                    ^~
    test-db.rs:61:17: 61:19 note: previous borrow of `st` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `st` until the borrow ends
    test-db.rs:61   for i in st.execute_query() {
                             ^~
    test-db.rs:88:7: 88:7 note: previous borrow ends here
    test-db.rs:58   match db.prepare_statement("SELECT i,f,t,b FROM t where t like ?;") {
    ...
    test-db.rs:88   }
                    ^
    error: aborting due to previous error

Do we have a solution for #6393?
Thanks -- Christophe http://chris-pe.github.io/Rustic/
Re: [rust-dev] Something like generics, but with ints
It's been discussed, but there is still discussion on the best way to achieve this. At the moment, you should be able to get around it using Peano numbers [1]:

    struct Zero;
    struct Succ<T>;

    struct Matrix<T, M, N> {
        data: Vec<T>,
    }

    fn cofactor<T, M, N>(m: Matrix<T, Succ<M>, Succ<N>>, row: int, col: int) -> Matrix<T, M, N> {
        Matrix::<T, M, N> { data: vec!() }
    }

Of course, I would dread seeing the error message should you need more than a couple rows/columns... [1] http://www.haskell.org/haskellwiki/Peano_numbers On Sun, May 25, 2014 at 7:25 PM, Isak Andersson cont...@bitpuffin.com wrote: Hello! I was asking in IRC if something like this:

    fn cofactor<T, R, C>(m: Matrix<T, R, C>, row: int, col: int) -> Matrix<T, R-1, C-1> { ... }

was possible. I quickly got the response that generics don't work with integers. So my question is, is there any way to achieve something similar? Or would it be possible in the future to do generic instantiation based on more than just types. Thanks!
Re: [rust-dev] Qt5 Rust bindings and general C++ to Rust bindings feedback
On Sat, May 24, 2014 at 9:06 AM, Zoltán Tóth zo1...@gmail.com wrote: Alexander, your option 2 could be done automatically, by appending suffixes to the overloaded name depending on the parameter types, increasing the number of letters used until the ambiguity is fully resolved. What do you think?

    fillRect_RF_B       ( const QRectF &rectangle, const QBrush &brush )
    fillRect_I_I_I_I_BS ( int x, int y, int width, int height, Qt::BrushStyle style )
    fillRect_R_BS       ( const QRect &rectangle, Qt::BrushStyle style )
    fillRect_RF_BS      ( const QRectF &rectangle, Qt::BrushStyle style )
    fillRect_R_B        ( const QRect &rectangle, const QBrush &brush )
    fillRect_R_C        ( const QRect &rectangle, const QColor &color )
    fillRect_RF_C       ( const QRectF &rectangle, const QColor &color )
    fillRect_I_I_I_I_B  ( int x, int y, int width, int height, const QBrush &brush )
    fillRect_I_I_I_I_C  ( int x, int y, int width, int height, const QColor &color )
    fillRect_I_I_I_I_GC ( int x, int y, int width, int height, Qt::GlobalColor color )
    fillRect_R_GC       ( const QRect &rectangle, Qt::GlobalColor color )
    fillRect_RF_GC      ( const QRectF &rectangle, Qt::GlobalColor color )

I believe this alternative was considered in the original blog post Alexander wrote: this is, in essence, mangling. It makes for ugly function names, although the suffix helps in locating them, I guess. Before we talk about generation though, I would start by investigating where those overloads come from. First, there are two different objects being manipulated here:

+ QRect is a rectangle with integral coordinates
+ QRectF is a rectangle with floating point coordinates

Second, a QRect may already be built from (int x, int y, int width, int height); thus all overloads taking 4 ints instead of a QRect are pretty useless in a sense. Third, in a similar vein, QBrush can be built from (Qt::BrushStyle), (Qt::GlobalColor) or (const QColor &). So once again those overloads are pretty useless.
This leaves us with:

+ fillRect(const QRect &, const QBrush &)
+ fillRect(const QRectF &, const QBrush &)

Yep, that's it. Of all those inconsistent overloads (missing one taking 4 floats, by the way...) only 2 are ever useful. The other 10 can be safely discarded without impacting the expressiveness. Now, of course, the real question is how well a tool could perform this reduction step. I would note here that the positions and names of the coordinate arguments of fillRect are exactly those of the arguments to QRect's constructor; maybe a simple exhaustive search would thus suffice (though it does require semantic understanding of what a constructor and default arguments are). It would be interesting to check how many overloads remain *after* this reduction step. Here we got a factor of 6 already (it would have been 8 if the interface had been complete). It would also be interesting to check whether the int/float distinction often surfaces; there might be an opportunity there. -- Matthieu Alexander Tsvyashchenko wrote: So far I can imagine several possible answers: 1. We don't care, your legacy C++ libraries are bad and you should feel bad! - I think this stance would be bad for Rust and would hinder its adoption, but if that's the ultimate answer - I'd personally prefer it said loud and clear, so that at least nobody has any illusions. 2. Define and maintain the mapping between C++ and Rust function names (I assume this is what you're alluding to with "define meaningful unique function names" above?) While this might be possible for smaller libraries, this is out of the question for large libraries like Qt5 - at least I won't create and maintain this mapping for sure, and I doubt others will: just looking at the stats from 3 Qt5 libraries (QtCore, QtGui and QtWidgets) out of ~30 Qt libraries in total, of the 50745 wrapped methods 9601 were overloads and required renaming.
Besides that, this has the disadvantage of throwing away the majority of the experience people have with a particular library and forcing them to re-learn its API. On top of that, not for every overload is it easy to come up with short, meaningful, memorable and distinctive names - you can try that exercise for http://qt-project.org/doc/qt-4.8/qpainter.html#fillRect ;-) 3. Come up with some way to allow overloading / default parameters - possibly with a reduced feature set, i.e. if type inference is difficult in the presence of overloads, as suggested in some overloads discussions (although not unsolvable, as proven by other languages that allow both type inference and overloading?), possibly exclude overloads from type inference by annotating overloaded methods with special attributes? 4. Possibly some other options I'm missing? -- Good luck! Alexander
Re: [rust-dev] New on Rust/Servo
And let's not forget the ever-useful https://github.com/bvssvni/rust-empty to get a pre-made Makefile for Rust. On Sat, May 17, 2014 at 1:11 PM, Artella Coding artella.cod...@googlemail.com wrote: http://tomlee.co/2014/04/03/a-more-detailed-tour-of-the-rust-compiler/ On Sat, May 17, 2014 at 12:57 AM, Ricardo Brandão rbrandao...@gmail.com wrote: Hi all, I'd like to introduce myself. I'm a computer engineer; I worked with embedded computers for 12 years before working in IT management for 10 years. Now I became a Mozillian, studying Firefox OS, Gonk and Gecko, and I'm very excited to come back to the technical world. Last week I attended a lecture at FISL (a free software forum) in Brazil about Rust and Servo, from Bruno Abinader. I'm very interested in these projects and I'd like to join them. I have good experience with C and assembly, but not exactly with Unix-like platforms; I was used to programming directly on the board. I've used ZWorld boards (nowadays ZWorld became Digi). I tried to look at some easy bugs (on the rust and servo repos) to at least understand them, but I'm confused. Could you give me some step-by-step guidance on how to begin studying the project, which documents to read, etc.? Remember I'm not an expert on Makefiles and C for Unix-like platforms - well, I have worked with them, but in small projects. Thanks in advance! -- Ricardo Brandão http://www.programonauta.com.br
Re: [rust-dev] How to implement a singleton ?
Hello, My first instinct would be: don't... but in the name of science... Have you tried looking at Stack Overflow? Just googling around I found http://stackoverflow.com/questions/19605132/is-it-possible-to-use-global-variables-in-rust which allows you to have a global variable, and from there a singleton seems easy. I guess you will need something like Mutex<Option<Type>> if you want lazy initialization. -- Matthieu On Thu, May 15, 2014 at 9:59 AM, Christophe Pedretti christophe.pedre...@gmail.com wrote: I am trying to implement a singleton (an object which is instantiated only once, successive instantiations returning the object itself). Any tutorial for this? Any idea? Example? Best practice? Thanks
Re: [rust-dev] UTF-8 strings versus encoded ropes
On Wed, May 14, 2014 at 2:25 PM, Armin Ronacher armin.ronac...@active-4.com wrote: Hi, On 02/05/2014 00:03, John Downey wrote: I have actually always been a fan of how .NET did this. The System.String type is opinionated in how it is stored internally and does not allow anyone to change that (unlike Ruby). The conversion from String to byte[] is done using explicit conversion methods like: Unfortunately the .NET string type does not support UCS4 and as such is a nightmare to deal with. Also, because the internal encoding is not UTF-8, *any* interaction with the outside world (ignoring the win32 API) goes through an encode/decode step which can be unnecessary. For instance, if you did that on Linux you would decode from UTF-8 to your internal UCS4 encoding, then encode back to UTF-8 on the way back to the terminal. (Aside from that, 32 bits for a code point is too large, as Unicode does not go beyond 21 bits or so. Wasteful.) Even keeping whole bytes, 3 bytes (24 bits) is effectively sufficient for the whole of Unicode. If you don't mind some arithmetic, you could thus use a backing array of bytes and just recompose the value on output. Regards, Armin
Re: [rust-dev] Ideas to build Rust projects
I agree that a portable terminal would be sweet; however the terminal and shell are only half the story: you then need a uniform set of tools behind the scenes, or else all your scripts fail. I would like to take the opportunity to point out Mosh [1] as an existing (and recent) remote shell; it might make for a great starting point. Regarding project ideas, I myself would be very interested in:

- concurrent collections (lists, hash-sets, hash-maps, ...): while I know this is not in the spirit of CSP, sometimes forcing a single queue to access a collection creates a bottleneck.
- an MPMC queue: at the moment Rust stops at MPSC with its channels, and once again when the load is too important you really need to be able to have multiple consumers. It could potentially be tied into a WorkerPool implementation where you can freely administer the pool size and just post jobs to the pool, but maybe it could be implemented free-standing.

[1]: http://mosh.mit.edu/ On Sat, Apr 19, 2014 at 4:45 PM, Mahmut Bulut mahmutbul...@gmail.com wrote: A portable terminal is a good choice for all. But I want to say that I started to write util-linux in Rust. OK, there is coreutils, but we should extend it with perfect system integration. I don't have time to complete all of util-linux, but if contributions come it can be merged into coreutils. You can take a look at Trafo (a rewrite of util-linux): https://github.com/vertexclique/trafo Mahmut Bulut On 19 Apr 2014, at 12:36, John Mija jon...@proinbox.com wrote: Sometimes, developers need ideas or cool projects to be inspired by. Here are some; please share more. + Implementation of the Raft distributed consensus protocol. It would allow building distributed systems. Implementations in Go: https://github.com/goraft/raft https://github.com/hashicorp/raft + Key-value embedded database. LMDB was built as a backend for OpenLDAP, but it is being used in many projects.
The benchmarks (LevelDB, Kyoto TreeDB, LMDB, BerkeleyDB, SQLite3) show that it is faster for read operations, although it's somewhat slower than LevelDB for writing. http://symas.com/mdb/ There is a pure Go key/value store inspired by the LMDB project: https://github.com/boltdb/bolt + Portable terminal. Today, to access a terminal on Unix or Windows, you need to provide an interface. The great issue is that the Unix terminal and the Windows console have different APIs, so it's very hard to get a portable API for each system. Instead, a terminal could be created from scratch, handling everything at a low level (without using the Windows API).
Re: [rust-dev] Do I need to watch out for memory fragmentation?
On Mon, Apr 14, 2014 at 10:32 PM, Daniel Micay danielmi...@gmail.com wrote: On 14/04/14 12:41 PM, Matthieu Monrocq wrote: Memory fragmentation is a potential issue in all languages that do not use a compacting GC, so yes. It's much less of an issue than people make it out to be on 32-bit, and it's a non-issue on 64-bit with a good allocator (jemalloc, tcmalloc). Small dynamic memory allocations are tightly packed in arenas, with a very low upper bound on fragmentation and metadata overhead. At a certain cutoff point, allocations begin to fall through directly to mmap instead of using the arenas. On 64-bit, the address space is enormous, so fragmenting it is only a problem when it comes to causing TLB misses. By the way, do you have any idea how this is going to pan out on processors like the Mill CPU, where the address space is shared among processes? There are some attenuating circumstances in Rust, notably the fact that unless you use a ~ pointer the memory is allocated in a task-private heap which is entirely recycled at the death of the task, but memory fragmentation is always a potential issue. All dynamic memory allocations are currently done with the malloc family of functions, whether you use sendable types like `Vec<T>`, `Arc<T>` and `~T` or task-local types like `Rc<T>`. Using a task-local heap for types like `Rc<T>` would only serve to *increase* the level of fragmentation by splitting it up more. For example, jemalloc implements thread-local caching, and then distributes the remaining workload across a fixed number of arenas. Increasing the level of thread-local caching has a performance benefit but by definition increases the level of fragmentation due to more unused capacity assigned to specific threads.
Re: [rust-dev] Do I need to watch out for memory fragmentation?
Memory fragmentation is a potential issue in all languages that do not use a compacting GC, so yes. There are some attenuating circumstances in Rust, notably the fact that unless you use a ~ pointer the memory is allocated in a task-private heap which is entirely recycled at the death of the task, but memory fragmentation is always a potential issue. On Mon, Apr 14, 2014 at 6:19 PM, Zach Moazeni zach.li...@gmail.com wrote: Hello, I'm starting to explore Rust, and as someone who has primarily worked in GC'd languages I'm curious if I need to watch out for anything related to memory fragmentation. Or if Rust or LLVM is doing something under the covers where this is less of an issue. Kind regards, Zach
Re: [rust-dev] [discussion] preemptive scheduling
Hello, As far as I know, in Rust a thread (green or not) that enters an infinite loop without I/O is stuck forever. The only available option to stop it is to have the OS kill the process (Ctrl+C). In my day job, all our server services are time-bounded, and any that exceeds its time bound is killed. Doing so requires one process per service for the exact same reason as in Rust, which has the unfortunate effect of requiring a large memory footprint, because utility threads (such as timers, and notably the watch-dog timer) are replicated in each and every process. The most common sources of time-slips are disk accesses and database accesses, which are covered by Rust under I/O; however, I've already seen infinite loops (or very long ones) and there seems to be no way to protect against those. Of course one could recommend that such loops check a flag or something, but if we knew those loops were going to diverge we would fix them, not instrument them. I was hoping that with Rust (which already rids us of dangling pointers and data races) we could move toward a single process with a lot of concurrent (green) tasks for better efficiency and ease of development; however the latter seems unattainable because of infinite loops or otherwise diverging code right now. I would thus also appreciate it if anybody had an idea how to preempt a misbehaving task, even if the only option is to trigger that task's failure; the goal at this point is to salvage the system without losing the current workload. -- Matthieu On Sat, Apr 12, 2014 at 11:04 AM, Jeremy Ong jeremyc...@gmail.com wrote: I am considering authoring a webserver (think nginx, apache, cowboy, etc.) in Rust. From a user point of view, mapping (green) tasks to web requests makes the most sense, as the tasks could be long running, perform their own I/O, handle sessions, or what have you. It would also allow the user to do per-request in-memory caching. My main concern is obviously the cooperative scheduler.
Given that the mantra of Rust seems to be safety, I'm curious how feasible it would be to provide the option for task safety as well. Preemptive scheduling provides two things: 1. If preemption is used aggressively, the user can opt for a lower latency system (a la Erlang-style round-robin preemptive scheduling). 2. Preemption of any sort can be used as a safety net to isolate bugs or blocks in tasks for long-running systems, or at least mitigate damage until the developer intervenes. I noticed in issue 5731 [1] on the repo, it was pointed out that this was possible, albeit difficult. The issue was closed with a comment that the user should use OS threads instead. I really think this misses the point, as it no longer allows preemption at a finer granularity. Could any devs chime in on the scope and difficulty of this project? Could any users/devs chime in on any of the points above? tl;dr I think preemptive scheduling is a must for safe concurrency in long running executables at the bottom of the stack. Opinions? [1] https://github.com/mozilla/rust/issues/5731
Re: [rust-dev] Everything private by default
On Thu, Mar 27, 2014 at 8:12 PM, Tommi rusty.ga...@icloud.com wrote: [The following post has nothing to do with this thread. I'm posting it here because my new posts to this mailing list don't go through (this happens to me a lot). Replies to existing posts tend to go through, thus I'm hijacking my own thread.] Title: Compiling with no bounds checking for vectors? Why isn't there a compiler flag like 'noboundscheck' which would disable all bounds checking for vectors? It would make it easier to have those language performance benchmarks (which people are bound to make, with no bounds checking in C++ at least) be more apples-to-apples comparisons. Also, knowing there's a flag in case you need one would put performance-critical people's minds at ease. Because you can already have that functionality by using `unsafe`, so why should one add the same functionality twice in different ways? I believe optimizers should be good enough to remove most bounds checks (especially in loops), and if there are cases where they don't, it might be worth checking what's preventing the optimization.
Re: [rust-dev] Lightweight failure handling
On Thu, Mar 27, 2014 at 3:43 PM, Clark Gaebel cg.wowus...@gmail.com wrote: aside: Your last message didn't get CC'd to rust-dev. I've re-added them, and hope dearly I haven't committed a social faux pas. That's interesting. You're kinda looking for exception handling in Rust! Unfortunately the language seems pretty principled in its opinion that failure should be handled at the task boundary exclusively, and this is a pretty heavyweight opinion. This wouldn't be so bad if people would stop fail!ing everywhere! I'm personally very against the seemingly growing trend of people doing things like calling unwrap() on options instead of propagating errors up. This makes accidental failure far, far more common than it should be. I hope when higher-kinded types and unboxed closures land, people will start using a monadic interface to results and options, as this will hopefully make error propagation less painful. We'll see. As for your specific case, I don't really have an answer. Is "just don't call fail!" an option? Maybe an automatically-inferred #[will_not_fail] annotation has a place in the world... - Clark Actually, there is nothing in the task model that prevents tasks from being run immediately in the same OS thread, and on the same stack. It is just an implementation detail. Behavior-wise, the main difference between try/catch in Java and a task in Rust is that a task does not leave a half-corrupted environment when it exits (because everything it interacted with dies with it). Implementation-wise, there may be some hurdles to getting a contiguous task as cheap as a try/catch: unwind boundary, detecting that the task is viable for that optimization at the spawn point, etc... but I can think of nothing that is absolutely incompatible. I would be happy for a more knowledgeable person to chime in on this point. -- Matthieu On Thu, Mar 27, 2014 at 3:51 AM, Phil Dawes rustp...@phildawes.net wrote: Hi Clark, Thanks for the clarification.
To follow your example, there are multiple 'process_msg()' steps, and if one fails I don't want it to take down the whole loop. Cheers, Phil

On Wed, Mar 26, 2014 at 10:25 PM, Clark Gaebel cg.wowus...@gmail.com wrote: Sorry, was on my phone. Hopefully some sample code will better illustrate what I'm thinking:

    loop {
        let result: Result<Foo, ()> = task::try(proc() {
            loop {
                recv_msg();
                // begin latency sensitive part
                process_msg();
                send_msg();
                // end latency sensitive part
            }
        });
        if result.is_ok() {
            return result;
        } else {
            continue;
        }
    }

This way, you only pay for the try if you have a failure (which should hopefully be infrequent), and you get nice task isolation!

On Wed, Mar 26, 2014 at 6:05 PM, Clark Gaebel cg.wowus...@gmail.com wrote: The main loop of your latency sensitive application.

On Mar 26, 2014 5:56 PM, Phil Dawes rustp...@phildawes.net wrote: On Wed, Mar 26, 2014 at 9:44 PM, Clark Gaebel cg.wowus...@gmail.com wrote: Can't you put that outside your inner loop? Sorry Clark, you've lost me. Which inner loop?

-- Clark. Key ID : 0x78099922 Fingerprint: B292 493C 51AE F3AB D016 DD04 E5E3 C36F 5534 F907

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
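The `task::try` API discussed in this thread is long gone; in modern Rust, the closest analogue of catching a failure at a boundary inside a loop is `std::panic::catch_unwind`. A minimal sketch under that assumption (`process_msg` is an invented stand-in for the latency-sensitive step):

```rust
use std::panic::{self, AssertUnwindSafe};

// Hypothetical stand-in for the message-processing step above.
fn process_msg(n: u32) -> u32 {
    if n == 3 {
        panic!("bad message {}", n);
    }
    n * 2
}

fn main() {
    let mut results = Vec::new();
    for n in 0..5 {
        // Catch the panic at this boundary instead of letting it take
        // down the whole loop, much like task::try once did.
        match panic::catch_unwind(AssertUnwindSafe(|| process_msg(n))) {
            Ok(v) => results.push(v),
            Err(_) => continue, // skip the failed message, keep looping
        }
    }
    assert_eq!(results, vec![0, 2, 4, 8]);
}
```

As in the email, the cost is only paid on the failing iteration; the happy path runs without any isolation overhead.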
Re: [rust-dev] Bounds on type variables in structs, enums, types
On Tue, Mar 25, 2014 at 6:00 PM, Patrick Walton pcwal...@mozilla.com wrote:

On 3/24/14 11:46 PM, Nick Cameron wrote: Currently we forbid bounds on type parameters in structs, enums, and types. So the following is illegal:

    struct S<X: B> {
        f: ~T<X>,
    }

IIRC Haskell allows bounds on type parameters (and we did once too), but I heard that it is considered deprecated and not preferred. I don't recall the exact reasons, but that's why we removed the feature (and also just for language simplicity). Patrick

If I remember the reason cited in the Haskell design, it was that some functions require more bounds than others. For example a HashMap generally requires that the key be hashable somehow, but the isEmpty or size functions on a HashMap have no such requirement. Therefore, you would end up with minimal bounds specified at the type level, and then each function could add some more bounds depending on its needs: that's 2 places to specify bounds. In the name of simplicity (and maximum reusability of types) Haskell therefore advises only using bounds on functions. However, I seem to remember that in Haskell the bounds are only traits (type classes); whereas in Rust some bounds may actually be required to be able to instantiate the type (Sized?). -- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
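Today's Rust settled on the same convention described for Haskell: the struct declaration carries no bounds, and each `impl` block asks only for what it actually needs. A sketch with an invented `Pairs` container (`len` needs nothing from `K`; only the key-comparing operations require `PartialEq`):

```rust
// The struct itself carries no bounds on K.
struct Pairs<K, V> {
    items: Vec<(K, V)>,
}

// len() and new() work for any K at all.
impl<K, V> Pairs<K, V> {
    fn new() -> Self {
        Pairs { items: Vec::new() }
    }
    fn len(&self) -> usize {
        self.items.len()
    }
}

// Only operations that compare keys ask for PartialEq.
impl<K: PartialEq, V> Pairs<K, V> {
    fn insert(&mut self, k: K, v: V) {
        if let Some(pos) = self.items.iter().position(|(ek, _)| *ek == k) {
            self.items[pos].1 = v; // overwrite existing key
        } else {
            self.items.push((k, v));
        }
    }
    fn get(&self, k: &K) -> Option<&V> {
        self.items.iter().find(|(ek, _)| ek == k).map(|(_, v)| v)
    }
}

fn main() {
    let mut p = Pairs::new();
    p.insert("a", 1);
    p.insert("a", 2);
    p.insert("b", 3);
    assert_eq!(p.len(), 2);
    assert_eq!(p.get(&"a"), Some(&2));
}
```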
Re: [rust-dev] Structural Typing
I would note that Rust macros actually work with structural typing: the expanded macro cannot compile unless the expressions/statements it results in can be compiled.

Regarding Scala here, it seems a weird idea to ask that each and every method should copy+paste the interface. We all know the woes of duplication. Instead, you can define a trait (even if for a single function) and it'll just work; and when you add a second function you will be able to re-use the same trait.

On Sun, Mar 23, 2014 at 11:37 AM, Liigo Zhuang com.li...@gmail.com wrote: IMO, this is bad. On Mar 23, 2014, 6:34 PM, Ziad Hatahet hata...@gmail.com wrote: Hi all, Are there any plans to implement structural typing in Rust? Something like this Scala code: http://en.wikipedia.org/wiki/Duck_typing#In_Scala

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
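The trait-instead-of-structural-typing approach suggested above looks like this in today's Rust (`Duck` and `Robot` are invented illustration types, standing in for Scala's structural `{ def quack(): String }`):

```rust
// A one-method trait replaces the structural type.
trait Quacks {
    fn quack(&self) -> String;
}

struct Duck;
struct Robot;

impl Quacks for Duck {
    fn quack(&self) -> String {
        "quack".to_string()
    }
}

impl Quacks for Robot {
    fn quack(&self) -> String {
        "beep-quack".to_string()
    }
}

// Accepts anything implementing the trait, dispatched dynamically.
fn make_noise(q: &dyn Quacks) -> String {
    q.quack()
}

fn main() {
    assert_eq!(make_noise(&Duck), "quack");
    assert_eq!(make_noise(&Robot), "beep-quack");
}
```

Adding a second method later only means extending the trait, not every call site.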
Re: [rust-dev] Virtual fn is a bad idea
And of course I forgot to reply to the list at large... sorry :x -- Matthieu

On Wed, Mar 12, 2014 at 8:48 PM, Matthieu Monrocq matthieu.monr...@gmail.com wrote:

On Tue, Mar 11, 2014 at 10:18 PM, Patrick Walton pcwal...@mozilla.com wrote:

On 3/11/14 2:15 PM, Maciej Piechotka wrote: Could you elaborate on DOM? I saw it referred to a few times but I haven't seen any details. I wrote simple bindings to the libxml2 DOM (https://github.com/uzytkownik/xml-rs - warning - I wrote it while I was learning Rust) and I don't think there was a problem of OO - the main problem was mapping libxml's memory management and Rust's [I gave up on namespaces, but with a native Rust DOM implementation it would be possible to solve in a nicer way]. Of course - I might've been at too early a stage.

You need:
1. One-word pointers to each DOM node, not two. Every DOM node has 5 pointers inside (parent, first child, last child, next sibling, previous sibling). Using trait objects would use 10 words, not 5 words, and would constitute a large memory regression over current browser engines.
2. Access to fields common to every instance of a trait without virtual dispatch. Otherwise the browser will be at a significant performance disadvantage relative to other engines.
3. Downcasting and upcasting.
4. Inheritance with the prefix property, to allow for (2).

If anyone has alternative proposals that handle these constraints that are more orthogonal and are pleasant to use, then I'm happy to hear them. I'm just saying that dismissing the feature out of hand is not productive.
Patrick

Please excuse me, I need some kind of visualization here, so I concocted a simple tree:

    // So, in pseudo C++, let's imagine a DOM tree
    struct Element {
        Element *parent, *prevSib, *nextSib, *firstChild, *lastChild;
        uint leftPos, topPos, height, width;
        bool hidden;
    };
    struct Block : Element { BlockProperties blockP; };
    struct Div : Block {};
    struct Inline : Element { InlineProperties inlineP; };
    struct Span : Inline {};

Now, I'll be basically mimicking the way LLVM structures its AST, since the LLVM AST achieves dynamic casting without RTTI. Note that this has a very specific downside: the hierarchy is NOT extensible.

    // And now in Rust (excuse my poor syntax/errors)
    enum ElementChild<'r> { ChildBlock(&'r Block), ChildInline(&'r Inline) }
    struct Element {
        child: Option<&'self ElementChild<'self>>,
        parent: &'self Element,
        prevSib, nextSib, firstChild, lastChild: Option<&'self Element>,
        leftPos, topPos, height, width: uint,
        hidden: bool,
    }
    enum BlockChild<'r> { ChildDiv(&'r Div) }
    struct Block {
        elementBase: Element,
        child: Option<&'self BlockChild<'self>>,
        blockP: BlockProperties,
    }
    struct Div { blockBase: Block }
    enum InlineChild<'r> { ChildSpan(&'r Span) }
    struct Inline {
        elementBase: Element,
        child: Option<&'self InlineChild<'self>>,
        inlineP: InlineProperties,
    }
    struct Span { inlineBase: Inline }

Let us review our objectives:

(1) One word to each DOM element: check = Option<&'r Element>

(2) Direct access to a field, without indirection: check = span.inlineBase.elementBase.hidden

(3) Downcasting and upcasting: check = downcast is done by matching:

    match element.child {
        ChildBlock(&'r block) => /* act on block */,
        ChildInline(&'r inline) => /* act on inline */,
    }

upcast is just accessing the base field.

(4) Inheritance with the prefix property = not necessary, (2) is already satisfied.

Note on (3): multiple bases are allowed easily, it's one field per base. In order to reduce the footprint, avoiding having a child field at each level of the hierarchy might be beneficial.
In this case, only the final classes are considered in ElementChild:

    enum ElementChild<'r> { ChildDiv(&'r Div), ChildSpan(&'r Span) }

And then downcasting to &'r Block is achieved by:

    match element.final {
        ChildDiv(&'r div) => Some(&'r div.blockBase),
        _ => None,
    }

I would note that this does not make use of traits at all; the analysis is only based on Patrick's list of objectives, which I guess is incomplete, and I was lacking a realistic example so it might not address the full scope of the problem... ... still, for CLOSED hierarchies, the use of traits should not be necessary, although it might be very convenient. -- Matthieu.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
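For reference, the closed-hierarchy idea sketched above compiles in today's Rust. This is a simplified sketch, not the email's exact design: it uses owned children instead of borrowed references and trims the field sets, but it keeps the essential shape of a tag enum playing the role of the downcast:

```rust
// Root of the closed hierarchy: common fields live here, directly
// accessible without virtual dispatch (objective 2).
struct Element {
    hidden: bool,
    kind: Kind, // the tag: one word of discriminant, no vtable
}

// The closed set of "derived classes" (the hierarchy is NOT extensible).
enum Kind {
    Block(Block),
    Inline(Inline),
}

struct Block {
    width: u32,
}

struct Inline {
    font_size: u32,
}

impl Element {
    // Objective 3: downcasting is a match on the tag.
    fn as_block(&self) -> Option<&Block> {
        match &self.kind {
            Kind::Block(b) => Some(b),
            _ => None,
        }
    }
}

fn main() {
    let e = Element {
        hidden: false,
        kind: Kind::Block(Block { width: 42 }),
    };
    assert!(!e.hidden); // direct field access on the "base"
    assert_eq!(e.as_block().map(|b| b.width), Some(42));
}
```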
Re: [rust-dev] Virtual fn is a bad idea
Hi Eric, Coming back to memory: I presented two designs:

- in the first one, you have a tag at each level of the hierarchy, which indeed uses more memory for deep hierarchies but means that a type only knows about its immediate children
- in the second one, you have a tag only at the root of the hierarchy, which should use exactly as much memory as a v-table pointer (the fact there is no v-table does not matter)

Regarding the boilerplate methods, my experience with LLVM is that with virtual dispatch the root describes the interface and each descendant implements it, whereas in this system the root implements the interface for each and every descendant... This can be alleviated by only dispatching to the immediate descendants (and letting them dispatch further), which is more compatible with the memory-heavy design but also means multiple jumps at each call; not nice. However, once the interface is defined, user code should rarely have to go and inspect the hierarchy by itself; this kind of down-casting should be limited, as it is with regular inheritance in other languages. -- Matthieu

On Thu, Mar 13, 2014 at 7:49 PM, Eric Summers eric.summ...@me.com wrote: Thinking about this a bit more, maybe the memory cost could go away with tagged pointers. That is easier to do on a 64-bit platform though. Eric

On Mar 13, 2014, at 1:37 PM, Eric Summers eric.summ...@me.com wrote: Yes, but with tags you pay the cost even if the Option is None. Eric

On Mar 13, 2014, at 1:33 PM, Daniel Micay danielmi...@gmail.com wrote: On 13/03/14 02:25 PM, Eric Summers wrote: Also this approach uses more memory. At least a byte per pointer and maybe more with padding. In most cases like this you would prefer to use a vtable instead of tags to reduce the memory footprint. Eric A vtable uses memory too. Either it uses a fat pointer or adds at least one pointer to the object.
___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] RFC: Opt-in builtin traits
I must admit I really like the *regularity* this brings to Rust. There is nothing more difficult to reason about than an irregular (even if reasonable) interface, simply because one must keep all the rules in mind at any time (oh, and sorry, there is a special condition described at page 364 that applies to this precise use case even though the spec sounds like it's a universal rule). Certainly, the annotation could be a burden, but #[deriving(Data)] is extremely terse and brings in almost anything a user could need for their type in one shot. Finally, I believe the public API stability this brings is very necessary. Too often incidental properties are relied upon and broken during updates without the author realizing it; when it's explicit, at least the library author makes a conscious choice. Maybe one way of preventing completely un-annotated pieces of data would be a lint that just checks that at least one property (Send, Freeze, ...) or a special annotation denoting their absence has been selected for each public-facing type. By making #[deriving(...)] mandatory, it becomes easier for the lint pass to flag un-marked types without even having to reason about whether or not the type would qualify. -- Matthieu

On Fri, Feb 28, 2014 at 4:51 PM, Niko Matsakis n...@alum.mit.edu wrote: From http://smallcultfollowing.com/babysteps/blog/2014/02/28/rust-rfc-opt-in-builtin-traits/ :

## Rust RFC: opt-in builtin traits

In today's Rust, there are a number of builtin traits (sometimes called kinds): `Send`, `Freeze`, `Share`, and `Pod` (in the future, perhaps `Sized`). These are expressed as traits, but they are quite unlike other traits in certain ways. One way is that they do not have any methods; instead, implementing a trait like `Freeze` indicates that the type has certain properties (defined below). The biggest difference, though, is that these traits are not implemented manually by users.
Instead, the compiler decides automatically whether or not a type implements them based on the contents of the type. In this proposal, I argue to change this system and instead have users manually implement the builtin traits for new types that they define. Naturally there would be `#[deriving]` options as well for convenience. The compiler's rules (e.g., that a sendable value cannot reach a non-sendable value) would still be enforced, but at the point where a builtin trait is explicitly implemented, rather than being automatically deduced. There are a couple of reasons to make this change:

1. **Consistency.** All other traits are opt-in, including very common traits like `Eq` and `Clone`. It is somewhat surprising that the builtin traits act differently.

2. **API Stability.** The builtin traits that are implemented by a type are really part of its public API, but unlike other similar things they are not declared. This means that seemingly innocent changes to the definition of a type can easily break downstream users. For example, imagine a type that changes from POD to non-POD -- suddenly, all references to instances of that type go from copies to moves. Similarly, a type that goes from sendable to non-sendable can no longer be used as a message. By opting in to being POD (or sendable, etc.), library authors make explicit what properties they expect to maintain, and which they do not.

3. **Pedagogy.** Many users find the distinction between pod types (which copy) and linear types (which move) to be surprising. Making pod-ness opt-in would help to ease this confusion.

4. **Safety and correctness.** In the presence of unsafe code, compiler inference is unsound, and it is unfortunate that users must remember to opt out from inapplicable kinds. There are also concerns about future compatibility. Even in safe code, it can also be useful to impose additional usage constraints beyond those strictly required for type soundness.
I will first cover the existing builtin traits and define what they are used for. I will then explain each of the above reasons in more detail. Finally, I'll give some syntax examples.

The builtin traits

We currently define the following builtin traits:

- `Send` -- a type that deeply owns all its contents. (Examples: `int`, `~int`, not `&int`)
- `Freeze` -- a type which is deeply immutable when accessed via an `&T` reference. (Examples: `int`, `~int`, `&int`, `&mut int`, not `Cell<int>` or `Atomic<int>`)
- `Pod` -- plain old data which can be safely copied via memcpy. (Examples: `int`, `&int`, not `~int` or `&mut int`)

We are in the process of adding an additional trait:

- `Share` -- a type which is threadsafe when accessed via an `&T` reference. (Examples: `int`, `~int`, `&int`, `&mut int`, `Atomic<int>`, not `Cell<int>`)

Proposed syntax

Under this proposal, for a struct or enum to be considered send, freeze, pod, etc., those traits must be explicitly
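This proposal is roughly what shipped: in today's Rust, `Send` and `Sync` (the descendant of `Share`) are auto traits that a type loses when it contains, for example, a raw pointer; opting back in is an explicit `unsafe impl`, and `Pod` became the opt-in `Copy`. A minimal sketch with an invented `Handle` type:

```rust
// A raw-pointer wrapper: raw pointers opt a type out of the auto
// traits, so Handle is neither Send nor Sync by default.
struct Handle {
    ptr: *mut u8,
}

// Opting back in is explicit and unsafe: the author asserts that
// moving a Handle to another thread is actually sound.
unsafe impl Send for Handle {}

// Compile-time check: this call only builds if T implements Send.
fn assert_send<T: Send>() {}

fn main() {
    assert_send::<Handle>(); // OK only because of the unsafe impl above
    // assert_send::<*mut u8>(); // would NOT compile: raw pointers aren't Send
    let h = Handle { ptr: std::ptr::null_mut() };
    assert!(h.ptr.is_null());
}
```

The `unsafe` keyword is exactly the "conscious choice" argued for earlier in the thread: the compiler still enforces the rules, but the author declares the property.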
Re: [rust-dev] Fwd: user input
On Sun, Feb 9, 2014 at 12:15 PM, Renato Lenzi rex...@gmail.com wrote: Always talking about read/write, i noticed another interesting thing:

    use std::io::buffered::BufferedReader;
    use std::io::stdin;

    fn main() {
        print!("Insert your name: ");
        let mut stdin = BufferedReader::new(stdin());
        let s1 = stdin.read_line().unwrap_or(~"nothing");
        print!("Welcome, {}", s1);
    }

when i run this simple code the output "Insert your name" doesn't appear on the screen... only after typing and entering a string does the whole output jump out... am i missing some flush (a la Fantom) or similar? I am using Rust 0.9 on W7.

Ah, that's interesting. In most languages, whenever you ask for user input (read on stdin) it automatically triggers a flush on stdout and stderr to avoid this uncomfortable situation. I suppose it would not be too difficult to incorporate this in Rust. -- Matthieu.

On Sun, Feb 9, 2014 at 2:40 AM, Patrick Walton pcwal...@mozilla.com wrote: On 2/8/14 3:35 PM, Alex Crichton wrote: We do indeed want to make common tasks like this fairly lightweight, but we also strive to require that the program handle possible error cases. Currently, the code you have shows well what one would expect when reading a line of input. On today's master, you might be able to shorten it slightly to:

    use std::io::{stdin, BufferedReader};

    fn main() {
        let mut stdin = BufferedReader::new(stdin());
        for line in stdin.lines() {
            println!("{}", line);
        }
    }

I'm curious though what you think are the heavy/verbose aspects of this? I like common patterns having shortcuts here and there! Is there any way we can get rid of the need to create a buffered reader? It feels too enterprisey. Patrick

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
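Modern Rust kept the behaviour observed above: `print!` does not flush stdout, and the fix is an explicit `flush()` before blocking on stdin. A sketch of the same program today (the `prompt` helper is invented here so the flushing step is testable in isolation):

```rust
use std::io::{self, BufRead, Write};

// Write the prompt and flush immediately, so it appears on screen
// before we block on stdin; print! alone does not flush.
fn prompt(out: &mut impl Write, msg: &str) -> io::Result<()> {
    out.write_all(msg.as_bytes())?;
    out.flush()
}

fn main() -> io::Result<()> {
    let mut stdout = io::stdout();
    prompt(&mut stdout, "Insert your name: ")?;

    let mut name = String::new();
    io::stdin().lock().read_line(&mut name)?;
    println!("Welcome, {}", name.trim());
    Ok(())
}
```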
Re: [rust-dev] Using Default Type Parameters
On Mon, Feb 3, 2014 at 8:41 AM, Gábor Lehel glaebho...@gmail.com wrote: On Mon, Feb 3, 2014 at 7:55 AM, Corey Richardson co...@octayn.net wrote: Default typarams are awesome, but they're gated, and there's some concern that they'll interact unpleasantly with extensions to the type system (most specifically, I've seen concern raised around HKT, where there is conflicting tension about whether to put the defaults at the start or end of the typaram list). Just for reference, this was discussed here: https://github.com/mozilla/rust/pull/11217 (The tension is essentially that with default type args you want to put the least important types at the end, so they can be defaulted, while with HKT you want to put them at the front, so they don't get in the way of abstracting over the important ones.)

Thinking out loud: could parameters be keyed, like named function arguments? If they were, then their position would matter little. -- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
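Named type parameters never happened, but default type parameters did ship, positional with the defaulted parameters trailing, for example `HashMap<K, V, S = RandomState>` in the standard library. A small sketch with an invented `Tagged` type:

```rust
// The last parameter carries a default; callers that don't care
// about it never have to mention it.
#[derive(Debug, PartialEq)]
struct Tagged<T, Tag = ()> {
    value: T,
    tag: Tag,
}

fn main() {
    // Tagged<i32> means Tagged<i32, ()> thanks to the default:
    let plain: Tagged<i32> = Tagged { value: 1, tag: () };
    // ...while other callers override it explicitly:
    let named: Tagged<i32, &str> = Tagged { value: 2, tag: "id" };
    assert_eq!(plain.value, 1);
    assert_eq!(named.tag, "id");
}
```

This is exactly the "least important types at the end" convention discussed in the thread.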
Re: [rust-dev] Proposal: Change Parametric Polymorphism Declaration Syntax
On Sun, Feb 2, 2014 at 6:08 PM, Benjamin Striegel ben.strie...@gmail.com wrote: After sleeping on it I'm not convinced that this would be a net improvement over our current situation. With a few caveats I'm really rather happy with the syntax as it is.

On Sun, Feb 2, 2014 at 8:55 AM, Jason Fager jfa...@gmail.com wrote: I'm not a huge fan of this proposal. It makes declarations longer, and it removes the visual consistency of Foo<T,U> everywhere, which I think introduces its own pedagogical issue. The recent addition of default type parameters, though, makes me think there's a reasonable change that increases consistency and shortens declarations in a few common cases. From what I understand, the reason we can't just have `impl Trait<T> for Foo<T,U>` is because it's ambiguous whether T and U are intended to be concrete or generic type names; i.e., `impl<T> Trait<T> for Foo<T,U>` tells the compiler that we expect U to be a concrete type name. Our new default type parameter declarations look like:

    struct Foo<T, U = Bar>

So what if, to actually make generic types concrete, we always used the '='?

    struct Foo<T, U = Bar>
    impl Trait<T> for Foo<T, U = Derp>

This saves a character over `impl<T> Trait<T> for Foo<T, Derp>`, solves the greppability problem, and makes intuitive sense given how defaults are declared. It also has a nice parallel with how ':' is used - ':' adds restrictions, '=' fully locks in place. So what is today something like

    impl<T: Ord> Trait<T> for Foo<T, Derp>

would become

    impl Trait<T: Ord> for Foo<T, U = Derp>

The rule would be that the first use of a type variable T would introduce its bounds, so for instance:

    impl Trait<T: Ord> for Foo<Z: Clone, U = Derp>

would be fine, and

    impl Trait<T> for Foo<T: Clone, U = Derp>

would be an error.
More nice fallout:

    struct Foo<A, B>
    impl Foo<A, B = Bar> {
        fn one(a: A) -> B
        fn two(a: A) -> B
        fn three(a: A) -> B
    }

means that if I ever want to go back and change the name of Bar, I only have to do it in one place, or if Bar is actually some complicated type, I only had to write it once, like a little local typedef. I'm sure this has some glaring obvious flaw I'm not thinking of. It would be nice to have less syntax for these declarations, but honestly I'm ok with how it is now.

On Sat, Feb 1, 2014 at 5:39 PM, Corey Richardson co...@octayn.net wrote: Hey all, bjz and I have worked out a nice proposal[0] for a slight syntax change, reproduced here. It is a breaking change to the syntax, but it is one that I think brings many benefits.

Summary
===

Change the following syntax:

```
struct Foo<T, U> { ... }
impl<T, U> Trait<T> for Foo<T, U> { ... }
fn foo<T, U>(...) { ... }
```

to:

```
forall<T, U> struct Foo { ... }
forall<T, U> impl Trait<T> for Foo<T, U> { ... }
forall<T, U> fn foo(...) { ... }
```

From a readability point of view, I am afraid this might be awkward though. Coming from C++, I have welcomed the switch from `typedef` to `using` (aliases) because of alignment issues; consider:

    typedef std::map<int, std::string> MapType;
    typedef std::vector<std::pair<int, std::string>> VectorType;

vs

    using MapType = std::map<int, std::string>;
    using VectorType = std::vector<std::pair<int, std::string>>;

In the latter, the entities being declared are at a constant offset from the left-hand margin, and close to it too; whereas in the former, the eyes are strained as they keep looking for what is declared. And now, let's look at your proposal:

    fn foo(a: int, b: int) -> int { }
    fn foo<T, U>(a: T, b: U) -> T { }
    forall<T, U> fn foo(a: T, b: U) -> T { }

See how forall causes a bump that forces you to start looking for where that name is? It was so smooth until then! So, it might be a net win in terms of grep-ability, but to be honest it seems LESS readable to me.
-- Matthieu

The Problem
===

The immediate, and most pragmatic, problem is that in today's Rust one cannot easily search for implementations of a trait. Why? `grep 'impl Clone'` is itself not sufficient, since many types have parametric polymorphism. Now I need to come up with some sort of regex that can handle this. An easy first attempt is `grep 'impl(<.*?>)? Clone'` but that is quite inconvenient to type and remember. (Here I ignore the issue of tooling, as I do not find the argument of "But a tool can do it!" valid in language design.) A deeper, more pedagogical problem, is the mismatch between how `struct Foo<...> { ... }` is read and how it is actually treated. The straightforward, left-to-right reading says "There is a struct Foo which, given the types ... has the members ...". This might lead one to believe that `Foo` is a single type, but it is not. `Foo<int>` (that is, type `Foo` instantiated with type `int`) is not the same type as `Foo<uint>` (that is, type `Foo` instantiated with type `uint`). Of course, with a small amount of experience or a very simple explanation, that becomes obvious. Something less obvious is the treatment of functions. What
Re: [rust-dev] What of semi-automated segmented stacks ?
On Thu, Jan 30, 2014 at 6:33 PM, Daniel Micay danielmi...@gmail.com wrote: On Thu, Jan 30, 2014 at 12:27 PM, Matthieu Monrocq matthieu.monr...@gmail.com wrote: Hello, Segmented stacks were ditched because of performance issues that were never fully resolved, especially when every opaque call (C, ...) required allocating a large stack up-front. Still, there are platforms (FreeBSD) with small stacks where the idea of segmented tasks could ease development... so what if we let the developer chip in?

Rust can and does choose the stack size itself. This can be exposed as an API feature too. I think it would be a good idea, to avoid platform defaults causing unexpected crashes. I know Clang regularly suffers on a number of tests because of this.

Still, this seems complementary. Whilst a large stack to begin with is an obvious option, there are always unfavorable cases. Today, to avoid stack issues, I have to move from a natural recursive style to a self-managed stack of actions and an endless loop, so my stack is actually on the heap. It's feasible, certainly, but it's a technical limitation getting in the way of my intent. And unfortunately, whilst I could allocate a 1GB stack to start with (the 64-bit world sure is fortunate), I have no way to foresee when I will need such a stack and when I will not. Dynamic adaptation makes things much easier. The idea of semi-automated segmented stacks would be:

- to expose to the user how many bytes worth of stack are remaining
- to let the user trigger a stack switch

This system should keep the penalty close to null for those who do not care, and be relatively orthogonal to the rest of the implementation:

If Rust isn't going to be using the segmented stack prelude (1-5% performance hit), it needs guard pages. This means the smallest stack segment size you can have with a free solution is 8K. It will consume less virtual memory than a fixed-size stack, but not more physical memory.
- the "how many bytes remaining" query carries little to no penalty: just a pointer subtraction between the current stack pointer and the end-of-stack pointer (which can be set once and for all at thread start-up)
- the stack switch is voluntary, and can include a prelude on the new stack that automatically comes back to its parent, so most code should not care; no penalty in regular execution (without it)
- I foresee some potential implementation difficulty for the unwinder; did it ever work on segmented stacks? Was it difficult/slow? Does performance of unwind matter that much?

Unwind performance doesn't matter, and is already really slow by design.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
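The recursion-to-heap workaround mentioned in this thread (a self-managed stack of actions plus a loop, so the "stack" lives on the heap) can be sketched concretely; the `Tree` type here is an invented example:

```rust
enum Tree {
    Leaf(u64),
    Node(Box<Tree>, Box<Tree>),
}

// Sum all leaves iteratively: the explicit Vec replaces the call
// stack, so arbitrarily deep trees cannot overflow the real stack.
fn sum(tree: &Tree) -> u64 {
    let mut stack = vec![tree]; // heap-allocated "stack of actions"
    let mut total = 0;
    while let Some(t) = stack.pop() {
        match t {
            Tree::Leaf(v) => total += *v,
            Tree::Node(l, r) => {
                stack.push(l.as_ref());
                stack.push(r.as_ref());
            }
        }
    }
    total
}

fn main() {
    let t = Tree::Node(
        Box::new(Tree::Leaf(1)),
        Box::new(Tree::Node(Box::new(Tree::Leaf(2)), Box::new(Tree::Leaf(3)))),
    );
    assert_eq!(sum(&t), 6);
}
```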
Re: [rust-dev] Today's Rust contribution ideas
On Mon, Jan 27, 2014 at 11:41 PM, Sebastian Sylvan sebastian.syl...@gmail.com wrote: On Mon, Jan 27, 2014 at 9:33 AM, Matthieu Monrocq matthieu.monr...@gmail.com wrote: On Mon, Jan 27, 2014 at 3:39 AM, Brian Anderson bander...@mozilla.com wrote: Consensus is that the `do` keyword is no longer pulling its weight. Remove all uses of it, then remove support from the compiler. This is a 1.0 issue.

# Experiment with faster hash maps (#11783)

Rust's HashMap uses a cryptographically secure hash, and at least partly as a result of that it is quite slow. HashMap continues to show up very, very high in performance profiles of a variety of code. It's not clear what the solution to this is, but it is clear that - at least sometimes - we need a much faster hash map solution. Figure out how to create faster hash maps in Rust, potentially sacrificing some amount of DoS-resistance by using weaker hash functions. This is fairly open-ended and researchy, but a solution to this could have a big impact on the performance of rustc and other projects.

You might be interested in a series of articles by Joaquín M López Muñoz, who maintains the Boost.MultiIndex library. He did a relatively comprehensive overview of the hash-map implementations of Dinkumware (MSVC), libstdc++ and libc++ on top of Boost.MultiIndex, and a lot of benchmarks showing the performance of insertion/removal/search in a variety of setups. One of the last articles: http://bannalia.blogspot.fr/2014/01/a-better-hash-table-clang.html

Let me also plug this blog post from a while back: http://sebastiansylvan.com/2013/05/08/robin-hood-hashing-should-be-your-default-hash-table-implementation/. There's also a followup on improving deletions*, which makes the final form the fastest hash map I know of.
It's also compact (95% load factor, 32 bits of overhead per element, but you can reduce that to 2 bits per element if you sacrifice some perf), and doesn't allocate (other than doubling the size of the table when you hit the load factor). For a benchmark with lots of std::strings it was 23%, 66% and 25% faster for insertions, deletions and lookups (compared to MSVC unordered_map); it also uses 30% less memory in that case. Seb

* the basic form has an issue where repeated deletes gradually increase the probe count. In pathological cases this can reduce performance by a lot. The fix is to incrementally fix up the table on each delete (you could also do it in batch every now and then). It's still faster in all cases, and the probe length as well as the probe-length variance remain low even in the most pathological circumstances.

Thanks for the link. I should have mentioned that the C++ Standard version is constrained by a memory-stability requirement which may or may not apply to Rust (thanks to borrow checks, it should be possible to know statically whether an element is borrowed or not). This memory-stability requirement, as well as some other requirements such as relative stability of items within the same equivalence class during insert/erase, severely constrain the design; and indeed if the requirements can be lifted, the designs proposed on bannalia will be suboptimal. -- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
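The Robin Hood scheme from the linked post can be sketched minimally; everything here is invented for illustration (fixed capacity, toy hash, no resize or deletion). The core rule: on a collision, whichever entry is further from its ideal slot keeps the slot ("rob the rich"), which keeps probe lengths short and uniform.

```rust
const CAP: usize = 16; // hypothetical fixed capacity, power of two

struct Table {
    slots: Vec<Option<(u64, u64)>>, // (key, value) pairs
}

// Toy hash, just for the sketch.
fn hash(k: u64) -> usize {
    (k.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize % CAP
}

// How far `key` sitting in `slot` is from its ideal slot.
fn probe_dist(key: u64, slot: usize) -> usize {
    (slot + CAP - hash(key)) % CAP
}

impl Table {
    fn new() -> Self {
        Table { slots: vec![None; CAP] }
    }

    fn insert(&mut self, mut key: u64, mut val: u64) {
        let mut idx = hash(key);
        let mut dist = 0;
        loop {
            match self.slots[idx] {
                None => {
                    self.slots[idx] = Some((key, val));
                    return;
                }
                Some((k, _)) if k == key => {
                    self.slots[idx] = Some((key, val)); // overwrite
                    return;
                }
                Some((k, v)) => {
                    let existing = probe_dist(k, idx);
                    if existing < dist {
                        // The resident is "richer": take its slot and
                        // keep probing with the displaced entry.
                        self.slots[idx] = Some((key, val));
                        key = k;
                        val = v;
                        dist = existing;
                    }
                }
            }
            idx = (idx + 1) % CAP;
            dist += 1;
        }
    }

    fn get(&self, key: u64) -> Option<u64> {
        let mut idx = hash(key);
        let mut dist = 0;
        loop {
            match self.slots[idx] {
                None => return None,
                Some((k, v)) if k == key => return Some(v),
                Some((k, _)) => {
                    // An entry closer to home than our probe means
                    // the key cannot be further along: early exit.
                    if probe_dist(k, idx) < dist {
                        return None;
                    }
                }
            }
            idx = (idx + 1) % CAP;
            dist += 1;
        }
    }
}

fn main() {
    let mut t = Table::new();
    for k in 0..10 {
        t.insert(k, k * k);
    }
    assert_eq!(t.get(7), Some(49));
    assert_eq!(t.get(42), None);
}
```

The early-exit in `get` is the same probe-distance invariant that makes the deletion fix-up in the follow-up post work.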
Re: [rust-dev] Today's Rust contribution ideas
On Mon, Jan 27, 2014 at 3:39 AM, Brian Anderson bander...@mozilla.com wrote: People interested in Rust are often looking for ways to have a greater impact on its development, and while the issue tracker lists lots of stuff that one *could* work on, it's not always clear what one *should* work on. There is consistently an overwhelming number of very important tasks to do which nobody is tackling, so this is an effort to update folks on what high-impact, yet accessible, contribution opportunities are available. These are of varying difficulty, but progress on any of them is worthy of *extreme kudos*.

# Break up libextra (#8784)

Getting our library ecosystem in shape is critical for Rust 1.0. We want Rust to be a "batteries included" language, distributed with many crates for common uses, but the way our libraries are organized - everything divided between std and extra - has long been very unsatisfactory. libextra needs to be split up into a number of subject-specific crates, setting the precedent for future expansion of the standard libraries, and with the impending merging of #11787 the floodgates can be opened. This is simply a matter of identifying which modules in extra logically belong in their own libraries, extracting them to a directory in src/, and adding a minimal amount of boilerplate to the makefiles. Multiple people can work on this, coordinating on the issue tracker.

# Improve the official cheatsheet

We have the beginnings of a 'cheatsheet', documenting various common patterns in Rust code (http://static.rust-lang.org/doc/master/complement-cheatsheet.html), but there is so much more that could be here. This style of documentation is hugely useful for newcomers. There are a few ways to approach this: simply review the current document, editing and augmenting the existing examples; think of the questions you had about Rust when you started and add them; solicit questions (and answers!)
from the broader community and add them; finally, organize a doc sprint with several people to make some quick improvements over a few hours.

# Implement the `Share` kind (#11781)

Future concurrency code is going to need to reason about types that can be shared across threads. The canonical example is fork/join concurrency using a shared closure, where the closure environment is bounded by `Share`. We have the `Freeze` kind which covers a limited version of this use case, but it's not sufficient, and may end up completely supplanted by `Share`. This is quite important to have sorted out for 1.0 but the design is not done yet. Work with other developers to figure out the design; then once that's done the implementation - while involving a fair bit of compiler hacking and library modifications - should be relatively easy.

# Remove `do` (#10815)

Consensus is that the `do` keyword is no longer pulling its weight. Remove all uses of it, then remove support from the compiler. This is a 1.0 issue.

# Experiment with faster hash maps (#11783)

Rust's HashMap uses a cryptographically secure hash, and at least partly as a result of that it is quite slow. HashMap continues to show up very, very high in performance profiles of a variety of code. It's not clear what the solution to this is, but it is clear that - at least sometimes - we need a much faster hash map solution. Figure out how to create faster hash maps in Rust, potentially sacrificing some amount of DoS-resistance by using weaker hash functions. This is fairly open-ended and researchy, but a solution to this could have a big impact on the performance of rustc and other projects.

You might be interested in a series of articles by Joaquín M López Muñoz, who maintains the Boost.MultiIndex library.
He did a relatively comprehensive overview of the hash-map implementations of Dinkumware (MSVC), libstdc++ and libc++ on top of Boost.MultiIndex, and a lot of benchmarks showing the performance for insertion/removal/search in a variety of setups. One of the last articles: http://bannalia.blogspot.fr/2014/01/a-better-hash-table-clang.html # Replace 'extern mod' with 'extern crate' (#9880) Using 'extern mod' as the syntax for linking to another crate has long been a bit cringeworthy. The consensus here is to simply rename it to `extern crate`. This is a fairly easy change that involves adding `crate` as a keyword, modifying the parser to parse the new syntax, then changing all uses, either after a snapshot or using conditional compilation. This is a 1.0 issue. # Introduce a design FAQ to the official docs (#4047) Many questions about a language's design are asked repeatedly, so languages tend to have documents simply explaining the rationale for various decisions. Particularly as we approach 1.0 we'll want a place to point newcomers to when these questions are asked. The issue on the bug tracker already contains quite a lot of questions, and some answers as well. Add a new Markdown file to the doc/
Re: [rust-dev] Appeal for CORRECT, capable, future-proof math, pre-1.0
On Tue, Jan 14, 2014 at 5:56 AM, comex com...@gmail.com wrote: On Mon, Jan 13, 2014 at 4:06 PM, Tobias Müller trop...@bluewin.ch wrote: int<l1,u1> + int<l2,u2> = int<l1+l2,u1+u2> ... If the result does not fit into an int the compiler throws an error. To resolve an error, you can: - annotate the operands with appropriate bounds - use a bigger type for the operation and check the result. I remember wondering whether this type of solution would be feasible or too much of a hassle in practice. As I see it, many values which might be arithmetic operands are sizes or counts, and really ought to be size_t sized, and any mutable variable which is operated on in a loop can't be bounded without a lot more complexity, so it might lean toward the latter. It's indeed a risk that such an annotation might be too annoying (especially since addition is actually quite easy, the bounds grow faster on multiplication)... but on the other hand, you do need dynamic checks anyway to verify that a value of type u32<0, 4_294_967_295> won't overflow if you multiply it by 3. So as I see it, you can do either of: let result = to<u32<0, 1_431_655_765>>(size) * 3; OR let result = to<u32>(to<u64>(size) * 3);. Of course, compared to let result = size * 3; it seems the annotation tax is high; however, the latter may overflow (and wrap, certainly, but that is still a bogus answer in most languages). So, maybe one could just use a couple of primitives: - wrapping integers (for hashes) - saturating integers (useful for colors) - fail-on-overflow integers - compile-time range-checked integers u32w, u32s, u32o and u32c? Note: as far as I know Rust *plans* on having non-type template parameters but does not have them yet, so the compile-time range-checked integers are out of the question for now.
Note 2: having all those in the core language would be unnecessary if the syntax 3u32c (number + type) were sugar coating for u32c::new(3), like C++ suffix literals; with new using some default integer type (I vote for the fail-on-overflow, it catches the bugs) and the compiler verifying that the raw number can be expressed in that default integer type perfectly. Then libraries could add the other modes. -- Matthieu ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Unbounded channels: Good idea/bad idea?
On Tue, Dec 31, 2013 at 6:16 AM, Patrick Walton pcwal...@mozilla.com wrote: Can someone address Simon Marlow's point here? https://plus.google.com/10955911385859313/posts/FAmNTExSLtz unbuffered channels are synchronous in the sense that both reader and writer must be ready at the same time. It's easy to deadlock if you're not careful. Buffered channels allow asynchronous writes, but only up to the buffer size, so that doesn't actually make things easier. Fully asynchronous channels, like you get in Erlang and Haskell, don't have this problem, but they are unbounded so you have to be careful about filling them up (Erlang uses a clever scheduling trick to mitigate that problem, though). I am concerned that we are only hearing one side of the argument here, and Haskell folks seem to have come down fairly strongly in favor of unbounded channels. It also seems to me that the argument is partial, and only considers blocking sends. It would be interesting to know whether they envisaged non-blocking sends and, if so, why those seem to have been discarded. To reiterate: at this point I believe we should have both as first-class citizens, like `java.util.concurrent`. Choosing one or the other seems to be neglecting too many use cases. Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Unbounded channels: Good idea/bad idea?
On Tue, Dec 31, 2013 at 6:46 PM, Patrick Walton pcwal...@mozilla.com wrote: On 12/30/13 8:46 PM, Christian Ohler wrote: To address the last sentence – bounded channels with default size 0 _do_ minimize the fallout of this design: The program would reliably deadlock every time it is tested with a nonzero number of images, since A will try to write to Images while B is blocked receiving from Done, not listening on Images yet. I don't see this deadlock as a nasty hazard – the code wouldn't work at all, and the programmer would immediately notice. If the programmer uses a non-zero buffer size for the channel, it's a magic number that they came up with, so they should know to test inputs around that magnitude. I suspect a lot of programmers in systems with bounded channels just come up with some round number (like 10) and forget about it. Similar to the argument to listen(2)... Patrick Anecdotal evidence: I work with distributed systems, and most of our limits are in fact completely winged and rarely if ever touched... except after an issue where we realize we could do better. This is the kind of thing where you don't have enough experience with the system as you first write it, so you put some reasonable limits, and then just forget that you needed to come back to it and check whether it really worked... but then, on the other hand, if it passes testing, doesn't that mean it works well enough? ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Unbounded channels: Good idea/bad idea?
On Thu, Dec 19, 2013 at 7:23 PM, Kevin Ballard ke...@sb.org wrote: Here's an example of where I use an infinite queue. I have an IRC bot, written in Go. The incoming network traffic of this bot is handled in one goroutine, which parses each line into its components, and enqueues the result on a channel. The channel is very deliberately made infinite (via a separate goroutine that stores the infinite buffer in a local slice). The reason it's infinite is because the bot needs to be resilient against the case where either the consumer unexpectedly blocks, or the network traffic spikes. The general assumption is that, under normal conditions, the consumer will always be able to keep up with the producer (as the producer is based on network traffic and not e.g. a tight CPU loop generating messages as fast as possible). Backpressure makes no sense here, as you cannot put backpressure on the network short of letting the socket buffer fill up, and letting the socket buffer fill up will cause the IRC network to disconnect you. So the overriding goal here is to prevent network disconnects, while assuming that the consumer will be able to catch up if it ever gets behind. This particular use case very explicitly wants a dynamically-sized infinite channel. I suppose an absurdly large channel would be acceptable, because if the consumer ever gets e.g. 100,000 lines behind then it's in trouble already, but I'd rather not have the memory overhead of a statically-allocated gigantic channel buffer. I feel the need to point out that the producer could locally queue the messages before sending over the channel if it were bounded. -Kevin On Dec 19, 2013, at 10:04 AM, Jason Fager jfa...@gmail.com wrote: Okay, parallelism, of course, and I'm sure others. Bad use of the word 'only'. The point is that if your consumers aren't keeping up with your producers, you're screwed anyways, and growing the queue indefinitely isn't a way to get around that.
Growing queues should only serve specific purposes and make it easy to apply back pressure when the assumptions behind those purposes go awry. On Thursday, December 19, 2013, Patrick Walton wrote: On 12/19/13 6:31 AM, Jason Fager wrote: I work on a system that handles 10s of billions of events per day, and we do a lot of queueing. Big +1 on having bounded queues. Unbounded in-memory queues aren't, they just have a bound you have no direct control over and that blows up the world when it's hit. The only reason to have a queue size greater than 1 is to handle spikes in the producer, short outages in the consumer, or a bit of out-of-phaseness between producers and consumers. Well, also parallelism. Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Unbounded channels: Good idea/bad idea?
Also working in a distributed system, I cannot emphasize enough how essential back pressure is. With back pressure, you offer the producer a chance to react: it can decide to drop the message, send it over another channel, keep it around for later, etc. Furthermore, it is relatively easy to build an unbounded channel over a bounded one: just have the producer queue things. Depending on whether sequencing from multiple producers is important or not, this queue can be either shared or producer-local, with relative ease. Regarding the various behaviors that may be implemented, most behaviors can actually be implemented outside of the channel implementation: + dropping the message can be implemented on the producer side: if it cannot queue, it just goes on + crashing is similar: if it cannot queue, crash + blocking is generally a good idea, but if a timed-wait primitive exists then I imagine an infinite (or close enough) duration would be sufficient So it might be more interesting to reason in terms of primitives, and those might be more methods than types (hopefully): (1) immediate queueing (returning an error), a special case of time-bound queueing which may be slightly more efficient (2) time-bound queueing (returning an error after the timeout) (3) immediate + exchange with head (in which case the producer also locally acts as a consumer; this might be tricky to pull off efficiently on single-consumer queues) (4) immediate + atomic subscription to a "place has been freed" event in case of a full queue (Note: (4) somehow implies a dual channel; if you have an MPSC channel, a back-channel SPMC is created to dispatch the space-available notifications... which can be a simple counter, obviously; this back-channel must be select-able so that producers that usually block on other stuff can use a space-available event to unblock) I cannot see another interesting primitive at the moment.
-- Matthieu On Thu, Dec 19, 2013 at 7:25 PM, Matthieu Monrocq matthieu.monr...@gmail.com wrote: On Thu, Dec 19, 2013 at 7:23 PM, Kevin Ballard ke...@sb.org wrote: [quoted text snipped] ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Idea for versioned language specifications with automatic conversions
Hi Manuel, I must say that from a conceptual point of view I like the approach; keeping one's libraries up to date is the only way to go. However, I am afraid that you are glossing over certain details here: - you assume that the source code is available; this is a problem if I am using a 3rd-party library for which I only get the binary and THEY have not migrated yet. How can I use library X (released in 0.9 and 0.10) and library Y (released in 0.11 and 0.12) in the same project? Smaller milestones make it a smoother process to upgrade at the individual level, but larger milestones help multiple people/corporations coordinate. - you assume that I can actually upgrade; I work at a large software company, with over 5,000 employees now, and this applies to a *large* source code base. A migration entails an extensive test phase of the target software/version followed by a careful migration of a few pilot products, simply because migrating costs a lot and migrating to a flawed version just to roll back the migration is a cost sink. As a result, though, this creates inertia. Internally we are *always* in the middle of several migrations (compiler, 3rd-party libraries, in-house middleware, ...) and the larger ones take years. Because of this, once again we need some coordination: we just cannot afford to migrate every 6 months (not enough testing time). This means that while it would not prevent Rust from migrating every 6 months, we would still be expecting fixes to previous releases for a year or two. The former means that 6 months might be a little *too* fast a pace for industrial projects; the latter means that on top of defining a release schedule the Rust team will also have to provide a clear plan for support of older versions (how long, what kinds of bugs, ...) and the number of branches impacted may grow quickly: 6-month releases + 2 years of support means at least 4 branches, maybe 5 if we count the one being developed (and 2 years is nothing fancy, as support goes).
-- Matthieu On Sun, Nov 24, 2013 at 11:49 AM, Manuel ma.adam...@gmail.com wrote: I had the following idea to approach language evolution: Problem: Languages try to be backward compatible by stabilizing, and only slowly deprecating old features. This results in a language which does not evolve. Some different takes on this: C++: adds new features but does not fix problems, and often does not remove obsolete features, resulting in, well, C++. Python: minor versions which add new features, big version jump from 2 to 3 to make backward-incompatible changes. The resulting incompatibility was a big problem; almost 5 years after the release of 3.0 (December 3rd, 2008) people are still using 2.x. Rust seems to follow a similar approach; devs are already deferring features to 2.0 to stabilize. Other languages simply do not evolve at all and are replaced. My idea to improve this situation would be to add a version tag in every main crate, something like #ver 0.10. For each version jump the compiler would fix the code automatically, and convert it to the current language specification. When the library/code is multiple versions behind, the conversions could be applied successively. This can be done in a lot of cases; see Python's 2to3 script, and even Google did this for Go with the tool gofix during development. With this change, not-yet-updated libraries would still be usable in Rust. To simplify updating libraries the compiler could, on demand, print out a report of problematic parts and propose fixes. Some things cannot be fixed with an automatic approach; for these cases a classic deprecation mechanism or something else could still be used. Advantages: A kind of backward compatibility with old code bases. Rust can evolve and stay streamlined at the same time. The compiler does not have to deal with a deprecation mechanism, because you can remove and change things instantly.
Once this was in place, I think it would be best to release incompatible updates often, but with only a few changes each - every six months, for example. What do you think about this? Manuel ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Faster communication between tasks
On Sat, Nov 9, 2013 at 8:13 PM, Simon Ruggier simo...@gmail.com wrote: Hi all, I've tentatively come up with a design that would allow the sender to reallocate the buffer as necessary, with very little added performance cost. The sending side would bear the cost of reallocation, and there would be an extra test that receivers would have to make every time they process an item (no extra atomic operations needed). However, it may be a few weeks or more before I have a working implementation to demonstrate, so I figured it might be worthwhile to mention now that I'll be working on this. Also, I think it would be interesting to investigate doing something like the Linux kernel's deadlock detection[1], but generalized to apply to bounded queues, and implemented as a static check. I know little about this, but even so, I can see how it would be an enormous amount of work. On the other hand, I would have thought the same thing about the memory safety rules that Rust enforces. I'm hopeful that this will eventually be possible as well. [1] https://www.kernel.org/doc/Documentation/lockdep-design.txt A static proof seems extremely difficult; it would be a significant addition to the type system, affecting the closure types (did they, or did they not, embed a channel/port at creation?). In addition, I am unsure of how transfer of closures through channels would pan out. On the other hand, dynamic detection (such as is done on @ pointers for mutability) seems possible. -- Matthieu On Wed, Oct 30, 2013 at 12:55 AM, Simon Ruggier simo...@gmail.com wrote: On Tue, Oct 29, 2013 at 3:30 PM, Brian Anderson bander...@mozilla.com wrote: On 10/28/2013 10:02 PM, Simon Ruggier wrote: Greetings fellow Rustians! First of all, thanks for working on such a great language. I really like the clean syntax, increased safety, separation of data from function definitions, and freedom from having to declare duplicate method prototypes in header files.
I've been working on an alternate way to communicate between tasks in Rust, following the same approach as the LMAX Disruptor.[1] I'm hoping to eventually offer a superset of the functionality in the pipes API, and replace them as the default communication mechanism between tasks. Just as with concurrency in general, my main motivation in implementing this is to improve performance. For more information about the disruptor approach, there's a lot of information linked from their home page, in a variety of formats. This is really exciting work. Thanks for pursuing it. I've been interested in exploring something like Disruptor in Rust. The current channel types in Rust are indeed slow, and fixing them is the topic of https://github.com/mozilla/rust/issues/8568. I'll start paying attention to that. The Morrison Afek 2013 paper looks like something I should read. This is my first major contribution of new functionality to an open-source project, so I didn't want to discuss it in advance until I had a working system to demonstrate. I currently have a very basic proof of concept that achieves almost two orders of magnitude better performance than the pipes API. On my hardware[2], I currently see throughput of about 27 million items per second when synchronizing with a double-checked wait condition protocol between sender and receivers, 80+ million items with no blocking (i.e. busy waiting), and anywhere from 240,000 to 600,000 when using pipes. The LMAX Disruptor library gets up to 110 million items per second on the same hardware (using busy waiting and yielding), so there's definitely still room for significant improvement. Those are awesome results! Thanks! When I first brought it up, it was getting about 14 million with the busy waiting. Minimizing the number of atomic operations (even with relaxed memory ordering) makes a big difference in performance. 
The 2/3 drop in performance with the blocking wait strategy comes from merely doing a read-modify-write operation on every send (it currently uses atomic swap, I haven't experimented with others yet). To be fair, the only result I can take credit for is the blocking algorithm. The other ideas are straight from the original disruptor. I've put the code up on GitHub (I'm using rustc from master).[3] Currently, single and multi-stage pipelines of receivers are supported, while many features are missing, like multiple concurrent senders, multiple concurrent receivers, or mutation of the items as they pass through the pipeline. However, given what I have so far, now is probably the right time to start soliciting feedback and advice. I'm looking for review, suggestions/constructive criticism, and guidance about contributing this to the Rust codebase. I'm not deeply familiar with Disruptor, but I believe that it uses bounded queues. My general feeling thus far is that, as the general 'go-to' channel type, people should not be using bounded queues that block the
Re: [rust-dev] Stack management in SpiderMonkey or aborting on stack overflow could be OK.
I really like the idea of a task being a sandbox (if pure/no-unsafe Rust). It seems (relatively) easy for a task to keep count of the number of bytes it has allocated (or the number of blocks); both heap-allocated and stack-allocated blocks could be meshed together there (after all, both consume memory), and this single count would address both (1) and (3) at once. Regarding the intrinsic to extend the stack, it seems nice, and in fact generalizable. It looks to me like a coroutine in the same memory space, compared to a task being a coroutine in a different memory space. Maybe some unification is possible here? -- Matthieu On Wed, Oct 30, 2013 at 3:17 AM, Niko Matsakis n...@alum.mit.edu wrote: I certainly like the idea of exposing a low-stack check to the user so that they can do better recovery. I also like the idea of `call_with_new_stack`. I am not sure if this means that the default recovery should be *abort* vs *task failure* (which is already fairly drastic). But I guess it is a legitimate question: to what extent should we permit safe Rust code to bring a system to its knees? We can't truly execute untrusted code, since it could invoke native things or include unsafe blocks, but it'd be nice if we could give some guarantees as to the limits of what safe code can do. Put differently, it'd be nice if tasks could serve as an effective sandbox for *safe code*. It seems to me that the main ways that safe code can cause problems for a larger system are (1) allocating too much heap; (2) looping infinitely; and (3) over-recursing. But no doubt there are more. Maybe it doesn't make sense to address only one problem and not the others; on the other hand, we should not let the perfect be the enemy of the good, and perhaps we can find ways to address the others as well (e.g., hard limits on total memory a task can ever allocate; leveraging different OS threads for pre-emption and killing, etc).
Niko On Tue, Oct 29, 2013 at 11:51:10PM +0100, Igor Bukanov wrote: SpiderMonkey uses recursive algorithms in quite a few places. As the level of recursion is at the mercy of JS code, checking for stack exhaustion is a must. For that the code explicitly compares the address of a local variable with a limit set as a part of thread initialization. If the limit is breached, the code either reports failure to the caller (parser, interpreter, JITed code) or tries to recover using a different algorithm (marking phase of GC). This explicit strategy allowed us to achieve stack safety with relatively infrequent stack checks compared with the total number of function calls in the code. Granted, without static analysis this is fragile, as a missing stack check on a code path that is under control of JS could be potentially exploitable (this is C++ code after all), but it has been working. So I think aborting on stack overflow in Rust should be OK, as it removes the security implications of stack overflow bugs. However, it is then a must to provide facilities to check for a low stack. It would also be very useful to have an option to call code with a newly allocated stack of the given size without creating any extra thread etc. This would allow for a pattern like: fn part_of_recursive_parser ... { if stack_low() { call_with_new_stack(10*1024*1024, part_of_recursive_parser) } } Then a missing stack_low() becomes just a bug without security implications. ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Fwd: Faster communication between tasks
If I may suggest: rather than blocking the sender in case the channel is full, simply return an error (or raise a condition) immediately. This is both extremely simple (for the channel implementer) and heavily customizable (for the user). It certainly seems much easier than providing an extremely wide array of different channels as part of the core Rust distribution... and it actually makes it possible to build libraries for common cases (such as local queuing). -- Matthieu. On Wed, Oct 30, 2013 at 6:37 AM, Ben Kloosterman bkloo...@gmail.com wrote: Simon, one thing you may want to test is 10-20 senders to 1 receiver. Multiple senders have completely different behaviour and can create a lot of contention around locks / interlocked calls. Also check what happens to the CPU when the receiver blocks for 100ms disk accesses every 100ms. The Disruptor as used by LMAX normally uses very few senders/receivers and the main/busy threads do no IO. Ben On Wed, Oct 30, 2013 at 1:03 PM, Simon Ruggier simo...@gmail.com wrote: See my first message, I tested the throughput of the pipes API, it is far slower. Synchronization between sender and receiver depends on which wait strategy is used. There is a strategy that blocks indefinitely if no new items are sent. To see how it works, look at this comment: https://github.com/sruggier/rust-disruptor/blob/7cbc2fababa087d0bc116a8a739cbb759354388b/disruptor.rs#L762 Multiple senders are also on my roadmap. Some things just aren't testable, because the memory ordering guarantees depend on the hardware you're running on. For it to be truly correct and portable, the source code has to be simple enough for a reviewer to be able to verify correctness at compile time. The comment I link to above is a good example; I could never test that code thoroughly enough to be satisfied, so a proof of correctness is the only way.
___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] On Stack Safety
On Thu, Oct 24, 2013 at 4:18 PM, Benjamin Striegel ben.strie...@gmail.com wrote: you do compete with Go (4 kB initial stack segment) and Erlang (2.4 kB on 64 bit). Actually, goroutines have a default stack size of 8kb since 1.2. Also, applicable to this discussion, in 1.3 Go will be moving away from segmented stacks to contiguous growable stacks: https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub This is an interesting move; however, the pointer-into-the-stack problem looks really hard to solve. In Rust, for example, I can store a reference to a stack element in a vec, and it is indistinguishable (in the type system) from a pointer to an element not on the stack. Also, I was surprised at "When that call returns, the new stack chunk is freed."; it looks like they were not keeping the next chunk around. Indeed this could generate a lot of allocation traffic. -- Matthieu On Tue, Oct 22, 2013 at 12:52 AM, Patrick Walton pwal...@mozilla.com wrote: On 10/21/13 8:48 PM, Daniel Micay wrote: Segmented stacks result in extra code being added to every function, loss of memory locality, high overhead for calls into C and unpredictable performance hits due to segment thrashing. They do seem important for making the paradigm of one task per connection viable for servers, but it's hard to balance that with other needs. I'm not sure they're that important even for that use case. Is 4 kB (page size) per connection that bad? You won't compete with nginx's memory usage (2.5 MB for 10,000 connections, compared to 40 MB for the same with 4 kB stacks), but you do compete with Go (4 kB initial stack segment) and Erlang (2.4 kB on 64 bit). Besides, if we really wanted to go head-to-head with nginx we could introduce microthreads with very small stack limits (256 bytes or whatever) that just fail if you run off the end.
Such a routine would be utterly miserable to program correctly but would be necessary if you want to compete with nginx in the task model anyhow :) Realistically, though, if you are writing an nginx killer you will want to use async I/O and avoid the task model, as even the overhead of context switching via userspace register save-and-restore is going to put you at a disadvantage. Given what I've seen of the nginx code you aren't going to beat it without counting every cycle. Patrick ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Unified function/method call syntax and further simplification
It seems to me that maybe there are several concepts/changes being discussed at once, and it would be possible to nitpick. Personally, when I think of unifying calls, I only think of having foo.bar(baz) be strictly equivalent to bar(foo, baz); nothing more than a syntax trick, in a way. And thus:

+ I do not see any reason not to keep a special associated method look-up, though instead it would be tied to the first parameter of the function rather than limited to method-like calls
+ I do not see any reason not to keep automatically exporting/importing all methods whose first parameter is of an exported/imported type or trait
+ I do not see any reason to move from explicit trait implementation to structural and automatic trait implementation (and I would consider it harmful)

Thus I am wondering:

- if I am missing something fundamental in the proposal by Gabor Lehel (I am not completely accustomed to the Rust terminology/idioms)
- if such a simple syntax sugar could make its way into the language

-- Matthieu

On Sun, Oct 20, 2013 at 7:22 PM, Gábor Lehel illiss...@gmail.com wrote:

On Sun, Oct 20, 2013 at 4:56 PM, Patrick Walton pwal...@mozilla.com wrote:

> I don't see the things you mention as warts. They're just consequences of, well, having methods in the OO sense. Nearly all of these warts show up in other object-oriented languages too. Maybe they're warts of object-oriented programming in general and illustrate that OO is a bad idea, but as I mentioned before Rust is designed to support OO.

OO for me was always more tied in with virtual methods than with how methods are scoped. But either way, I think this is basically my view. :) The only part of it I like is dot syntax.

-- Your ship was destroyed in a monadic eruption.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] c structs with bitfields
Actually, for bitfields the types into which the bits are packed are not left to the compiler. If you wrote int c : 4, then it will use an int. If you have:

int a : 24;
int b : 24;
int c : 16;

and int is 32 bits on your platform, then a will occupy 24 bits somewhere within one 32-bit unit, same thing for b, and c will occupy 16 bits somewhere within another 32-bit unit, for a single bitfield cannot be split across several underlying integers. Exactly where the bits lie within the type, though, is part of the ABI.

-- Matthieu

On Sun, Sep 8, 2013 at 4:31 PM, Corey Richardson co...@octayn.net wrote:

On Sun, Sep 8, 2013 at 3:00 AM, Martin DeMello martindeme...@gmail.com wrote:

> I was looking at the bindgen bug for incorrect bitfield handling https://github.com/crabtw/rust-bindgen/issues/8 but from a quick pass through the rust manual I can't figure out what the correct behaviour would be. What, for example, would the correct bindgen output for the following be:
>
> struct bit {
>     int alpha : 12;
>     int beta : 6;
>     int gamma : 2;
> };

You'll have to check what the various C compilers do with bitfields. I imagine they pack the bitfields into the smallest integer type that will contain them all. But almost everything about bitfields is entirely implementation-defined, so it's probably going to be difficult to come up with what to do correctly in any portable way. Once you actually figure out what to generate, though, methods for getting/setting the bitfields would probably be best.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Mozilla using Go
In practical terms, I would say that Go is production-ready whilst Rust still has some way to go (!). Rust 1.0 is approaching, but is not there yet; there are still syntax/semantics questions being examined and lots of work on the runtime... not to mention the lack of libraries (compared to Go), largely due to the language still not being finalized. I believe Rust could supplant Go (I see nothing in Go that Rust cannot do) and cast a much wider net, but first it has to mature.

-- Matthieu

On Sun, Sep 1, 2013 at 10:48 AM, John Mija jon...@proinbox.com wrote:

Hi! I've seen that Mozilla has used Go to build Heka (https://github.com/mozilla-services/heka). And although Go was meant to build servers while Rust was meant to build concurrent applications, Rust is better engineered than Go (much safer, more modular, optional GC). So what is the intended use case of Rust with respect to Go? I expect Rust to be the next language for desktop applications if it gains as much maturity as Go, but I'm unsure about the server side.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Slow rustc startup on Windows
Intriguing... I googled a bit to check what this was about and found:

- pseudo-reloc.c, the part of mingw handling pseudo-relocations: http://www.oschina.net/code/explore/mingw-runtime-3.18-1/pseudo-reloc.c
- the patch for pseudo-reloc v2 support: http://permalink.gmane.org/gmane.comp.gnu.mingw.announce/1953

At a glance I would say the problem is more on the mingw side; however, there might be something that can be done on the Rust side to mitigate or work around the issue.

-- Matthieu

On Thu, Aug 29, 2013 at 9:35 PM, Vadim vadi...@gmail.com wrote:

... is apparently caused by pseudo-relocations (#8859, https://github.com/mozilla/rust/issues/8859). Does anybody here know anything about that?

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] cycle time, compile/test performance
Most C/C++ projects require parallel make because they lack modules. I work on medium-large projects in C++, for which we use Boost as well as about a hundred custom middleware components. A simple source file of ~1000 lines ends up generating a preprocessed file on the order of 100,000 to 1,000,000 lines. Each and every TU. This is what makes them so amenable to parallelization.

On the other hand, for languages with modules, a ~1000-line file is a ~1000-line file; it may depend on ~50 other modules, but those need not be reparsed each time (a serialized version of the produced AST/ABT can be generated once and for all) and they can also be cached by the compiler (which, unlike typical C compilers, processes several modules in one pass). As such, there is much less to gain here.

That said, I do agree with your comment: it could possibly be better (temporarily) to run LLVM to clean up each module before combining them into a single crate for optimization. However, I feel that in the long term this will be unnecessary once codegen itself is reviewed so that, first and foremost, a leaner IR is emitted that does not require so much cleanup to start with.

-- Matthieu

On Fri, Aug 23, 2013 at 10:16 PM, Bill Myers bill_my...@outlook.com wrote:

> - We essentially always do whole-program / link-time optimization in C++ terminology. That is, we run a whole crate through LLVM at once. Which _would_ be ok (I think!) if we weren't generating quite so much code. It is an AOT/runtime trade but one we consciously designed-in to the language.
>
> time: 33.939 s LLVM passes

Maybe this should be changed to optionally do codegen and LLVM passes in parallel, producing an LLVM or native module for each Rust file, and then linking the modules together into the compiled crate. Alternatively, there seems to be some work on running LLVM FunctionPasses in parallel at http://llvm.1065342.n5.nabble.com/LLVM-Dev-Discussion-Function-based-parallel-LLVM-backend-code-generation-td59384.html but it doesn't seem production-ready. Most large C/C++ projects rely on parallel make and distcc to have reasonable build times, and it seems something that Rust needs to support (either via make/distcc or internally) to be a viable replacement for large projects.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Dynamic in Rust
One question: Do you only want to retrieve the exact type that was passed in, or would you want to be able to extract an impl that matches the type actually contained? The latter is more difficult to implement (dynamic_cast goes through hoops to check those things), but it is doable if sufficient information is encoded in the v-table.

On Fri, Aug 23, 2013 at 5:04 PM, Oren Ben-Kiki o...@ben-kiki.org wrote:

Yes, this would be similar to the `Typeable` type class in Haskell. It queries the vtable-equivalent, which contains stuff like the name of the type, and allows doing `typeof(x)`, dynamic casts, etc. This is heavily magical (that is, depends on the hidden internal representation) and properly belongs in the standard platform and not in a user-level library.

On Fri, Aug 23, 2013 at 4:40 PM, Niko Matsakis n...@alum.mit.edu wrote:

Currently, this is not directly supported, though downcasting in general is something we have contemplated as a feature. It might be possible to create some kind of horrible hack based on objects. A trait like:

trait Dynamic { }
impl<T> Dynamic for T { }

would allow any value to be cast to an object. The type descriptor can then be extracted from the vtable of the object using some rather fragile unsafe code that will doubtless break when we change the vtable format. The real question is what you can do with the type descriptor; they are not canonicalized, after all. Still, it's ... very close. This is basically how dynamic downcasting would work, in any case.

Niko

On Fri, Aug 23, 2013 at 07:49:57AM +0300, Oren Ben-Kiki wrote:

> Is it possible to implement something like Haskell's Dynamic value holder in Rust? (This would be similar to supporting C++'s dynamic_cast). Basically, something like this:
>
> pub struct Dynamic { ... }
> impl Dynamic {
>     pub fn put<T>(value: ~T) { ... }
>     pub fn get<T>() -> Option<T> { ... }
> }
>
> I guess this would require unsafe code... even so, it seems to me that Rust pointers don't carry sufficient meta-data for the above to work. A possible workaround would be something like:
>
> pub struct Dynamic { type_name: ~str, ... }
> impl Dynamic {
>     pub fn put<T>(type_name: &str, value: ~T) { Dynamic { type_name: type_name, ... } }
>     pub fn get<'a, T>(&'a self, type_name: &str) -> Option<&'a T> {
>         assert_eq!(type_name, self.type_name);
>         ...
>     }
> }
>
> And placing the burden on the caller to always use the type name "int" when putting or getting `int` values, etc. This would still require some sort of unsafe code to cast the `~T` pointer into something and back, while ensuring that the storage for the `T` (whatever its size is) is not released until the `Dynamic` itself is.
>
> (Why do I need such a monstrosity? Well, I need it to define a `Configuration` container, which holds key/value pairs where whoever sets a value knows its type, whoever gets the value should ask for the same type, and the configuration can hold values of any type, not from a predefined list of types).
>
> Is such a thing possible, and if so, how? Thanks, Oren Ben-Kiki

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Iterator blocks (yield)
Hello, I cannot comment on the difficulty of implementation; however, I can only join Armin in wishing that, if this ever takes off, the declaration be made explicit rather than having to parse the definition of the function to suddenly realize that it is not a simple function but a full-blown generator. Furthermore, in keeping with the ongoing iterator work, I would obviously push toward unifying the two systems by having the generator implement the Iterator trait (or whatever its name).

-- Matthieu

On Sun, Aug 11, 2013 at 12:01 PM, Armin Ronacher armin.ronac...@active-4.com wrote:

Hi,

On 10/08/2013 14:23, Michael Woerister wrote:
> Hi everyone, I'm writing a series of blog posts about a possible *yield statement* for Rust. I just published the article that warrants some discussion and I'd really like to hear what you all think about the things therein: http://michaelwoerister.github.io/2013/08/10/iterator-blocks-features.html

I have been toying around with the idea of yield for a bit, but I think there are quite a few big problems that need figuring out. The way yield return works in C# is that it rewrites the code into a state machine behind the scenes. It essentially generates a helper class that encapsulates all the state. In Rust that's much harder to do due to the type system. Imagine you are doing a yield from a generic hash map. The code that does the rewriting would have to place the hash map itself on the helper struct that holds the state, which means that the person writing the generator would have to put that into the return value. I currently have a really hard time thinking about how the C# trick would work :-(

Aside from this, some random notes from Python:

- generators go in both directions in Python, which caused problems until Python 3.3, where "yield from" (your "yield ..") was introduced; it expands into a monstrosity that forwards generators in both directions.
- instead of reusing fn, like def in Python, I would prefer an explicit "yield fn" that indicates that the function generates an iterator. The fact that Python reuses def is a source of lots of bugs and confusion.

Regards, Armin

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] RFC: Runtimeless libstd
Hi Corey,

It's great to see that people are thinking more and more about integrating Rust into existing environments! I wonder, however, whether the other alternative has been envisioned: if Rust requires a runtime to work properly (specifically: TLS, task failure), would it be possible to give an external caller the ability to set up the runtime before calling Rust methods? I have absolutely no idea whether this is sensible or possible, but maybe rather than either extreme (a full runtime setup vs. a no-runtime mode) there is a way to meet in the middle, with a core runtime that can be set up from a C interface (TLS? ...) and then a set of cfgs for various additional pieces (such as garbage collection? ...).

-- Matthieu

On Sun, Aug 11, 2013 at 7:42 PM, Corey Richardson co...@octayn.net wrote:

I've opened a pull request for basic runtimeless support in libstd: https://github.com/mozilla/rust/pull/8454

I think it needs a wider discussion. I think it's very desirable to have a libstd that can be used without a runtime, especially once we have static linking and link-time DCE. As it stands, this patch is more of a hack. It removes swaths of libstd that currently can't work without a runtime, but adds some simple stub implementations of the free/malloc lang items that call into libc, so really it requires a C runtime.

What I think we should end up with is various levels of runtime. Some environments can provide unwinding, while others can't, for example. You can mix and match various cfgs for specific pieces of the runtime to get a libstd that can run on your platform. Other things require explicit language items (think zero.rs). Thankfully the compiler now errors when you use something that requires a language item you don't implement, so it's easy to see what you need and where. I envision a sort of platform file that implements language items for a specific platform, and you'd include this in the libstd build for that platform.

But libstd, as it stands, is insanely dependent on a full, robust runtime, especially task failure and TLS. A runtimeless libstd can't depend on either of those. You can see the hack in str.rs to not use conditions when no_rt is given. While I don't think my PR should be merged as-is, I think the discussion of the best way to achieve what it accomplishes is important.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] read_byte and sentinel values
Given that all values of u8 are meaningful, there is no space for an extra bit, so it is no surprise that it cannot be packed. For pointers, for example, it is typical to exploit the fact that the null pointer is a meaningless value and thus rely on this sentinel value to encode the absence of a value, but in general this is only possible if such a sentinel value exists to begin with.

-- Matthieu

On Wed, Jul 24, 2013 at 6:33 PM, Brendan Zabarauskas bjz...@yahoo.com.au wrote:

On 25/07/2013, at 2:15 AM, Evan Martin mart...@danga.com wrote:
> Is an Option<u8> implemented as a pair of (type, value) or is it packed into a single word?

A quick test shows:

rusti> std::sys::size_of::<Option<u8>>()
16

~Brendan

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] read_byte and sentinel values
It could be. If it is not, it may be that Option needs some love at the CodeGen level to make it so :)

-- Matthieu.

On Wed, Jul 24, 2013 at 6:46 PM, Corey Richardson co...@octayn.net wrote:

On Wed, Jul 24, 2013 at 12:42 PM, Matthieu Monrocq matthieu.monr...@gmail.com wrote:
> Given that all values of u8 are meaningful, there is no space for an extra bit, so it is no surprise that it cannot be packed. For pointers, for example, it is typical to exploit the fact that the null pointer is a meaningless value and thus rely on this sentinel value to encode the absence of a value, but in general this is only possible if such a sentinel value exists to begin with.

I would expect it to be packed into a u16.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Borrow lifetime assignment changed?
On Sat, Jul 6, 2013 at 5:26 PM, Tommy M. McGuire mcgu...@crsr.net wrote:

On 07/03/2013 09:53 PM, Ashish Myles wrote:
> hello.rs:4:8: 4:33 error: borrowed value does not live long enough

I was just about to write asking about this. I discovered it with the following code:

for sorted_keys(dict).iter().advance |key| { ... }

The result of sorted_keys is a temporary vector, which doesn't seem to live long enough for the iterator. If I give the temporary a name, everything works as expected.

-- Tommy M. McGuire mcgu...@crsr.net

Interesting. There is a specific rule in the C++ specification to address temporaries: they should live until the end of the full expression they are part of. I suppose that to support this case Rust might need the same rule, and would then determine that the whole for { } loop is a single expression. It seems feasible (and maybe partly addressed already); however, I cannot help but point out that I regularly see issues related to this popping up on the Clang list, and commits to fix it a bit more; apparently it's quite a nest of vipers and has ripple effects on the implementation of pretty much every other feature of the language.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Segmented stacks (was: IsRustSlimYet (IsRustFastYet v2))
On Fri, Jul 5, 2013 at 11:07 PM, Daniel Micay danielmi...@gmail.com wrote:

On Fri, Jul 5, 2013 at 4:58 PM, Bill Myers bill_my...@outlook.com wrote:
> I believe that instead of segmented stacks, the runtime should determine a tight upper bound on stack space for a task's function, and only allocate a fixed stack of that size, falling back to a large C-sized stack if a bound cannot be determined.

Such a bound can always be computed if there is no recursion, dynamic dispatch, dynamic allocation on the stack, or foreign C functions. In practice this means everything would use a large stack. It misses the use case of scaling up tasks to many I/O requests by trading off performance for small size.

There was, at one point, a discussion of providing a #[reserve_stack(2048)] attribute for extern functions, whereby the developer would indicate to the runtime that said function would never need more than N bytes of stack. It was deemed burdensome, and it might be somewhat; however, I still believe that annotating some key C extern functions (such as those performing I/O) would allow computing this upper bound in more cases. Of course, the real experiment would be to instrument the compiler and see exactly how many tasks can indeed be so bounded... and *why* the others cannot; unfortunately it might take some time to get that working.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Tutorial translations
On Thu, Jul 4, 2013 at 2:26 AM, Graydon Hoare gray...@mozilla.com wrote:

On 13-07-03 05:06 PM, Tim Chevalier wrote:
> I don't know of any such proposal already, so I encourage you to take the lead. Of course, even with the translations in the tree, there's the risk that they could become out of sync with the English version, but that's preferable to not having translations at all. (Perhaps other people who have been in projects with internationalized documentation can comment on the best approach(es) to this issue?)

I was hoping we'd set up a pootle server to translate .po files, and/or use the existing pootle instance mozilla runs: https://localize.mozilla.org/

.po files aren't perfect, but they seem to be dominant in this space. There are a lot of tools to work with them, show the drift between a translation and its source, and reconstruct software and documentation artifacts from the result. I think po4a might be applicable to the .md files that hold our docs: http://po4a.alioth.debian.org/

Someone who is familiar with these tools and workflows would be very welcome here. We've had a few people ask and just haven't got around to handling it yet.

-Graydon

I thought that .po files were mostly used to translate bits and pieces, such as strings used in GUIs, and not full-blown text files such as tutorials? As to version drift, if both versions are in-tree it seems easy enough to check what changes were made to the English version after the last commit of the Spanish version: you would just have to find the latest common ancestor of both changesets and get all changes to the English version that are not in the Spanish branch.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] IsRustSlimYet (IsRustFastYet v2)
On Thu, Jul 4, 2013 at 9:48 PM, Daniel Micay danielmi...@gmail.com wrote:

On Thu, Jul 4, 2013 at 1:02 PM, Björn Steinbrink bstei...@gmail.com wrote:

Hi,

On 2013.07.05 02:02:59 +1000, Huon Wilson wrote:
> It looks like it's a lot more consistent than the original [IRFY], so it might actually be useful for identifying performance issues. (Speaking of performance issues, it takes extra::json ~1.8s to parse one of the 4 MB mem.json files; Python takes about 150ms; the `perf` output http://ix.io/6tV shows a *lot* of time spent in allocations.)

This is to a large part due to stack growth. A flamegraph that shows this can be found here: http://i.minus.com/1373041398/43t7zpBOcgy3CeDpkSht0w/inUqVLvZGEUfx.svg

Setting RUST_MIN_STACK to 800 cuts the runtime in half for me.

Björn

I find this is the case for many benchmarks. With segmented stacks we're behind Java, and without them Rust can get close to C++. I think this should be part of the API in the task module, allowing segmented stacks to be used only when they make sense. The first task spawned by the scheduler can just have a large fixed stack.

You are here assuming that one will not create many schedulers, which the current design allows. (Not necessarily a bad idea, per se; I just wanted to point out a possible new limitation.)

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Language support for external iterators (new for loop)
Hello,

Regarding type erasure without heap allocation, I remember a question on StackOverflow about how to implement this. In C++ this can be done using templates:

template <typename T, size_t Size, size_t Alignment> class Pimpl;

Pimpl will then declare raw storage (char [] in C++03, std::aligned_storage<Size, Alignment>::type in C++11), and this space will then be used by `T` (which was forward-declared). An important (and maybe overlooked) aspect is that Size and Alignment are upper bounds. Whilst to avoid wasting space it is better that they be as close as possible to the actual values, equality is not necessary. And thus one could perfectly imagine an AgnosticIterator<RandomIterator, 16, 8>, and it is up to the builder to create a type that fits... and maybe use dynamic allocation as a fallback if the iterator state cannot fit within the provided size.

-- Matthieu.

On Sun, Jun 30, 2013 at 4:22 PM, james ja...@mansionfamily.plus.com wrote:

On 29/06/2013 22:32, Daniel Micay wrote:

On Sat, Jun 29, 2013 at 5:29 PM, james ja...@mansionfamily.plus.com wrote:

On 29/06/2013 18:39, Niko Matsakis wrote:
> if you were going to store the result on the caller's stack frame, the caller would have to know how much space to allocate!

> If you can have a function that returns an allocated iterator, can't you instead have a function that informs how big it would be, and a function that uses a passed-in pointer from alloca?

We don't have alloca, but if we did, it would be less efficient than a statically sized allocation since it would involve an extra stack size check. A low-level, unsafe workaround like that isn't needed when you can just have a function return an iterator of a specific type.

Well, if the caller knows the type of the returned object and it is returned by value - yes. But I thought the discussion had strayed to considering a case where the type is hidden inside the iterated-over object, so the caller using the pattern does not know how to allocate space for it and receive such an object by value. I was trying to suggest that it is not necessary for the caller to know much about the iterator object to avoid a heap allocation: it has to ask for the size, and it has to then allocate and pass some raw storage on its stack. And I guess it has to ask for a function to call on the raw storage when it has finished with it. I'm not claiming that this is more efficient than a return by value, just that it may be possible to avoid a heap allocation even in the case where the call site only sees an abstract iterator interface and does not know any details.

This is very much similar to the tradeoffs in C++ between using inheritance and interfaces vs templates, and my experience has been that moving to templates has in some cases swapped a small runtime overhead for major problems with compilation speed and emitted code size - and the latter has caused runtime performance issues all of its own.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
[rust-dev] On tunable Garbage Collection
Hi,

I was reading with interest the proposal on library-defined garbage collection (by use of dedicated types), and I have a couple of questions. My main worry, and thus question, is how to handle cross-scheme cycles. There does not seem to be anything preventing me from having an Rc<Object> reference a Gc<Object> (and vice versa), and I am wondering how the garbage collectors are supposed to cooperate to realize that those may be dead cycles to collect.

As such, I am wondering if, despite this scheme being neat (and highly tunable), users might not be better served by a simpler one. It seems to me that lifetimes already provide natural GC boundaries (they at least provide an upper bound on the lifetime of an object) and thus that it may be more natural to attach a GC to a lifetime (or set of lifetimes) rather than to a particular object. I was thinking of something like:

#pragma gc ReferenceCountingGC
fn somefunction(s: String) -> Int

Note that in the latter case Rust would retain the @ sigil to denote garbage-collected pointers, but depending on where the object was allocated, the @ would not refer to the same garbage collector. I have, obviously, no idea whether this would actually be practical; and it might not be!

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] The future of iterators in Rust
On Fri, Jun 7, 2013 at 7:05 AM, Daniel Micay danielmi...@gmail.com wrote:

On Fri, Jun 7, 2013 at 12:58 AM, Sebastian Sylvan sebastian.syl...@gmail.com wrote:

The linked article contrasts them with the GoF-style iterators as well. The Rust Iterator trait is similar to the one-pass ranges (and possibly forward ranges), but not double-ended ranges or random-access ranges. It's the *family* of range-based iterators that makes it flexible (e.g. allowing you to write an efficient in-place reverse without knowing the underlying data structure, using a double-ended range). See fig. 3: http://www.informit.com/content/images/art_alexandrescu3_iterators/elementLinks/alexandrescu3_fig03.jpg

The extent to which you can have mutable iterators in Rust is pretty small, because of the memory safety requirement. Iterators can't open up a hole allowing multiple mutable references to the same object to be obtained, so I don't think mutable bidirectional or random-access iterators are possible. Forward iterators can't ever give you an alias because they're a single pass over the container. It's an easy guarantee to provide.

Is it? In this case it would mean that you can only have one forward iterator in scope (for a given container) at once too (i.e., the forward iterator borrows the container); otherwise you could have two distinct iterators pointing to the same underlying element.

I certainly appreciate the ongoing debate anyway; it's great to see things being exposed to light and openly discussed. I would like to contribute one point: partitioning. Sometimes you would like to partition a container, or point to one of its elements. For example, in C++ you have an overload of insert which takes an iterator, allowing you to hint to the routine where the element you ask to insert is likely to go, and thus shave off a couple of comparisons (if you are right). This requires pointing to a single element, to be contrasted with a range.

Another example would be partitioning: a partition of a slice can be represented with three points, the two end-points of the slice and the point of partition. Both of those examples can be represented by ranges (or, in C++, iterators), though they do not themselves imply any iteration. My point, thus, is that there might be a need for "fingers" inside a container that go beyond basic iteration.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Scheduler and I/O work items for the summer
On Sat, Jun 1, 2013 at 3:43 PM, Thad Guidry thadgui...@gmail.com wrote:

I know that Rust doesn't currently support this, but what if futures could use a custom allocator? Then it could work like this:

1. Futures use a custom free-list allocator for performance.

I don't see why futures could not be allocated on the stack. Since Rust is move-aware and has value types, it seems to me this should be possible. -- Matthieu

2. The I/O request allocates a new future object, registers the uv event, then returns a unique pointer to the future to its caller. However, the I/O manager retains an internal reference to the future, so that it can be resolved once the I/O completes.
3. The future object also has a flag indicating that there's an outstanding I/O, so if the caller drops the reference to it, it won't be returned to the free list until the I/O completes.
4. When the I/O is complete, the future gets resolved and all attached continuations are run.

Vadim

Brian, Vadim described the idea fairly well there, with the meat of my idea being #2. I was just trying to describe the scenario that #4 be able to happen only when all the registered events happen (not just one blocking step but perhaps many blocking steps). I would not know where to start mocking something like that with Rust yet... still beginning.

-- -Thad http://www.freebase.com/view/en/thad_guidry

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Do we have shared ports?
Hi,

I am not quite sure whether you are asking for a multi-cast feature (all clients receive a copy of the message) or for a send-to-one-among feature (in which one of the available clients would pick up the message). Could you elaborate?

-- Matthieu

On Tue, May 28, 2013 at 11:45 AM, Alexander Stavonin a.stavo...@gmail.com wrote:

Hi! As I know, we have the SharedChan mechanism, which can be used for many-clients-to-one-server communication. But how can I send a response from the server to many clients? This is not a commonly used case, and it looks like there is no such mechanism, or have I missed something? Best regards, Alexander.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
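For the multi-cast reading of the question: SharedChan is long gone, but in modern Rust terms a server can fan a message out by keeping one Sender per client and cloning the message to each. A hedged sketch (the `broadcast` helper is illustrative, not a std API):

```rust
use std::sync::mpsc;

// Send a copy of `msg` to every registered client channel.
fn broadcast<T: Clone>(clients: &[mpsc::Sender<T>], msg: T) {
    for tx in clients {
        // Ignore clients that have already hung up.
        let _ = tx.send(msg.clone());
    }
}

fn main() {
    let (tx1, rx1) = mpsc::channel();
    let (tx2, rx2) = mpsc::channel();
    broadcast(&[tx1, tx2], String::from("hello"));
    assert_eq!(rx1.recv().unwrap(), "hello");
    assert_eq!(rx2.recv().unwrap(), "hello");
}
```

The send-to-one-among case is the dual arrangement: many clients each hold a clone of a single Sender, and whichever client the server's Receiver happens to serve gets the message.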
Re: [rust-dev] Calling back into Rust from C code
As the error implies, the function type that you are trying to pass as a callback is incorrect. The problem is that because the callback is called from C it ought to be compatible with C, hence the extern bit. Rather than defining an anonymous function, you need to write an extern fn function (with a name), so that the function has, at low level, an ABI compatible with C.

-- Matthieu

On Sat, May 11, 2013 at 3:04 AM, Skirmantas Kligys skirmantas.kli...@gmail.com wrote:

I am trying to write a native wrapper for https://github.com/pascalj/rust-expat (BTW, if there is a native Rust XML parser, I am interested to hear about it; I did not find one). I have trouble calling back into Rust from C code:

    fn set_element_handlers(parser: expat::XML_Parser,
                            start_handler: fn(tag: str, attrs: [@str]),
                            end_handler: fn(tag: str)) {
        let start_cb = |_user_data: *c_void, c_name: *c_char, _c_attrs: **c_char| {
            unsafe {
                let name = str::raw::from_c_str(c_name);
                start_handler(name, []);
            }
        };
        let end_cb = |_user_data: *c_void, c_name: *c_char| {
            unsafe {
                let name = str::raw::from_c_str(c_name);
                end_handler(name);
            }
        };
        expat::XML_SetElementHandler(parser, start_cb, end_cb);
    }

This says that it saw "fn..." instead of the expected "extern fn" for the second and third parameters. Any ideas how to do this? Thanks.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
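The fix Matthieu describes looks like the following in modern Rust terms (this thread predates Rust 1.0; `invoke_from_c` is a hypothetical stand-in for a C API such as expat's XML_SetElementHandler, used here only so the example is self-contained):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};

static CALLS: AtomicUsize = AtomicUsize::new(0);

// A *named* extern "C" function: this is what the compiler error above
// is asking for. A closure has no C-compatible ABI; a named extern fn does.
extern "C" fn start_cb(_user_data: *mut c_void, c_name: *const c_char) {
    let name = unsafe { CStr::from_ptr(c_name) }.to_string_lossy();
    println!("start element: {}", name);
    CALLS.fetch_add(1, Ordering::SeqCst);
}

// Stand-in for the C side: it accepts only a plain C-ABI function pointer.
fn invoke_from_c(cb: extern "C" fn(*mut c_void, *const c_char)) {
    let name = CString::new("root").unwrap();
    cb(std::ptr::null_mut(), name.as_ptr());
}

fn main() {
    invoke_from_c(start_cb);
    assert_eq!(CALLS.load(Ordering::SeqCst), 1);
}
```

Forwarding to an arbitrary user-supplied closure, as the original code attempts, additionally requires smuggling the closure through the `user_data` pointer, since the extern fn itself cannot capture an environment.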
Re: [rust-dev] Having zip() fail when the two iterators are not the same length
On Mon, May 6, 2013 at 12:28 AM, Lindsey Kuper lind...@composition.al wrote:

On Sun, May 5, 2013 at 6:17 PM, Andreas Rossberg rossb...@mpi-sws.org wrote:

On May 5, 2013, at 23:54, Lindsey Kuper lind...@composition.al wrote:

On Sun, May 5, 2013 at 4:19 PM, Noam Yorav-Raphael noamr...@gmail.com wrote:

I have a simple suggestion: the current implementation of zip() returns an iterator which stops whenever either of the two iterators it gets stops. I use zip() in Python quite a bit. I always have a few lists, where the i'th value in each corresponds to the same thing, and I use zip in Python to iterate over a few of those lists in parallel. I think this is the usual use case. In this use case, when the two lists have different lengths it means that I have a bug. It seems to me that Python's behavior, and current Rust behavior, is contrary to "Errors should never pass silently" from the Zen of Python. What do you think of changing this, so that zip() will fail in such a case? Another iterator, say zipcut, can implement the current behavior if needed.

For what it's worth, in Wikipedia's comparison of implementations of zip for various languages [0], none of them raise an error when the lists are different lengths; they all either stop with the shorter of the two lists, or fill in the missing values with a nil value. That may be a coincidence, however, since the page lists only a handful of languages.

As a counter-example, OCaml, which calls it 'combine', throws. Standard ML even provides two variants, 'zip' and 'zipEq', the latter throwing. (And as an additional data point, nowhere in my SML code have I ever had a need for the non-throwing version.)

Fair point. Perhaps Rust should also provide both. I like the SML names, too.
Lindsey

In the name of preventing obvious mistakes, I would strongly suggest implementing the reverse of the SML logic: it is best when the shortest name provides the safe behavior and the unsafe behaviors have more descriptive names indicating in what way they are unsafe. It forces people to consciously choose the unsafe alternatives. I would therefore propose:

- zip: only on collections of equal length
- zipcut: stop iteration as soon as the shortest collection is exhausted
- zipfill: fill the void (somehow: default value, Option<T>, ...)

This way we have all 3 variants, with descriptive names for the two that introduce specific behavior.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
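(Historical note: Rust's std ultimately kept the stop-at-shortest behavior under the name zip.) The two contested variants are easy to sketch; these helper names follow the proposal above and are not std APIs:

```rust
// The strict variant proposed as `zip`: panic on a length mismatch,
// treating unequal inputs as a bug rather than silently truncating.
fn zip_strict<A, B>(a: Vec<A>, b: Vec<B>) -> Vec<(A, B)> {
    assert_eq!(a.len(), b.len(), "zip: inputs differ in length");
    a.into_iter().zip(b).collect()
}

// The proposed `zipcut`: today's std behavior, stop at the shorter input.
fn zip_cut<A, B>(a: Vec<A>, b: Vec<B>) -> Vec<(A, B)> {
    a.into_iter().zip(b).collect()
}

fn main() {
    assert_eq!(zip_strict(vec![1, 2], vec!['a', 'b']), vec![(1, 'a'), (2, 'b')]);
    assert_eq!(zip_cut(vec![1, 2, 3], vec!['a']), vec![(1, 'a')]);
}
```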
Re: [rust-dev] Re : Re: RFC: User-implementable format specifiers w/ compile-time checks
On Sat, May 4, 2013 at 1:15 PM, Olivier Renaud o.ren...@gmx.fr wrote:

Hi,

2013/5/3 Graydon Hoare gray...@mozilla.com

(Erm, it might also be worthwhile to consider message catalogues and locale facets at this point; the two are closely related. We do not have a library page on that topic yet, but ought to. Or include it in the lib-fmt page.)

If you are talking about gettext-like functionality, usually this and format strings are thought of as independent processing layers: format strings are translated as such and then fed to the formatting function. This brings some ramifications: as the order of parameters in the translated template can change, the format syntax has to support positional parameters. But this also allows accounting for data-derived context such as numeral cases, without complicating the printf-like functions too much. There are other difficulties with localizing formatted messages that are never systematically solved, for example accounting for gender. In all, it looks like an interesting area for library research, beyond the basic "stick this value pretty-printed into a string" problem.

Cheers, Mikhail

Gettext is indeed dependent on the format syntax allowing positional parameters. I'd like to point out that gettext also makes use of a feature of the formatting function: namely, the fact that it is not an error to call this function with more arguments than the format string expects. In C, printf("%d", 1, 2) outputs 1. In Rust, fmt!("%d", 1, 2) is a compilation error. The use case for this feature is briefly explained here: http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

A simple example: given the string "there are %d frogs", the translator may want to translate it to "il n'y a aucune grenouille" instead of "il y a 0 grenouilles". In this case, the resulting function call would be printf("il n'y a aucune grenouille", 0), which is valid since the unused argument will be ignored.
By the way, it occurs to me that fmt! requires a string literal as its first argument. How could a system like gettext, whose role is to substitute the format string at runtime, work with fmt!?

Maybe we are taking this a bit backward? I understand that things like gettext, at the moment, only substitute the text; but that may be seen as a mistake rather than a feature. Instead, we could perfectly well imagine a gettext-like equivalent that takes both an original format string (to be translated) *and* its arguments, and then uses fmt! under the hood to produce a fully translated string to be fed to the Writer instance. Note that for a proper translation gettext requires access to certain elements anyway, for example so that correct plural forms can be picked (especially in Polish).

With this out of the way, not only are positional arguments no longer required, but we can also avoid ignoring a mismatch between the number of arguments supplied (and their types) and those expected by the original format string. There is no point in being as loose as C.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
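For the record, the formatting system that eventually shipped (format! replacing fmt!) did adopt positional parameters, the prerequisite for gettext-style reordering discussed above, while keeping both properties Olivier notes: the format string must still be a literal, and an argument-count mismatch is still a compile-time error:

```rust
fn main() {
    // Positional parameters let a translated template reorder its
    // arguments without changing the call site.
    let s = format!("{1} pulled by {0}", "horse", "cart");
    assert_eq!(s, "cart pulled by horse");

    // format!("{0}", 1, 2) would be rejected at compile time:
    // unlike C's printf, extra arguments are an error, not ignored.
}
```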
Re: [rust-dev] No range integer type? Saftey beyond memory?
On Mon, Apr 29, 2013 at 6:33 PM, Jack Moffitt j...@metajack.im wrote:

As was pointed out earlier with Mozilla source code, integer overflows do not happen. Probably because, in security-conscious code, you are supposed to validate your inputs for your actual expected range, and when you do, built-in overflow checks are just unnecessary overhead.

If you're referring to Robert's comments, then I read them exactly the opposite way. He did mention that overflow to BigInts wasn't needed, but he is on the "wants checked math" side. I agree that this is a tradeoff, and that there is probably some performance loss at which it doesn't make sense. Until we have data on how expensive such a feature is, we can't make much progress in that particular debate. I just wanted to note my preference for having it default to on if it didn't cost too much, whatever "cost too much" might mean :)

jack.

It might be interesting to check how Clang integrated UBSan and its performance implications. I know there was some work using cold functions and expect hints to teach LLVM that the undefined branch (and callback) were to be very rare, so that it could optimize the code by taking them out of the hot path. You can check a blog article on the usage of UBSan here [1], and follow the links to the User's Manual from there; it might be interesting to benchmark the code produced by Clang with and without integer overflow detection (and just that; UBSan includes many other validations) to see what LLVM can do with it.

[1]: http://blog.llvm.org/2013/04/testing-libc-with-fsanitizeundefined.html

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
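The compromise Rust eventually settled on is visible in the stable integer API: overflow panics in debug builds, wraps in release builds, and programs that care can opt in explicitly either way:

```rust
fn main() {
    // Explicit checked math: overflow is reported as a value, not UB.
    assert_eq!(u32::MAX.checked_add(1), None);
    assert_eq!(3u32.checked_add(4), Some(7));

    // Explicit wrapping math for code that wants two's-complement wrap
    // regardless of build profile.
    assert_eq!(u32::MAX.wrapping_add(1), 0);

    // Saturating math, a third opt-in flavor.
    assert_eq!(u32::MAX.saturating_add(1), u32::MAX);
}
```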
Re: [rust-dev] Division and modulo for signed numbers
I was thinking about the mapping of / and %, and indeed maybe the simplest option is not to map them. Of course, having an infix syntax would make things easier: 5 % 3 vs 5 rem 3 vs 5.rem(3), in increasing order of typed keys (and visual noise, for the latter?). On the other hand, if there is no mapping I can imagine people asking over and over whether to use mod or rem...

-- Matthieu

On Thu, Apr 25, 2013 at 6:25 PM, Graydon Hoare gray...@mozilla.com wrote:

On 13-04-25 07:52 AM, Diggory Hardy wrote:

My opinion (that nobody will follow, but I still give it) is that integers should not have the / operator at all. This was one of the bad choices of C (or maybe of a previous language).

Hmm, maybe, though I can imagine plenty of people being surprised at that.

What really gets me though is that % is commonly called the "mod" operator and yet has nothing to do with modular arithmetic (I actually wrote a blog post about it a few months back: [1]). If it were my choice I'd either make x % y do real modular arithmetic (possibly even throwing if y is not positive) or have no % operator (just mod and rem keywords).

While it's true that people often pronounce % as "mod", the fact is most of the languages in the lineage we're looking at treat it as rem. http://en.wikipedia.org/wiki/Modulo_operation 50 languages in that list expose 'remainder' and 19 of them map it to '%'. As well, as a systems language, it _is_ salient that the instructions on the CPUs we're targeting and the code generator IR for said machines (LLVM) expose a remainder operation, not a modulo one. Of the 35 languages that expose _anything_ that does proper mod, only interpreted/script languages (Tcl, Perl, Python, Ruby, Lua, Rexx, Pike and Dart) call it %. That's not our family. I'm sorry; if we're arguing over what the % symbol means, it means remainder in our language family (the one including C, C++, C#, D, Go, F#, Java, Scala).
(More gruesome comparisons are available here: http://rigaux.org/language-study/syntax-across-languages/Mthmt.html#MthmtcDBQAM)

There are other questions to answer in this thread. We had a complex set of conversations yesterday on IRC concerning exposure of multiple named methods for the other variants -- ceiling, floor and truncating division, in particular. We may need to expose all 3, and it might be the case that calling any of them 'quot' is just misleading; it's not clear to me yet whether there's a consistent method _name_ to assign '/' to (floating-point divide seems to do the opposite of integer divide on chips that have both). But I don't think it's wise to map % to 'mod' if we're exposing both 'mod' and 'rem'. That's a separate issue and one with (I think) a simpler answer for us.

-Graydon

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
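Graydon's position is exactly what shipped: in stable Rust, % is truncating remainder (sign follows the dividend, matching CPU and LLVM semantics), while true modulo is a separate named method:

```rust
fn main() {
    // `%` is remainder in Rust's language family, as argued above.
    assert_eq!(-7 % 3, -1);
    assert_eq!(7 % -3, 1);

    // Euclidean modulo lives under an explicit name; its result is
    // always non-negative for a positive modulus.
    assert_eq!((-7i32).rem_euclid(3), 2);
    assert_eq!((7i32).rem_euclid(3), 1);
}
```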
Re: [rust-dev] LL(1) problems
On Thu, Apr 25, 2013 at 6:53 PM, Patrick Walton pwal...@mozilla.com wrote:

On 4/25/13 9:23 AM, Felix S. Klock II wrote:

On 25/04/2013 18:12, Graydon Hoare wrote:

I've been relatively insistent on LL(1) since it is a nice intersection-of-inputs, practically guaranteed to parse under any framework we retarget it to.

I'm a fan of this choice too, if only because the simplest efficient parser generators and/or parser-composition methodologies I know of take an LL(1) grammar as input. However, Paul's earlier plea on this thread ("Please don't do this [grammar factoring] to the official parser!") raised the following question in my mind: are we allowing for the possibility of choosing the semi-middle ground of: there *exists* an LL(1) grammar for Rust that is derivable from the non-LL(1)-but-official grammar for Rust? Or do we want to go all the way to ensuring that our own grammar, the one we e.g. use for defining the syntactic classes of the macro system etc., is strictly LL(1) (or perhaps LL(k) for some small but known fixed k)?

I'm not sure we can do the latter. There are too many issues relating to `unsafe`, `loop`, the `self` argument, etc. to make the LL(1) derivable from the human-readable grammar in an automated fashion, in my eyes. At least, I'm pretty sure that if we did want to go down that route, we'd probably be doing months of parser research (and I do mean *research*, as far as I know).

Patrick

On the other hand, should you content yourself with LL(2), and actually have a tool like yapp2 guarantee that it is indeed LL(2) (and does not degenerate), would it not be sufficient? (In case LL(1) really is gruesome compared to LL(2).)

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] About a protected visibility modifier
On Wed, Apr 17, 2013 at 9:24 AM, Eddy Cizeron eddycize...@gmail.com wrote:

2013/4/16 Brandon Mintern bran...@mintern.net

I agree with you completely, Matthieu, but that's not the kind of thing I had in mind. Consider a LinkedList implementation. Its nodes and their pointers would be private to the LinkedList, but when implementing an Iterator for that list, you would need to use that private information to avoid n^2 performance. That's a typical case I had in mind.

I am not sure. Whilst the head of the list will be private to the list, there is no reason that the *node type* be private. Expose the node, build iterators that take a reference to a node in their constructor, and have the list build the begin/end iterators (I guess). All iterations can be done safely... Of course, it does mean that you have an issue whenever you wish to erase an item by passing its position (unless you use unsafe code to make the reference mutable again).

But that is, I think, a wrong example. Iterators are unsafe: you can easily keep dangling iterators aside and have them blow up in your hands. On the other hand, if we shun external iterators and implement iteration with a foreach method accepting a predicate, then we do not need to expose the list internals. Give the predicate the ability to influence the list it is running on (returning an enum Stop/Remove/...) and you are set.

I am not saying that there is absolutely no reason it will ever be needed, but I am challenging the needs exposed so far :)

-- Matthieu

And then I thought about it a little more and realized that this is precisely something that's unsafe. Most of my protected fields and methods in Java are accompanied by comments like, "Important note: don't frobnicate a foo without also twiddling a bar." I think you're right, Daniel, that having a layer between the public API and the present implementation is probably not worth the cognitive overhead.

I understand your point Brandon.
But I could say that sometimes protected information is not so sensitive, when it is immutable for example. So why not declare it public? To avoid polluting the public interface with data unrelated to common use (yes, I agree the argument is not very strong).

It seems like it would be Rust best practice, when implementing an interface, to use the public API of the type to the extent possible, and, when necessary for efficiency or other reasons, to use unsafe and fiddle with the private API.

It could be. But if I'm allowed to play the Devil's advocate, this implies that any implementation detail must be thought of as potentially accessible (and then as necessarily understandable / documented) from outside (what you call the private API). This is not the typical approach when considering private data.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
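Matthieu's internal-iteration alternative (a foreach method taking a predicate, so the node layout never leaks) can be sketched as follows; the List/Step names are illustrative, and a Vec stands in for the private node chain:

```rust
// Control value returned by the caller's closure.
enum Step {
    Continue,
    Stop,
}

struct List<T> {
    items: Vec<T>, // private node storage; never exposed
}

impl<T> List<T> {
    // Internal iteration: the list drives the loop and hands each
    // element to the closure, so no iterator can dangle.
    fn each(&self, mut f: impl FnMut(&T) -> Step) {
        for item in &self.items {
            if let Step::Stop = f(item) {
                break;
            }
        }
    }
}

fn main() {
    let list = List { items: vec![1, 2, 3, 4] };
    let mut seen = Vec::new();
    list.each(|&x| {
        seen.push(x);
        if x == 2 { Step::Stop } else { Step::Continue }
    });
    assert_eq!(seen, [1, 2]);
}
```

Extending Step with a Remove variant, as suggested above, would let the closure influence the list without ever seeing its internals.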
Re: [rust-dev] About a protected visibility modifier
I would also add that before jumping straight to direct access, one should carefully measure. With inlining and constant propagation, I would not be surprised if the optimizer could turn an access via the public API into code as efficient as a direct access to the field in the trait method implementation. And if you really need another access method, then maybe it should be added to the type directly?

On Tue, Apr 16, 2013 at 6:23 PM, Brandon Mintern bran...@mintern.net wrote:

I was about to write how I can understand the use case: that often, for efficiency (runtime, memory, or concision) reasons, it's helpful to program to an API other than the public one; that in the process of implementing an interface on some existing type, it often needs to be reworked a bit to make it more flexible and extensible; that not having protected might result in a lot of unsafe declarations sprinkled around.

And then I thought about it a little more and realized that this is precisely something that's unsafe. Most of my protected fields and methods in Java are accompanied by comments like, "Important note: don't frobnicate a foo without also twiddling a bar." I think you're right, Daniel, that having a layer between the public API and the present implementation is probably not worth the cognitive overhead. It seems like it would be Rust best practice, when implementing an interface, to use the public API of the type to the extent possible, and, when necessary for efficiency or other reasons, to use unsafe and fiddle with the private API.

On Tue, Apr 16, 2013 at 3:08 AM, Daniel Micay danielmi...@gmail.com wrote:

On Tue, Apr 16, 2013 at 5:53 AM, Eddy Cizeron eddycize...@gmail.com wrote:

Hi everyone. I was thinking: wouldn't it be useful if Rust also had a protected visibility modifier for struct fields, with the following meaning: a protected field in a structure type T is accessible wherever a private one would be, as well as in any implementation of a trait for type T. Just an idea.
-- Eddy Cizeron

What use case do you have in mind for using a protected field instead of a public one? The use case for a private field is separating implementation details from the external API and upholding invariants. It is *possible* to safely access them in an external module by using an unsafe block, provided you take into account all of the implementation details and invariants of the type.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Library Safety
Well, a full effect system might not be necessary just for safe plugins. Since we have a way in Rust to indicate which version of a plugin we want to link to, we could apply some restrictions there. For example, specifying that only a certain list of other libraries can be used by this plugin (and typically none with I/O features) and that it cannot use unsafe code would already guarantee some sandboxing. Similarly, the GC restriction could be placed there.

-- Matthieu

On Wed, Apr 3, 2013 at 5:26 PM, Dean Thompson deansherthomp...@gmail.com wrote:

The Rust team refers to this as an "effect system". They originally had one, but that one proved unworkable and was deleted. They continue to regard it as desirable but difficult to get right, and as a potential future feature. Here's some history: http://irclog.gr/#search/irc.mozilla.org/rust/%22effect%20system%22. They would certainly welcome serious proposals or demos, although almost certainly continuing to hold it out for post-1.0. They would think in terms of first researching the most successful effect systems in other languages.

Dean

From: Grant Husbands rust-...@grant.x43.net
Date: Wednesday, April 3, 2013 5:14 AM
To: rust-dev@mozilla.org
Subject: [rust-dev] Library Safety

I've been following the Rust story with some interest and I'm excited about the opportunities Rust brings for sandbox-free, secure system software. However, there are some things that it lacks that would otherwise make it the obvious choice. One that I feel is important, and that has been touched upon by others, is having static assurances about code, especially imported libraries. If I use a jpg library, I want to be sure that it isn't going to be able to do any unsafe operations, use GC, or access the file system or the network. That way, I don't have to trust the code and can instead be assured that it simply cannot perform any dangerous actions. Currently, to do that, I have to inspect the whole library.
As a developer without the time to do that, I'd much prefer for the import to be annotated to indicate such things (or, ideally, to be annotated to indicate the allowed dangers). This could be seen, of course, as a precursor to capabilities -- reducing ambient authority is a key first step in getting a capability-secure system -- but it's also a simple way of getting assurances about code without having to inspect it. Does it seem like a reasonable thing to add? I may be able to find time to work on it, should it be acceptable.

Regards, Grant Husbands.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] RFC: syntax of multiple trait bounds on a type parameter
On Sat, Jan 12, 2013 at 3:21 AM, James Gao gaoz...@gmail.com wrote:

And how about these two cases:

a) fn foo<T1: Ord, Eq, Hash; T2: Ord, ::Eq>(...) {...}
b) fn foo<T1: Ord + Eq + Hash, T2: Ord + ::Eq>(...) {...}

I really like b); + looks especially fitting since we are adding up requirements.

-- Matthieu

On Sat, Jan 12, 2013 at 6:27 AM, Gareth Smith garethdanielsm...@gmail.com wrote:

On 11/01/13 18:50, Niko Matsakis wrote:

fn foo<T: Eq>(..) {...} // One bound is the same
fn foo<T: (Ord, Eq, Hash)>(...) {...} // Multiple bounds require parentheses

How about using { ... } rather than ( ... ), like imports: use xxx::{a, b, c};

fn foo<T: {Ord, Eq, Hash}>(...) { ... }

I don't know that this is better, but maybe it is worth considering?

Gareth.

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
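Option (b) is what Rust adopted: multiple bounds are joined with +, no parentheses or braces. A small working example in stable Rust (the `smallest` helper is illustrative):

```rust
use std::hash::Hash;

// Bounds are summed with `+`, exactly as proposed in (b).
fn smallest<T: Ord + Eq + Hash + Clone>(items: &[T]) -> Option<T> {
    items.iter().min().cloned()
}

fn main() {
    assert_eq!(smallest(&[3, 1, 2]), Some(1));
    assert_eq!(smallest::<i32>(&[]), None);
}
```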
Re: [rust-dev] Is necessary the manual memory management?
On Sun, Oct 28, 2012 at 8:48 PM, Niko Matsakis n...@alum.mit.edu wrote:

Regardless of whether manual memory management is desirable as an end goal, support for it is essentially required if you wish to permit tasks to exchange ownership of data without copies. For example, in Servo we have a double-buffered system where mutable memory buffers are exchanged between the various parts of the system. In order to make this safe, we have to guarantee that this buffer is unaliased at the point where it is sent; if you know it's unaliased, of course, you also know that you could safely free it.

As a broader point, it turns out there are a LOT of type-system things you can do if you know something about aliasing (or the lack thereof). Our current approach to freezing data structures, for example, relies on this. Safe array splitting for data parallelism -- if we ever go in that direction -- will rely on this. And so forth. So, supporting a unique-pointer-like construct makes a lot of sense.

Niko

I really think this is the core point: unique/borrowed/shared are less about memory management than about ownership semantics. It would be perfectly viable (albeit slower) to treat them all identically in terms of codegen (i.e. GC them all). On the other hand, ownership semantics provide both the developer and the compiler with *guarantees* upon which they can build.

-- Matthieu

John Mija jon...@proinbox.com wrote on October 28, 2012, 4:55 AM:

Does it make sense to have a language with manual memory management, given that it's possible to build low-level stuff with a specialized garbage collector? It's good for creating drivers, but that's already C. For instance, both the Native Oberon and Bluebottle operating systems were implemented in languages, Oberon and Active Oberon, which have type safety and concurrent GC support (at least in the ulm Oberon compiler).
http://www.inf.ethz.ch/personal/wirth/books/ProjectOberon.pdf

Then, Java is being used in embedded and real-time systems with deterministic garbage collection: http://www.pr.com/press-release/226895

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
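Niko's "exchange ownership without copies" point is directly visible in stable Rust: a uniquely owned buffer can be handed to another task (thread) with no copy, because the type system guarantees the sender no longer aliases it. A minimal sketch in post-1.0 terms:

```rust
use std::thread;

fn main() {
    // A uniquely owned, heap-allocated buffer.
    let buffer: Vec<u8> = vec![0; 1024];

    // `move` transfers ownership into the new thread; no bytes are
    // copied, and the spawning thread can no longer touch `buffer`
    // (using it here afterwards would be a compile error).
    let handle = thread::spawn(move || buffer.len());

    assert_eq!(handle.join().unwrap(), 1024);
}
```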
Re: [rust-dev] Object orientation without polymorphism
On Tue, Oct 23, 2012 at 2:46 PM, Julien Blanc wh...@tgcm.eu wrote:

Lucian Branescu wrote:

Something like this: http://pcwalton.github.com/blog/2012/08/08/a-gentle-introduction-to-traits-in-rust/

Very nice introduction. The only question that arises for me (coming from a C++ background and comparing this to C++ templates) is why trait implementation is made explicit. Is it a design decision or a current compiler limitation? I guess the compiler could, without too much difficulty, be made smart enough to determine from a type's actual interface whether it conforms to a trait. Code generation may be more of a problem, though...

It is actually a design decision, quite similar to how typeclasses in Haskell require explicit instantiation whereas Go's interfaces, like C++ templates, do not. Automatic detection is also called duck typing: if it quacks like a duck, then it's a duck. There are two main disadvantages:

- functionally, it means that you can use an object for something it was never really meant for: just because the signatures of some functions match does not mean that their semantics match too
- in terms of codegen, this might imply bloat (C++) or runtime overhead (Go)

On the other hand, Haskell's approach is quite practical as long as one solves the coherence issue.

-- Matthieu

___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
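The explicit opt-in Matthieu describes looks like this in stable Rust (illustrative names): a type with a matching `quack` method is still not a Duck until it says so with an impl block.

```rust
trait Duck {
    fn quack(&self) -> String;
}

struct Mallard;

// Without this explicit impl, Mallard would not satisfy `T: Duck`,
// even if it already had an inherent method named `quack` -- the
// opposite of Go's structural interfaces and C++ template duck typing.
impl Duck for Mallard {
    fn quack(&self) -> String {
        "quack".to_string()
    }
}

fn make_noise<T: Duck>(d: &T) -> String {
    d.quack()
}

fn main() {
    assert_eq!(make_noise(&Mallard), "quack");
}
```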
Re: [rust-dev] condition handling
On Sat, Oct 20, 2012 at 11:16 AM, James Boyden j...@jboy.me wrote: On Sat, Oct 20, 2012 at 10:48 AM, Graydon Hoare gray...@mozilla.com wrote: Some references to the lurking plan here: https://mail.mozilla.org/pipermail/rust-dev/2011-November/000999.html Firstly, I'd like to express my appreciation for the clear reasoning in this linked post. I found the arguments clear and compelling, matching my own experience -- especially the enumeration of the small set of realistic uses of exception handling (ignore, retry, hard-fail, log, or try one of a small number of alternatives to achieve the desired result). - Condition.raise is a normal function and does something very simple: - look in TLS to see if there's a handler - if so, call it and return the result to the raiser - if not, fail - This means condition-handling happens _at site_ of raising. If the handler returns a useful value, processing continues as if nothing went wrong. It's _just_ a rendezvous mechanism for an outer frame to dynamically provide a handler closure to an inner frame where a condition occurs. This all seems reasonable after reading that post. So: API survey. Modulo a few residual bugs on consts and function regions (currently hacked around in the implementation), I have 3 different APIs that all seem to work -- which I've given different names for discussion sake -- and I'm trying to decide between them. They mostly differ in the number of pieces and the location and shape of the boilerplate a user of the condition system has to write. My current preference is #3 but I'd like a show of hands on others' preferences. snip Opinions? Clarifying questions? I prefer option #3. I like that it states the condition and handler up-front (in contrast to most exception-handling syntaxes, that leave you guessing what might go wrong until you reach the catch statements at the end of the try/catch block). I like that the error-handling code is implicitly not an afterthought. 
(I find that when I'm writing the code on the main path, there's a strong temptation to just keep on coding, and come back to insert the error-handling code later.) I like that it has the minimum of boilerplate code (in contrast to option #1 especially). In addition to the boilerplate, I don't really like option #1 because of the separation of the protected block from the handler code. My concern with option #2 is that, despite my general fondness for RAII, the '_g' variable isn't explicitly used anywhere, so creating a named variable seems redundant. Plus, having an object that can reach out of its variable and affect all the code that follows in its block scope is too magical for my liking. jb ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev This seems heavily influenced by Lisp's Conditions & Restarts [1] (not that I have used Lisp before, but I *am* interested in error-handling strategies); however, it is relatively unclear to me how the syntax works. If I understood correctly there are 3 steps: - declare the condition - set up the handler for that condition (with the poll going on) - raise a signal for that condition At the moment you only show the first 2, and it's unclear to me exactly how the handler is *used* at the call site (looks to me like a regular function call passing an instance of T as argument and getting a U in exchange). I suppose it would be something like: let u = core::condition::signal(OutOfKitten, t) One of the points I find difficult about this (and which is as difficult with exceptions, short of going the path of the damned with exception specifications) is that it might become difficult to know exactly which handlers to set up; as such this: - Condition.raise is a normal function and does something very simple: - look in TLS to see if there's a handler - if so, call it and return the result to the raiser - if not, fail might be a little forceful.
As such, I would have a tiny request: would it make sense to just make it a point of customization, but have a way to specify a default result in case there is no handler, or even to specify a default handler? This means 2 or 3 different flavors of raising a condition, which adds to the complexity of the language though. On the other hand, using the syntax I used above, it's just: let u = core::condition::signal(OutOfKittens, t) // default behavior, i.e. fail if no handler let u = core::condition::signal(OutOfKittens, t, |t| { fail }) // default behavior made explicit let u = core::condition::signal(OutOfKittens, t, |t| { if t == 0 then fail else 3 }) // custom handler, if none set up let u = core::condition::signal(OutOfKittens, t, 4) // simple way to pass a default return value without actually having to write up a lambda, à la |t| { 4 } (just to avoid
Re: [rust-dev] condition handling
On Sat, Oct 20, 2012 at 1:37 PM, Gareth Smith garethdanielsm...@gmail.com wrote: Option 3 looks prettiest to me, but I like that in Option 1 the error-raising code comes before the error-handling code (based on experience with other languages). This might not be an issue in practice. I am not sure how I like Option 3 with a more complex trap block: OutOfKittens.trap(|t| { OrderFailure.trap(|t| notify_support()).in { order_more_kittens(); } UseAardvarksInstead }).in { do_some_stuff(); that_might_raise(); out_of_kittens(); } Compare this to some ideal syntax: protect { do_some_stuff(); that_might_raise(); out_of_kittens(); } handle OutOfKittens(t) { protect { order_more_kittens(); } handle OrderFailure(t) { notify_support(); } UseAardvarksInstead } Gareth ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev I just realized I had missed a point, which is somewhat similar to what Gareth raised: composability. Let us start with an example in a C++-like language: UserPreferences loadUserPreferences(std::string const& username) { try { return loadUserPreferencesFromJson(username); } catch (FileNotFound const&) { if (username == "toto") { throw; } return loadUserPreferencesFromXml(username); // not so long ago we used xml } } The one thing here is throw;, which rethrows the current exception and passes it up the handler chain. Is there any plan to have this available in this Condition/Signal scheme? I.e., being able, in a condition handler, to defer the decision to the previous handler that was set up for the very same condition? It could be as simple as a core::condition::signal(OutOfKittens, t) from within the current handler block. Which basically means that during its invocation the current handler is temporarily popped from the stack of handlers (for that condition) and after it executes (if it does not fail) is pushed back.
I even wonder if this could not become automatic: that is, unless a handler fails hard or returns a proper value, its predecessor is automatically called. However, I fail to see how this could be nicely integrated into the syntax, and I wonder what the ratio of pass-up-the-chain vs ignore-predecessors would be in practice. -- Matthieu ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Polymorphism default parameters in rust
On Sat, Aug 4, 2012 at 6:53 PM, Patrick Walton pwal...@mozilla.com wrote: On 08/02/2012 12:51 PM, Emmanuel Surleau wrote: Hi, I'm new to rust, and I'm struggling to find an elegant way to work with default parameters. Generally we've been experimenting with method chaining to achieve things like default and named parameters in Rust. See the task builder API for an example: https://github.com/mozilla/rust/blob/incoming/src/libcore/task.rs#L197 So I can see your use case being something like: let flag = Flag("verbose", "Maximum verbosity").short_name("v"); To implement this you'd write: struct Flag { name: str; desc: str; short_name: option<str>; max_count: uint; banner: option<str>; } // Constructor fn Flag(name: str, desc: str) -> Flag { Flag { name: name, desc: desc, short_name: none, max_count: 1, banner: none } } impl Flag { fn short_name(self, short_name: str) -> Flag { Flag { short_name: some(short_name) with self } } fn max_count(self, max_count: uint) -> Flag { Flag { max_count: max_count with self } } fn banner(self, banner: str) -> Flag { Flag { banner: some(banner) with self } } } (Note that this depends on the functional record update `with` syntax working for structs, which it doesn't yet.) If this style catches on it'd probably be nice to have a macro to generate the mutators (fn short_name, fn max_count, fn banner). Then instead of the impl { ... } above you'd write something like: make_setter!(Flag.short_name: option<str>, WrapOption); make_setter!(Flag.max_count: uint); make_setter!(Flag.banner: option<str>, WrapOption); (Assuming that WrapOption is a special flag to the macro to indicate that the value should automatically be wrapped in some). How does this sound? Patrick Verbose. -- Matthieu ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] replacing bind with lightweight closure syntax
On Sat, Jun 2, 2012 at 9:12 PM, Niko Matsakis n...@alum.mit.edu wrote: Hello Rusters, I want to remove bind. It's a pain to maintain and a significant complication in the language semantics (both problems caused by having two kinds of closures with subtle distinctions). Nobody really objects to this, but the fact remains that bind serves a role that is otherwise unfilled: it provides a (relatively) lightweight closure syntax. For example, I can write: foo.map(bind some_func(_, 3)) which is significantly less noisy than: foo.map({|x| some_func(x, 3)}) I previously tried to address this through the introduction of `_` expressions as a shorthand for closures. This proposal was eventually rejected because the scope of the closure was unclear. I have an alternative I've been thinking about lately. The basic idea is to introduce a new closure expression which looks like: `|| expr`. The expression `expr` may make use of `_` to indicate anonymous arguments. The scope of the closure is basically greedy. So `|| _ + _ * _` is unproblematic. The double bars `||` should signal that this is a closure. Therefore, you could write the above example: foo.map(|| some_func(_, 3)) This also makes for a nice, lightweight thunk syntax. So, a method like: map.get_or_insert(key, || compute_initial_value(...)) which would (presumably) return the value for `key` if it is present in the map, but otherwise execute the thunk and insert the returned value. The same convention naturally extends to named parameters, for those cases where anonymous parameters do not work. For example, if you wish to reference the parameter more than once, as in this example, which pairs each item in a vector with itself ([A] => [(A,A)]): foo.map(|x| (x, x)) In fact, we *could* do away with underscores altogether, although I find them more readable (they highlight those portions of the function call that come from arguments vs the environment).
The immediate motivation for this is that I am having trouble with some code due to complications introduced by bind. I'd rather not fix those bugs. I'd rather just delete bind. Niko ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev Hello, I must admit I find the latter example quite pleasing: foo.map(|x| (x, x)) and (unlike you, it seems) I find that it could completely replace _. The problem with _ is that it seems nice enough when there is only one, but it is not as immediate for the reader to determine how many parameters the resulting function has. We could number them (_0, _1, _2) but that makes maintenance painful (if you remove _1, you have to renumber all those which followed). I find that explicitly naming the arguments in between the pipes really helps make it clear how many arguments the created function has. Of course, it still does nothing about the problem of shadowing an outer element meant to be captured. Not sure if there is a way to deal with that at the same time... -- Matthieu ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Back to errors, failures and exceptions
On Fri, May 25, 2012 at 7:16 PM, David Rajchenbach-Teller dtel...@mozilla.com wrote: On Fri May 25 18:01:25 2012, Patrick Walton wrote: On 05/25/2012 08:43 AM, Kevin Cantu wrote: This conversation reminds me of Alexandrescu's talk about D's scope keyword: http://channel9.msdn.com/Events/Lang-NEXT/Lang-NEXT-2012/Three-Unlikely-Successful-Features-of-D It looks like a graceful syntax for handling all sorts of nested error and failure cleanup... I like the scope keyword, FWIW. It'd be even better if you didn't have to provide a variable name if all you want to do is execute some code at the end of the block. This would provide a facility like Go's defer keyword, but more general since it also admits C++ RAII patterns. Patrick What's the difference between |scope| and Rust's resources, exactly? Cheers, David -- David Rajchenbach-Teller, PhD Performance Team, Mozilla Regarding adding logs to the errors: - Boost.Exception has something similar: you can add class instances to the exception using the error info mechanism [1] - It also reminds me of what happens in case of failures using the note expressions; I believe the same notes could be reused in case of exceptions/errors to provide additional logs. Of course, there is a difference between the two schemes. Boost's is somewhat more powerful because it does not consist of adding simple strings but full-blown objects (which could be conditioned to be printable), and thus allows inspection of structured data at the error-handling site. It may be thought of as overkill too... Regarding D's scope keyword [2], there are several statements based on it: - scope(exit) <statement>, where <statement> is executed on exit, no matter what - scope(failure) <statement>, where <statement> is executed on exit if the previous statement failed - scope(success) <statement>, where <statement> is executed on exit if the previous statement succeeded On the other hand, it kind of looks like a hack; maybe it is just an issue of getting used to it, though.
[1]: http://www.boost.org/doc/libs/1_49_0/libs/exception/doc/error_info.html [2]: http://dlang.org/exception-safe.html ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Back to errors, failures and exceptions
On Wed, May 23, 2012 at 2:47 PM, David Rajchenbach-Teller dtel...@mozilla.com wrote: Actually, one of the conclusions of our previous discussion is that Java/C++/ML-style exceptions are probably not what we want for Rust. I seem to remember that we also concluded that using failures as exceptions was probably not the right course. Hence this new thread :) Let me put together what I believe are a few desirable qualities of an issue management system. For the moment, let's not wonder whether that system is a language feature, a library or a coding guideline. As a whole, this looks very good to me, I just have one quick question: * The system _must_ not prevent developers from calling C code from Rust. * The system _must_ not prevent developers from passing a pointer to a Rust function to C code that will call back to it. * The system _must_ not prevent, some day, developers from calling Rust from JavaScript. * The system _must_ not prevent, some day, developers from calling JavaScript from Rust. * Issues _must_ not be restricted to integers (or to one single type). Could you explain what you mean by this? I suppose this is a direct jab at the horror that is errno, and more in the direction of being able to throw anything (possibly on the condition that it implements a given interface)? * The default behavior in case of an untreated issue _should_ be to gracefully kill the task or the application. * Whenever an untreated issue kills a task/application, it _should_ produce a report usable by the developer for fixing the issue. * It _should_ be possible to deactivate that killing behavior. There _may_ be limitations. * It _should_ be possible to deactivate that killing behavior conditionally (i.e. only for some errors). * The system _should_ eventually have a low runtime cost – in particular, the case in which no killing happens should be very fast. Do we agree on this base?
Cheers, David On 5/22/12 4:56 AM, Bennie Kloosteman wrote: Are exceptions a good model for systems programming? - legacy C programs can't call you without a wrapper which translates all possible exceptions - unwinding a stack is probably not a good idea in a kernel or when you transition into protected/user mode (I know of very few kernels that use exceptions). - It's not just legacy: WinRT uses C++ classes but returns error codes for low-level APIs. However, it's very nice for user programs. These days these different worlds work quite well: C libs, which are mainly used for systems programming, don't use them, and C++ apps are more user programs and they do; C++ calls C, C rarely calls C++. Obviously if you write a kernel or shared library you cannot use exceptions if C programs call your code, and there is a lot of C out there. While not really an issue for the language (just don't use exceptions), it means a standard lib that throws an exception would be a pain for such work, and you would need a different standard lib, which is an issue. BTW, could Rust use tasks as a substitute for exception scopes? Tasks have error bubbling, hiding, stack unwinding, throw (fail), and should have integrated logging. You could put sugar syntax around it but it would still work when being called by C. Also, with tasks you can cancel or do timeouts, giving asynchronous exceptions, which are really needed (e.g. in most systems cancelling a long-running task is very annoying, with a very long pause) and which most trivial exception implementations don't do. Not sure if this is the right way, but there seems to be a lot of overlap, and it would work with C and systems programming.
Ben ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev -- David Rajchenbach-Teller, PhD Performance Team, Mozilla
Re: [rust-dev] Interesting paper on RC vs GC
On Tue, May 1, 2012 at 5:51 PM, Sebastian Sylvan sebastian.syl...@gmail.com wrote: On Tue, May 1, 2012 at 4:07 AM, Matthieu Monrocq matthieu.monr...@gmail.com wrote: As a consequence, I am unsure of the impact this article should have on Rust's GC design. The implementation strategies presented are very clear and the advantages/drawbacks clearly outlined, which is great (big thank you to the OP); however the benchmarks and conclusions might be a tad Java-centric and not really apply to Rust's more advanced type system. My conjecture is that Java is *especially* unsuitable for RC for the following reasons: * lots of references, thus lots of reference traffic, thus lots of ref count inc/dec. * lots of garbage, thus more expensive for an algorithm for which the cost is proportional to the amount of garbage (RC) rather than to the amount of heap (GC). So I'd expect vanilla RC to do better in comparison to GC (though perhaps not beat it) in Rust than in Java. Applying the optimizations mentioned in the article (most of which rely on using deferred ref counting, which does mean you give up on predictable timing for deallocations) may make RC significantly better in Rust. Seb I agree that the techniques outlined, especially with the details on their advantages/drawbacks, make for a very interesting read. As for the predictable timing, it seems hard to have something predictable once you take cycles of references into account: I do not know of any inexpensive algorithm to detect that, by removing a link, you are suddenly creating a self-sustaining group of objects that should be collected. Therefore I would venture that such groups would in any case be collected in a deferred fashion (using some tracing algorithm).
-- Matthieu --- Finally I think it might be worth considering having two distinct GC strategies: - one for immutable objects (that only references other immutable objects) - one for the rest (mutable objects with potential cycles) I see no reason to try and employ the same strategy for such widely different profiles other than the potential saving in term of coding effort. But then trying to cram every single usecase in a generic algorithm while keeping it efficient seems quite difficult too, whereas having several specialized mechanisms might make for much clearer code. One idea I have toyed with for my own was to have simple stubs: design a clear API for GC, with two (or more) sets of functions for example here, and call those functions instead of inlining their effect (in the IR). By providing the functions definitions externally (but inlining them in each IR module) this makes it easy to switch back and forth between various implementations whilst still retaining the efficiency of the LLVM backend to inline/optimize the calls. This means one can actually *test* the strategies, and perhaps even let the user *choose* which one better suits her needs. Of course coherency at the executable level might be necessary. -- Matthieu -- Sebastian Sylvan ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] In favor of types of unknown size
Hi Sebastian, I have a few comments. On Sun, Apr 29, 2012 at 12:21 AM, Sebastian Sylvan sebastian.syl...@gmail.com wrote: On Fri, Apr 27, 2012 at 3:15 PM, Niko Matsakis n...@alum.mit.edu wrote: The types `[]T` and `str` would represent vectors and strings, respectively. These types have the C representation `rust_vec<T>` and `rust_vec<char>`. They are of *dynamic size*, meaning that their size depends on their length. The literal forms for vectors and strings are `[a, b, c]` and `"foo"`, just as normal. Back when I was entertaining the idea of writing my own rust-like language (before I was aware of rust's existence), I had the idea that all records/objects could have dynamic size if any of their members had dynamic size (and the root cause of dynamic size would be fixed-size arrays - fixed at the time of construction, not a static constant size). This is only slightly related, but it's close enough that I can't resist presenting the gist of it (it's not completely worked out), in case anyone else wants to figure it out and see if it makes sense :-) Basically the idea spawned from the attempt to avoid pointers as much as possible. Keep things packed, with chunky objects, reduce the complexity for GC/RC, reduce memory fragmentation, etc. - aside from actual honest-to-goodness graphs (which are fairly rare, and most are small, and unavoidable anyway). The conjecture is that the main source of pointers are arrays. Okay, so basically the idea is that arrays are length-prefixed blocks of elements. They're statically sized (can't be expanded), but you can pass in a dynamic, non-constant value when you construct them. Unlike C/C++, though, these arrays can still live *inside* an object. There's some fiddliness here, e.g. do you put all arrays (except ones with true const sizes?) at the end of the object so other members have a constant offset?
If you have more than a small number of arrays in an object it probably makes sense to have a few pointers indicating the start of each, instead of having to add up the sizes of preceding arrays each time an access is made to one of the later arrays. Small reactions on pointers: I think it's a good idea to pack the variable-length structures at the end of the current object. However I would use cumulative offsets rather than pointers, because of size (on 64-bit architectures, which are becoming the de-facto standard for PCs and servers). The idea would be, in C style: struct Object { int scalar1; int scalar2; unsigned __offset0; unsigned __offset1; unsigned __offset2; SomeObject __obj0; Table __obj1[X]; }; Where __offset0 indicates the offset from the start of Object to the start of __obj0, __offset1 the offset from the start of Object to the start of __obj1, and __offset2 the offset from the start of Object to the start of __obj2. This means you have direct access to any attribute with a simple addition to the pointer, and you can know the size with a simple subtraction (the size of __obj0 is __offset1 - __offset0). So, during construction of an object, you'd have to proceed in two phases. First is the constructor logic where you compute values, and second is the allocation of the object and moving the values into it. You need to hold off on allocation because you don't know the size of any member objects until you've constructed them. Moving an array is now expensive, since it requires a copy, not just a pointer move. So ideally the compiler would try to move the allocation to happen as early as possible so most of the values can be written directly to their final location instead of having to be constructed on the stack (or heap) and then moved. There are of course cases where this couldn't be done. E.g.
if the size of an array X, depends on some computation done on array Y in the same object - you have to create Y on the stack, or heap, to run the computation before you can know the total size of the object, and only then can you allocate the final object and copy the arrays into it. Yes, this is getting quite difficult at this stage. It's good once the size is settled but the construction can be expensive. I'm not 100% sold on the idea, since it does make things a bit more complex, but it is pretty appealing to me that you can allocate dynamic-but-fixed sized arrays on the stack, inside other objects etc.. For a language that emphasizes immutable data structures I'd imagine the opportunity to use these fixed arrays in-place would be extremely frequent. Seb -- Sebastian Sylvan There is a subtle issue that I had not remarked earlier. This mechanism works great for fixed-size arrays, but is not amenable to extensible arrays: vectors and strings *grow*. So it would work if the field/attribute is runtime-fixed-size, either because the type imposes it or because it's declared immutable, however it will not work in the general case. This is important because it means that in
Re: [rust-dev] In favor of types of unknown size
On Sat, Apr 28, 2012 at 8:12 AM, Marijn Haverbeke mari...@gmail.com wrote: I must say I prefer Graydon's syntax. `[]T` sets off all kinds of alarms in my head. I have no strong opinion on dynamically-sized types. Not having them is definitely a win in terms of compiler complexity, but yes, some of the things that they make possible are nice to have. ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev Hello Niko, First, I really appreciate you thinking hard about it, and if you don't want to bother the list I would certainly not mind talking it out with you in private; I feel it's very important for these things to be thought through extensively and I really like that decisions in Rust are always considered carefully and objectively. That being said, I have two remarks. I would like to ask a question on the vector syntax: why the focus on []? I understand it in the literal form; however, a string type is denoted as `str`, so why not denote a vector of Ts as `vec<T>`? Yes, it's slightly more verbose, but this is how all the other generic types will be expressed anyway. Similarly, since a substring is expressed as `substr`, one could simply express a slice as `slice<T>` or `svec<T>` or even `array_ref<T>`. I don't think being overly clever with the type syntax will really help the users. Imagine grepping for all uses of the slice type in a crate? It's so much simpler with an alphabetic name. (Also, `[:]/r T` feels *really* weird; look at the mess C is with its pointer-to-function syntax that lets you specify the name in the *middle* of the type...) As for types of unknown size, I would like to point out that preventing users from having plain `str` attributes in their records is kind of weird. The pointer syntax is not only more verbose, it also means that suddenly getting a local *copy* of the string gets more difficult.
Sure it's equivalent (semantically) to a unique pointer `~str`, but it does not make copying easier, while it's one of the primary operations in impure languages (because the original may easily get modified at a later point in time). I think that `rust_vec<T>` having an unknown size rather than being (in effect) a pointer to a heap-allocated structure is nice from an implementation point of view, but it should not get in the way of using it. I would therefore venture that either it has an unknown size and the compiler just extends this unknown-size property to all types so they can have `vec<T>` and `str` attributes naturally, or it's better for it *not* to have an unknown size. I would also like to point out that if it's an implementation detail, the actual representation might vary from known size to unknown size without impact for the user, so starting without it for the moment because it's easier, and refining it later, is an option. Another option is to have a fixed size with an alternative representation using something similar to SSO (Short String Optimization); that is, small vectors/strings allocate their storage in place while larger ones push their storage to the heap to avoid trashing the stack. Hope this does not look harsh; I sometimes have difficulties expressing my opinions without being seen as patronizing: I can assure you I probably know less than you do :) -- Matthieu ___ Rust-dev mailing list Rust-dev@mozilla.org https://mail.mozilla.org/listinfo/rust-dev
Re: [rust-dev] Syntax of vectors, slices, etc
On Tue, Apr 24, 2012 at 11:24 PM, Graydon Hoare gray...@mozilla.com wrote: On 12-04-24 11:30 AM, Matthieu Monrocq wrote: However this is at the condition of considering strings as list of codepoints, and not list of bytes. List of bytes are useful in encoding and decoding operations, but to manipulate Arabic or Korean, they fall short: having users manipulate the strings byte-wise instead of codepoint-wise is a recipe to disaster outside of English and Latin-1 representable languages. Could you elaborate on this a little bit? I'm curious to hear impressions -- even if vague or hard to specify -- about the experience of working with known-language, non-Latin-1 text. I'm an English-speaker and much technical material is English-derived, so usually when I'm working with text-processing code, it falls into one of two categories: - ASCII-subset by construction (eg. structured-language keywords) - Totally unknown language semantics, has to work with everything, can't assume I know anything about the language (eg. human input) I am emphatically not saying these are the _only_ two possible environments, just the two that I have experience in. So in my experience byte-operations in ASCII range works for the former and using a proper language-and-locale-aware unicode library like ICU works for the latter. That's where my usability biases emerge in the design of str. In particular I want to know if you would feel that there are common operations you expect to be able to do codepoint-at-a-time on the datatype str, that you would not be comfortable doing on the datatype [char], if you converted str to [char] as a one-time pass in advance of performing the operation. That's what I assume people will do if they need random (rather than sequential) codepoint access. Sequential access we already have iterators for. But I understand this might not be right; it's a design space with a lot of tensions. 
There are as many different string representations in the world as there are opinionated programmers :) I understand that this may seem contradictory to Rust's original direction of utf-8 encoded strings, but having worked with utf-8 strings using C++ `std::string` I can assure you that apart from blindly passing them around, one cannot do much. All modifying operations require the use of Unicode-aware libraries... even `substr`. Naturally so. We're intending to ship a relatively full binding to libicu for just this reason. Unicode Text Is Hard To Do By Hand. (Though, hmm, substr is actually fine on UTF-8, no? You just have to land on character boundaries. Which are easy to find; O(1) from any given start point -- at most 5 bytes away -- and the guaranteed output of any other algorithm that iterates over character boundaries...) Thanks for this answer: I had not considered the ability to do a str -> [char] -> str round trip with actual Unicode work on the [char] type. I also did not know about the intent of integrating a subset of libicu. Indeed, with a full library handling [char] correctly, and two simple facilities to convert back and forth, it would be trivial for the user to use real Unicode operations (to_lower / to_upper / capitalize are not fun :x) without too much hassle. Regarding the use cases I have encountered, they were in a general-public web app: - wrap-around at a specified length (in number of graphemes, which in the appropriate canonical form was the number of codepoints in all the languages we cared for) - truncation at a specified length (also in number of graphemes) - sorting lists (the first time we presented a list of countries in Greek, it was nigh unusable...) Pretty basic operations; we used ICU for sorting (collation) and conversion to 32-bit Unicode codepoint values for length operations.
It was all the more funny with Arabic, of course, because of the control characters for the direction of display which do not have a graphical representation, but since we counted by hand, we just ignored them. Second, I do not think that statically known sizes are so important in the type system. I am a huge fan, and abuser, of the C++ template system, but I will be the first to admit it is really complex and generally poorly understood even among usually savvy C++ users. As I understand, fixed-length vectors were imagined for C-compatibility. Statically allocated buffers have lifetime that exceed that of all other objects in the system, therefore they can perfectly be accessed through slices. Other uses implying C-compatibility should be based on dynamically allocated memory, and the size will be unknown at compilation. They're useful for a lot of reasons. You can alloca them, which is good for small buffers. And a decent number of heap structures also have need of small fixed-fanout arrays, caches, lookup tables and the like. But beyond that they simply _occur_ in the C type system. With annoying frequency! We've
Re: [rust-dev] Syntax of vectors, slices, etc
Hello,

As this is going to be my first e-mail on this list, please do not hesitate to correct me if I speak out of turn. Also, do note that I am not a native English speaker; I still promise to do my best and will gladly welcome any correction.

First, I agree that operations on vectors and strings are mostly similar. However, this holds only on the condition of considering strings as lists of codepoints, not lists of bytes. Lists of bytes are useful in encoding and decoding operations, but for manipulating Arabic or Korean they fall short: having users manipulate strings byte-wise instead of codepoint-wise is a recipe for disaster outside of English and the Latin-1-representable languages.

I understand that this may seem contradictory to Rust's original direction of UTF-8 encoded strings, but having worked with UTF-8 strings using C++ `std::string` I can assure you that apart from blindly passing them around, one cannot do much. All modifying operations require the use of Unicode-aware libraries... even `substr`.

Second, I do not think that statically known sizes are so important in the type system. I am a huge fan, and abuser, of the C++ template system, but I will be the first to admit it is really complex and generally poorly understood, even among usually savvy C++ users. As I understand it, fixed-length vectors were imagined for C compatibility. Statically allocated buffers have lifetimes that exceed those of all other objects in the system, so they can perfectly well be accessed through slices. Other uses involving C compatibility should be based on dynamically allocated memory, whose size will be unknown at compilation.

In the linked blog article, an issue is raised regarding the variable size of `rust_vec<T>` because it plays havoc with stack allocation. However, is real stack allocation necessary here? It seems to me that what is desirable is the semantic aspect of a scope-bound variable.
Whether the actual representation is instantiated on the stack or on the task heap is an implementation detail, and the compiler could perfectly well be enhanced so that all variably-sized types are actually instantiated on the heap, but automatically collected at the end of the function scope. A parallel stack dedicated to such allocations could even be used, as the allocation/deallocation pattern is stack-like.

I hope my suggestions are reasonable. Do feel free to ignore them if they are not!

-- Matthieu

On Tue, Apr 24, 2012 at 2:06 AM, Niko Matsakis n...@alum.mit.edu wrote:

Some more thoughts on the matter: http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices/

Niko

On 4/23/12 4:40 PM, Niko Matsakis wrote:

One thing that is unclear to me is the utility of the str/N type. I can't think of a case where a *user* might want this type---it seems to me to represent a string of exactly N bytes (not a buffer of at most N bytes). Graydon, did you have use cases in mind?

Niko

On 4/23/12 4:12 PM, Graydon Hoare wrote:

On 12-04-23 03:21 PM, Rick Richardson wrote:

Should a str be subject to the same syntax? Because it will have different semantics.

I think the semantics are almost identical to vectors. Save the null issue.

A UTF-8 string has differently sized characters, so you can't treat it as a vector; there are obvious and currently discussed interoperability issues regarding the null terminator.

You certainly can treat it as a (constrained) vector. It's just a byte vector, not a character vector. A character vector is [char]. Indexing into a str gives you a byte. You can iterate through it in terms of bytes or characters (or words, lines, paragraphs, etc.) or convert to characters or UTF-16 code units or any other encoding of Unicode. It should definitely get a slice syntax, since that will likely be the most common operation on a string.
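Graydon's "byte vector, not character vector" model is exactly how stable Rust ended up: the sketch below (modern syntax, not the 2012 syntax under discussion) shows byte-wise access, character-wise iteration, byte-offset slice syntax, and the `[char]` form.

```rust
fn main() {
    let s = "naïve";

    // Indexing-as-bytes: the underlying representation is a byte vector.
    assert_eq!(s.as_bytes()[0], b'n');

    // Iterate in terms of bytes or in terms of characters:
    // 'ï' is two bytes in UTF-8, so the counts differ.
    assert_eq!(s.bytes().count(), 6);
    assert_eq!(s.chars().count(), 5);

    // Slice syntax operates on byte offsets and must land on
    // character boundaries.
    assert_eq!(&s[0..1], "n");

    // The "[char]" character-vector form is an explicit conversion.
    let chars: Vec<char> = s.chars().collect();
    assert_eq!(chars.len(), 5);
}
```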
I would also like to support a notion of static sizing, but with UTF-8 even that's not always possible.

Yes it is. The static size is a byte count. The compiler knows that size statically and can complain if you get it wrong (or fill it in if you leave it as a wildcard, as I expect most will do).

I reckon a string should be an object, and potentially be convertible to/from a vector. But trying to treat it like a vector will just lead to surprising semantics for some. But that's just my opinion.

The set of use cases to address simultaneously is large and covers much of the same ground as vectors:
- Sometimes people want to be able to send strings between tasks.
- Sometimes people want a shared, refcounted string.
- Sometimes people want strings of arbitrary length.
- Sometimes people want an interior string that's part of another structure (with necessarily fixed size), copied by value.
- String literals exist and ought to turn into something useful, something in static memory
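The "static size is a byte count" point, and the string-literal-in-static-memory use case, can be illustrated in modern Rust (the `str/N` type itself did not survive, so this is an analogy, not the syntax under discussion):

```rust
// A string literal lives in static memory with a 'static lifetime.
static GREETING: &str = "héllo";

fn main() {
    // The static size is a byte count, not a character count:
    // 'é' encodes as two bytes in UTF-8.
    assert_eq!(GREETING.len(), 6);
    assert_eq!(GREETING.chars().count(), 5);

    // A fixed-size byte array pins the byte count in the type itself,
    // and the compiler complains if you get the count wrong.
    let bytes: [u8; 6] = *b"h\xc3\xa9llo";
    assert_eq!(&bytes[..], GREETING.as_bytes());
}
```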