On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu
wrote:
I've been working on RCStr (endearingly pronounced "Our
Sister"), D's up-and-coming reference counted string type. The
goals are:
<Slightly off-topic>
RCStr may be an easier first step, but I think generic dynamic
arrays are more interesting, because they are more generally
applicable and user types like move-only resources make them a
more challenging problem to solve.
BTW, what happened to scope? Generally speaking, I'm not a fan of
Rust, and I know that you think that D needs to differentiate,
but I like their borrowing model for several reasons:
a) while not 100% safe and quite verbose, it offers enough
improvements over @safe D to make it a worthwhile upgrade, if you
don't care about any other language features
b) it's not that hard to grasp / almost natural for people
familiar with C++11's copy (shared_ptr) and move (unique_ptr)
semantics.
c) it's general enough that it can be applied to areas like
iterator invalidation, thread synchronization and other logic
bugs, as some third-party Rust packages demonstrate.
I think that improving escape analysis with the scope attribute
can go a long way toward narrowing the gap between Rust and D in
that area.
The other elephant(s) in the room are nested contexts like
delegates, nested structs and some alias template parameter
arguments. These are especially bad because the user has zero
control over those GC allocations, which makes some of D's key
features unusable in @nogc contexts.
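To make the closure problem concrete, here is a minimal sketch
(the names apply, viaScope and escaping are made up for
illustration). As far as I know, a delegate passed to a scope
parameter does not need a heap closure, while one that escapes
its frame gets an implicit GC allocation:

```d
// Hypothetical helper: takes the delegate as `scope`, so it may not escape.
int apply(scope int delegate(int) @nogc dg, int x) @nogc
{
    return dg(x);
}

// Compiles as @nogc: the lambda captures `offset`, but its context can
// stay on the stack because `apply` promises not to keep the delegate.
int viaScope(int offset) @nogc
{
    return apply((int v) => v + offset, 1);
}

// Not @nogc: the returned delegate outlives this frame, so the compiler
// moves `offset` into a GC-allocated closure. The user cannot choose a
// different allocator for it.
int delegate(int) escaping(int offset)
{
    return (int v) => v + offset;
}

void main()
{
    assert(viaScope(41) == 42);
    assert(escaping(1)(2) == 3);
}
```

Marking escaping as @nogc should be rejected by the compiler,
which is exactly the limitation I mean.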
<End off-topic>
* Reference counted, shouldn't leak if all instances destroyed;
even if not, use the GC as a last-resort reclamation mechanism.
* Entirely @safe.
* Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but
also raw manipulation and custom encodings via RCStr!ubyte,
RCStr!ushort etc.
* Support several views of the same string, e.g. given s of
type RCStr!char, it can be iterated byte-wise, code point-wise,
code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar
etc.
* Support const and immutable qualifiers for the character type.
* Work well with const and immutable when they qualify the
entire RCStr type.
* Fast: use the small string optimization and various other
layout and algorithms to make it a good choice for high
performance strings
RFC: what primitives should RCStr have?
Thanks,
Andrei
0) (Prerequisite) Composition/interaction with language
features/user types - RCStr in nested contexts (alias template
parameters, delegates, nested structs/classes), array of RCStr-s,
RCStr as a struct/class member, RCStr passed as (const) ref
parameter, etc. should correctly increase/decrease ref count.
This is also a prerequisite for safe RefCounted!T.
Action item: related compiler bugs should be prioritized, e.g.
the RAII bug from Shachar Shemesh's lightning talk -
http://forum.dlang.org/post/n8algm$qra$1...@digitalmars.com.
See also:
https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
(not everything in those lists is related but there are some
nasty ones, like bad RVO codegen).
1) Safe slicing
2) shared overloads of member functions (e.g. for stuff like
atomic incRef/decRef)
3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)
4) (Optional) Reserving (pre-allocating capacity) / shrinking. I
labeled this feature request as optional, as it's not clear if
RCStr is more like a container, or more like a slice/range.
5) Some sort of optimization for zero-terminated strings. Quite
often one needs to interact with C APIs, which requires calling
toStringz / toUTFz, which causes unnecessary allocations. It
would be great if RCStr could efficiently handle this scenario.
6) !!! Not really a primitive, but we need to make sure that
applying a chain of range transformations won't break ownership
(e.g. leak or free prematurely).
7) Should be able to replace GC usage in transient ranges such as
File.byLine
8) Cheap initialization/assignment from string literals - should
be roughly the same as either initializing a static character
array (if the small string optimization is used) or just making
it point to read-only memory in the data segment of the
executable. It shouldn't try to write or free such memory. When
initialized from a string literal, RCStr should also offer a
null-terminating byte, provided that it points to the whole literal.
If one wants to assign a string literal by overwriting parts of
the already allocated storage, std.algorithm.mutation.copy should
be used instead.
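A small sketch of what items 5) and 8) are about, using plain D
strings since RCStr doesn't exist yet. It relies on the fact that
D string literals are stored with a trailing '\0'; the variable
names are only for illustration:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    // A literal is followed by '\0' in the data segment, so its .ptr
    // can be handed to a C function without any allocation.
    const(char)* lit = "hello".ptr;
    assert(strlen(lit) == 5);

    // A slice of a literal is NOT terminated at its own end: C code
    // reads on until the terminator of the whole literal.
    string whole = "hello world";
    string part = whole[0 .. 5];
    assert(strlen(part.ptr) == 11);

    // toStringz fixes that by producing a terminated copy, which is
    // the allocation item 5) would like to avoid where possible.
    const(char)* z = part.toStringz;
    assert(strlen(z) == 5);
}
```

So an RCStr that knows it refers to a whole literal could hand
out a C string for free, and would only need to copy for slices.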
There may be other important primitives which I haven't thought
of, but generally we should try to leverage std.algorithm,
std.range, std.string and std.uni for them, via UFCS.
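As a rough illustration of that last point: a free function
template written against the range interface can be called with
member syntax on any string-like range, so RCStr itself would
mostly need to expose its views. countLetters is a hypothetical
name, and the string / byDchar arguments merely stand in for
RCStr views:

```d
import std.algorithm.searching : count;
import std.range.primitives : isInputRange;
import std.uni : isAlpha;
import std.utf : byDchar;

// Hypothetical helper, not part of Phobos: counts alphabetic code points.
size_t countLetters(R)(R r)
if (isInputRange!R)
{
    return r.count!isAlpha;
}

void main()
{
    // Called via UFCS on an auto-decoded string...
    assert("ab1 c".countLetters == 3);
    // ...and on an explicit code point view of a UTF-16 string.
    assert("ab1 c"w.byDchar.countLetters == 3);
}
```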
----------
On a related note, I know that you want to use AffixAllocator for
reference counting, and I think it's a great idea. I have one
question, which wasn't answered during that discussion:
// Use a nightly build to compile
import core.thread : Thread, thread_joinAll;
import std.range : iota;
import std.stdio;
import std.experimental.allocator : makeArray;
import std.experimental.allocator.building_blocks.region : InSituRegion;
import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;

AffixAllocator!(InSituRegion!(4096), uint) tlsAllocator;
static assert (tlsAllocator.sizeof >= 4096);

void main()
{
    shared(int)[] myArray;
    foreach (i; 0 .. 100)
    {
        new Thread(
        {
            if (i != 0) return;
            myArray = tlsAllocator.makeArray!(shared int)(100.iota);
            static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
            writefln("At %x: %s", myArray.ptr, myArray);
        }).start();
        thread_joinAll();
    }
    writeln(myArray); // prints garbage!!!
}
So my question is: should it be possible to share thread-local
data like this?
IMO, the current allocator design opens a serious hole in the
type system, because it allows using data allocated from another
thread's thread-local storage. After the other thread exits,
accessing memory allocated from its TLS should not be possible,
but https://github.com/dlang/phobos/pull/3991 clearly allows that.
One should be able to allocate shared memory only from shared
allocators. And shared allocators must be backed by shared parent
allocators or shared underlying storage. In this case the Region
allocator should be shared, and must be backed by shared memory,
Mallocator, or something in that vein.
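
For contrast, a minimal sketch of the pattern I'd consider sound:
memory that is handed to another thread comes from Mallocator,
whose global instance is declared shared. The names result and
produce are hypothetical, and __gshared is used only to keep the
example short:

```d
import core.thread : Thread;
import std.experimental.allocator.mallocator : Mallocator;

// Mallocator's global instance is shared, unlike the thread-local
// tlsAllocator in the example above.
static assert(is(typeof(Mallocator.instance) == shared(Mallocator)));

__gshared int[] result; // hypothetical hand-off slot, visible to all threads

// Hypothetical worker body: allocates from the C heap, not from TLS.
void produce()
{
    result = cast(int[]) Mallocator.instance.allocate(4 * int.sizeof);
    foreach (i, ref x; result)
        x = cast(int) i * 10;
}

void main()
{
    auto t = new Thread(&produce);
    t.start();
    t.join();

    // The worker has exited, but its allocation is still valid.
    assert(result == [0, 10, 20, 30]);
    Mallocator.instance.deallocate(result);
}
```

Here the data outlives the thread that allocated it because the
backing storage was never tied to that thread in the first place.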