On Saturday, 28 May 2016 at 09:43:41 UTC, ZombineDev wrote:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu
wrote:
I've been working on RCStr (endearingly pronounced "Our
Sister"), D's up-and-coming reference counted string type. The
goals are:
<Slightly off-topic>
RCStr may be an easier first step, but I think generic dynamic
arrays are more interesting, because they are more generally
applicable, and user types like move-only resources make them a
more challenging problem to solve.
BTW, what happened to scope? Generally speaking, I'm not a fan
of Rust, and I know that you think that D needs to
differentiate itself, but I like Rust's borrowing model for
several reasons:
a) while not 100% safe and quite verbose, it offers enough
improvements over @safe D to make it a worthwhile upgrade, if
you don't care about any other language features;
b) it's not that hard to grasp, and almost natural for people
familiar with C++11's copy (shared_ptr) and move (unique_ptr)
semantics;
c) it's general enough that it can be applied to areas like
iterator invalidation, thread synchronization and other logic
bugs, as some third-party Rust packages demonstrate.
I think that improving escape analysis with the scope attribute
can go a long way toward closing the gap between Rust and D in
that area.
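For delegate parameters, scope already gives callers some of that control: a closure passed to a scope parameter cannot escape the call, so the compiler can keep the captured variables on the stack and accept the call in @nogc code. A minimal sketch; applyTwice and addTwice are hypothetical names chosen for illustration.

```d
// `scope` promises that dg does not escape this call.
int applyTwice(scope int delegate(int) @nogc dg) @nogc
{
    return dg(dg(1));
}

// The lambda captures k, but since it is only passed to a scope
// parameter, no GC-allocated closure is needed, so @nogc holds.
int addTwice(int k) @nogc
{
    return applyTwice(x => x + k);
}

void main()
{
    assert(addTwice(5) == 11); // dg(1) == 6, then dg(6) == 11
}
```

Returning the same lambda from addTwice instead would require a GC-allocated closure, which @nogc rejects.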
The other elephant(s) in the room are nested contexts like
delegates, nested structs and some alias template parameter
arguments. These are especially bad because the user has zero
control over the GC allocations of their contexts, which makes
some of D's key features unusable in @nogc code.
<End off-topic>
* Reference counted, shouldn't leak if all instances are
destroyed; even if not, use the GC as a last-resort
reclamation mechanism.
* Entirely @safe.
* Support UTF 100% by means of RCStr!char, RCStr!wchar etc.
but also raw manipulation and custom encodings via
RCStr!ubyte, RCStr!ushort etc.
* Support several views of the same string, e.g. given s of
type RCStr!char, it can be iterated byte-wise, code unit-wise,
code point-wise etc. by using s.by!ubyte, s.by!char,
s.by!dchar etc.
* Support const and immutable qualifiers for the character
type.
* Work well with const and immutable when they qualify the
entire RCStr type.
* Fast: use the small string optimization and various other
layout and algorithmic techniques to make it a good choice for
high-performance strings.
RFC: what primitives should RCStr have?
Thanks,
Andrei
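The s.by!ubyte / s.by!char / s.by!dchar views above are proposed RCStr primitives. For built-in strings, Phobos already has comparable views (std.string.representation, std.utf.byCodeUnit, std.utf.byDchar); this sketch uses them to show why the views differ once a string contains non-ASCII text. It only illustrates the intent and says nothing about how RCStr would implement by!T.

```d
import std.range : walkLength;
import std.string : representation;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // 'ä' (U+00E4) takes two UTF-8 code units, so the views differ in length.
    string s = "häh";

    auto bytes = s.representation;        // immutable(ubyte)[]: raw bytes
    assert(bytes.length == 4);

    assert(s.byCodeUnit.walkLength == 4); // char code units, no decoding
    assert(s.byDchar.walkLength == 3);    // decoded code points
}
```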
0) (Prerequisite) Composition/interaction with language
features/user types: RCStr in nested contexts (alias template
parameters, delegates, nested structs/classes), arrays of
RCStr, RCStr as a struct/class member, RCStr passed as a
(const) ref parameter, etc. should correctly increase/decrease
the reference count. This is also a prerequisite for a safe
RefCounted!T.
Action item: related compiler bugs should be prioritized, for
example the RAII bug from Shachar Shemesh's lightning talk:
http://forum.dlang.org/post/n8algm$qra$1...@digitalmars.com.
See also:
https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
(not everything in those lists is related but there are some
nasty ones, like bad RVO codegen).
1) Safe slicing
2) shared overloads of member functions (e.g. for stuff like
atomic incRef/decRef)
3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)
4) (Optional) Reserving (pre-allocating capacity) / shrinking.
I labeled this feature request as optional, as it's not clear
whether RCStr is more like a container or more like a slice/range.
5) Some sort of optimization for zero-terminated strings. Quite
often one needs to interact with C APIs, which requires calling
toStringz / toUTFz, which causes unnecessary allocations. It
would be great if RCStr could efficiently handle this scenario.
6) !!! Not really a primitive, but we need to make sure that
applying a chain of range transformations won't break ownership
(e.g. leak or free prematurely).
7) Should be able to replace GC usage in transient ranges such
as File.byLine.
8) Cheap initialization/assignment from string literals -
should be roughly the same as either initializing a static
character array (if the small string optimization is used) or
just making it point to read-only memory in the data segment of
the executable. It shouldn't try to write or free such memory.
When initialized from a string literal, RCStr should also offer
a null terminator, provided that it points to the whole literal.
If one wants to assign a string literal by overwriting parts of
the already allocated storage, std.algorithm.mutation.copy
should be used instead.
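Points 5 and 8 rest on a property of D string literals: the compiler places a terminating zero right after them, so a slice covering the whole literal can be handed to C unchanged, while a sub-slice needs a terminated copy from toStringz. A small sketch using only druntime and Phobos:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    string lit = "abc";

    // D string literals are followed by '\0' in memory, so a slice that
    // covers the whole literal is already a valid C string.
    assert(lit.ptr[lit.length] == '\0');
    assert(strlen(lit.ptr) == 3);

    // A sub-slice is not terminated where it ends ('c' follows it), so
    // passing it to C needs a terminated copy, which toStringz provides.
    string sub = lit[0 .. 2];
    assert(sub.ptr[sub.length] == 'c');
    assert(strlen(toStringz(sub)) == 2);
}
```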
There may be other important primitives which I haven't thought
of, but generally we should try to leverage std.algorithm,
std.range, std.string and std.uni for them, via UFCS.
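Point 0 boils down to the compiler running postblits and destructors at exactly the right moments. The following is a minimal sketch of that bookkeeping under simplifying assumptions (fixed 16-char capacity, malloc/free storage, non-atomic count, so single-threaded only). MiniRC, byValue and byRef are hypothetical names; this is not RCStr's design.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Hypothetical minimal ref-counted string, single-threaded only.
struct MiniRC
{
    enum capacity = 16;

    private static struct Payload
    {
        size_t refs;   // number of live MiniRC handles
        size_t length;
        char[capacity] data;
    }

    private Payload* p;

    this(const(char)[] s)
    {
        assert(s.length <= capacity, "sketch supports short strings only");
        p = cast(Payload*) malloc(Payload.sizeof);
        assert(p !is null);
        p.refs = 1;
        p.length = s.length;
        memcpy(p.data.ptr, s.ptr, s.length);
    }

    // Postblit: a copy was made, so there is one more owner.
    this(this)
    {
        if (p !is null) ++p.refs;
    }

    // Destructor: drop one owner; the last one frees the payload.
    ~this()
    {
        if (p !is null && --p.refs == 0) free(p);
    }

    size_t refCount() const { return p is null ? 0 : p.refs; }

    const(char)[] opSlice() const
    {
        return p is null ? null : p.data[0 .. p.length];
    }
}

// Passing by value copies the handle for the duration of the call.
void byValue(MiniRC s) { assert(s.refCount == 2); }

// Passing by ref shares the caller's handle; the count is untouched.
void byRef(ref const MiniRC s) { assert(s.refCount == 1); }

void main()
{
    auto a = MiniRC("hello");
    assert(a.refCount == 1);
    {
        auto b = a;             // copy: postblit bumps the count
        assert(a.refCount == 2);
    }                           // b goes out of scope: destructor drops it
    assert(a.refCount == 1);
    byValue(a);                 // the parameter is destroyed on return
    assert(a.refCount == 1);
    byRef(a);
    assert(a[] == "hello");
}
```

Every case listed in point 0 (closures, array elements, members, ref parameters) must reproduce exactly this increment/decrement pairing for a real RCStr to be safe.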
----------
On a related note, I know that you want to use AffixAllocator
for reference counting, and I think it's a great idea. I have
one question, which wasn't answered during that discussion:
    // Use a nightly build to compile
    import core.thread : Thread, thread_joinAll;
    import std.range : iota;
    import std.stdio;
    import std.experimental.allocator : makeArray;
    import std.experimental.allocator.building_blocks.region : InSituRegion;
    import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;

    // Module-level variables are thread-local in D, so every thread gets its
    // own tlsAllocator, whose 4 KB region lives in that thread's TLS.
    AffixAllocator!(InSituRegion!(4096), uint) tlsAllocator;
    static assert(tlsAllocator.sizeof >= 4096);

    void main()
    {
        shared(int)[] myArray;
        foreach (i; 0 .. 100)
        {
            new Thread(
            {
                if (i != 0) return; // only the first thread allocates
                myArray = tlsAllocator.makeArray!(shared int)(100.iota);
                static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
                writefln("At %x: %s", myArray.ptr, myArray);
            }).start();
            thread_joinAll(); // the allocating thread exits here
        }
        writeln(myArray); // prints garbage!!!
    }
So my question is: should it be possible to share thread-local
data like this?
IMO, the current allocator design opens a serious hole in the
type system, because it allows using data allocated from
another thread's thread-local storage. After the other thread
exits, accessing memory allocated from its TLS should not be
possible, but https://github.com/dlang/phobos/pull/3991 clearly
allows that.
One should be able to allocate shared memory only from shared
allocators, and shared allocators must be backed by shared
parent allocators or shared underlying storage. In this case the
Region allocator should be shared, and must be backed by shared
memory, Mallocator, or something in that vein.
Here's another case where the last change to AffixAllocator is
really dangerous:
    void main()
    {
        immutable(int)[] myArray;
        foreach (i; 0 .. 100)
        {
            new Thread(
            {
                if (i != 0) return;
                myArray = tlsAllocator.makeArray!(immutable int)(100.iota);
                writeln(myArray); // prints [0, ..., 99]
            }).start();
            thread_joinAll();
        }
        writeln(myArray); // prints garbage
    }
In this case it severely violates the promise of immutable:
data that should never change turns into garbage once the
allocating thread exits.
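One way to read the proposed rule is as a compile-time check on the element type and the allocator's qualifier, covering immutable as well, since immutable data is implicitly shareable between threads. The sketch below uses two hypothetical toy allocators and a hypothetical makeArrayChecked helper; it is not the std.experimental.allocator API, and the toy allocators simply forward to the GC.

```d
// Hypothetical allocator stand-ins; a real allocator would manage its own
// storage instead of delegating to the GC.
struct LocalAlloc
{
    void[] allocate(size_t n) { return new void[n]; }
}

struct SharedAlloc
{
    void[] allocate(size_t n) shared { return new void[n]; }
}

// Hypothetical helper (not Phobos): refuses to produce shared or immutable
// elements unless the allocator instance itself is shared.
T[] makeArrayChecked(T, A)(ref A alloc, size_t n)
    if (!(is(T == shared) || is(T == immutable)) || is(A == shared))
{
    return cast(T[]) alloc.allocate(n * T.sizeof);
}

shared SharedAlloc sharedAlloc; // module-level, explicitly shared

void main()
{
    LocalAlloc localAlloc; // thread-local, like tlsAllocator above

    // Thread-local data from a thread-local allocator: allowed.
    int[] a = makeArrayChecked!int(localAlloc, 4);
    assert(a.length == 4);

    // Shared data from a shared allocator: allowed.
    shared(int)[] b = makeArrayChecked!(shared int)(sharedAlloc, 4);
    assert(b.length == 4);

    // Both patterns from the examples above are rejected at compile time.
    static assert(!__traits(compiles, makeArrayChecked!(shared int)(localAlloc, 4)));
    static assert(!__traits(compiles, makeArrayChecked!(immutable int)(localAlloc, 4)));
}
```

A template constraint is used rather than a runtime check so that the misuse fails to compile instead of producing garbage later.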