Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-29 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> > 1.  Due to obscure implementation details, it is **not** safe to append
> > to a
> > non-shared array even if you synchronize manually.
> I think you meant __gshared, not non-shared.  It's perfectly safe to
> append to non-shared arrays.
> -Steve

No, I meant appending to non-shared arrays from multiple threads, i.e. __gshared
or passed by reference between threads.


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-29 Thread Steven Schveighoffer

On Mon, 29 Nov 2010 11:44:23 -0500, dsimcha  wrote:


== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article

On Mon, 29 Nov 2010 10:58:05 -0500, dsimcha  wrote:
> == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
>> On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha wrote:

>> > == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
>> >> The issue is that if you append to such an array and it adds more pages
>> >> in
>> >> place, the block length location will move.  Since each thread caches
>> >> its
>> >> own copy of the block info, one will be wrong and look at array data
>> >> thinking it's a length field.
>> >> Even if you surround the appends with a lock, it will still cause
>> >> problems
>> >> because of the cache.  I'm not sure there's any way to reliably append
>> >> to
>> >> such data from multiple threads.
>> >> -Steve
>> >
>> > Would assumeSafeAppend() do the trick?
>> >
>> No, that does not affect your cache.  I probably should add a function
>> to
>> append without using the cache.
>> -Steve
>
> How about using std.array.Appender?  This looks safe as far as I can
> tell, but I
> want to make sure there aren't any obscure implementation details that
> would
> prevent this from working, too.
That should be fine, std.array.Appender does not affect the LRU cache.  In
fact, if you try to do normal appending to an array built with Appender,
it will automatically reallocate because the memory does not have the
APPENDABLE bit set (new GC bit I added to avoid misinterpretations).
This is a good solution.
-Steve


Sounds like a good solution to me, too.  As long as this situation is
unlikely to change in the future, I think we should scrap the idea of
making appending to __gshared using ~= safe with manual synchronization,


I think it's worth giving people who want low-level access to the runtime  
functions the ability to avoid using the LRU cache.  The cache is meant to  
support everyday coding, but can possibly cause problems if you are  
interested in doing low-level optimizations.



BUT the following needs to be
documented:

1.  Due to obscure implementation details, it is **not** safe to append to a
non-shared array even if you synchronize manually.


I think you meant __gshared, not non-shared.  It's perfectly safe to  
append to non-shared arrays.


-Steve


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-29 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> On Mon, 29 Nov 2010 10:58:05 -0500, dsimcha  wrote:
> > == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> >> On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha  wrote:
> >> > == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> >> >> The issue is that if you append to such an array and it adds more
> >> pages
> >> >> in
> >> >> place, the block length location will move.  Since each thread caches
> >> >> its
> >> >> own copy of the block info, one will be wrong and look at array data
> >> >> thinking it's a length field.
> >> >> Even if you surround the appends with a lock, it will still cause
> >> >> problems
> >> >> because of the cache.  I'm not sure there's any way to reliably
> >> append
> >> >> to
> >> >> such data from multiple threads.
> >> >> -Steve
> >> >
> >> > Would assumeSafeAppend() do the trick?
> >> >
> >> No, that does not affect your cache.  I probably should add a function
> >> to
> >> append without using the cache.
> >> -Steve
> >
> > How about using std.array.Appender?  This looks safe as far as I can
> > tell, but I
> > want to make sure there aren't any obscure implementation details that
> > would
> > prevent this from working, too.
> That should be fine, std.array.Appender does not affect the LRU cache.  In
> fact, if you try to do normal appending to an array built with Appender,
> it will automatically reallocate because the memory does not have the
> APPENDABLE bit set (new GC bit I added to avoid misinterpretations).
> This is a good solution.
> -Steve

Sounds like a good solution to me, too.  As long as this situation is unlikely to
change in the future, I think we should scrap the idea of making appending to
__gshared using ~= safe with manual synchronization, BUT the following needs to be
documented:

1.  Due to obscure implementation details, it is **not** safe to append to a
non-shared array even if you synchronize manually.

2.  If you're doing low-level shared-memory cowboy multithreading, Appender is the
correct solution.

3.  In general, any data structure, etc. in Phobos/druntime that can't be made
thread-safe by using a synchronized/shared wrapper or manual synchronization due
to obscure implementation details (especially if this would be safe with a
textbook implementation) should come with a **bold** or ALL CAPS explicit warning
against doing such things.
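For the record, point 2 might look something like the sketch below.  This is my own hand-wavy illustration (the Mutex-protected shared Appender, the worker function, and all names are mine, not an official recipe), but it shows why Appender sidesteps the problem: it keeps its own capacity bookkeeping and never consults the per-thread LRU block-info cache.

```d
import std.array : Appender;
import core.sync.mutex : Mutex;
import core.thread : Thread;

__gshared Appender!(int[]) result;  // deliberately shared across threads
__gshared Mutex resultLock;

void worker(int base)
{
    foreach (i; 0 .. 1000)
    {
        // Appender does its own bookkeeping and never touches the
        // per-thread LRU block-info cache, so a plain lock suffices.
        synchronized (resultLock)
            result.put(base + i);
    }
}

void main()
{
    resultLock = new Mutex;
    auto t1 = new Thread({ worker(0); });
    auto t2 = new Thread({ worker(100_000); });
    t1.start(); t2.start();
    t1.join(); t2.join();
    assert(result.data.length == 2000);
}
```

An alternative design is one Appender per thread with a merge at the end, which avoids the lock entirely.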


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-29 Thread Steven Schveighoffer

On Mon, 29 Nov 2010 10:58:05 -0500, dsimcha  wrote:


== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article

On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha  wrote:
> == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
>> The issue is that if you append to such an array and it adds more pages
>> in
>> place, the block length location will move.  Since each thread caches
>> its
>> own copy of the block info, one will be wrong and look at array data
>> thinking it's a length field.
>> Even if you surround the appends with a lock, it will still cause
>> problems
>> because of the cache.  I'm not sure there's any way to reliably append
>> to
>> such data from multiple threads.
>> -Steve
>
> Would assumeSafeAppend() do the trick?
>
No, that does not affect your cache.  I probably should add a function to
append without using the cache.
-Steve


How about using std.array.Appender?  This looks safe as far as I can
tell, but I want to make sure there aren't any obscure implementation
details that would prevent this from working, too.


That should be fine, std.array.Appender does not affect the LRU cache.  In  
fact, if you try to do normal appending to an array built with Appender,  
it will automatically reallocate because the memory does not have the  
APPENDABLE bit set (new GC bit I added to avoid misinterpretations).


This is a good solution.

-Steve


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-29 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha  wrote:
> > == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> >> The issue is that if you append to such an array and it adds more pages
> >> in
> >> place, the block length location will move.  Since each thread caches
> >> its
> >> own copy of the block info, one will be wrong and look at array data
> >> thinking it's a length field.
> >> Even if you surround the appends with a lock, it will still cause
> >> problems
> >> because of the cache.  I'm not sure there's any way to reliably append
> >> to
> >> such data from multiple threads.
> >> -Steve
> >
> > Would assumeSafeAppend() do the trick?
> >
> No, that does not affect your cache.  I probably should add a function to
> append without using the cache.
> -Steve

How about using std.array.Appender?  This looks safe as far as I can tell, but I
want to make sure there aren't any obscure implementation details that would
prevent this from working, too.


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread Steven Schveighoffer

On Wed, 17 Nov 2010 13:58:55 -0500, dsimcha  wrote:


== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article

On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha  wrote:
> == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
>> The issue is that if you append to such an array and it adds more pages
>> in
>> place, the block length location will move.  Since each thread caches
>> its
>> own copy of the block info, one will be wrong and look at array data
>> thinking it's a length field.
>> Even if you surround the appends with a lock, it will still cause
>> problems
>> because of the cache.  I'm not sure there's any way to reliably append
>> to
>> such data from multiple threads.
>> -Steve
>
> Would assumeSafeAppend() do the trick?
>
No, that does not affect your cache.  I probably should add a function to
append without using the cache.
-Steve


I thought the whole point of assumeSafeAppend is that it puts the
current ptr and length into the cache as-is.


All the cache does is store the block info -- block start, block size, and  
block flags.  The length is stored in the block directly.  The cache  
allows me to skip a call to the GC (and lock the GC's global mutex) by  
getting the block info directly from a small cache.  The block info is  
then used to determine where and how the "used" length is stored.


Since the length is stored at the end, a change in block size in one cache  
while being unchanged in another cache can lead to problems.


assumeSafeAppend sets the "used" block length as the given array's length  
so the block can be used again for appending.  It does not affect the  
cache.


Another option is to go back to the mode where the "used" length is stored  
at the beginning of large blocks (this caused alignment problems for some  
people).


-Steve
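To make the distinction concrete, here is a single-threaded sketch of what assumeSafeAppend actually does per its documented semantics: it rewrites only the "used" length stored with the block, never any thread's block-info cache.

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    a.length = 2;           // shrink the slice; the block still records
                            // a "used" length of 4

    // Without assumeSafeAppend the next append would reallocate, since
    // the runtime must assume elements a[2 .. 4] are still referenced
    // by some other slice.
    auto p = a.ptr;
    a.assumeSafeAppend();   // reset the stored "used" length to 2
    a ~= 99;                // now appends in place, overwriting old a[2]
    assert(a.ptr is p);
    assert(a == [1, 2, 99]);
}
```

assumeSafeAppend lives in druntime's object module, so no import is needed; the in-place guarantee holds because the block's capacity (at least the original 4 elements) exceeds the new length.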


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha  wrote:
> > == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> >> The issue is that if you append to such an array and it adds more pages
> >> in
> >> place, the block length location will move.  Since each thread caches
> >> its
> >> own copy of the block info, one will be wrong and look at array data
> >> thinking it's a length field.
> >> Even if you surround the appends with a lock, it will still cause
> >> problems
> >> because of the cache.  I'm not sure there's any way to reliably append
> >> to
> >> such data from multiple threads.
> >> -Steve
> >
> > Would assumeSafeAppend() do the trick?
> >
> No, that does not affect your cache.  I probably should add a function to
> append without using the cache.
> -Steve

I thought the whole point of assumeSafeAppend is that it puts the current ptr and
length into the cache as-is.


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread Steven Schveighoffer

On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha  wrote:


== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
The issue is that if you append to such an array and it adds more pages in
place, the block length location will move.  Since each thread caches its
own copy of the block info, one will be wrong and look at array data
thinking it's a length field.
Even if you surround the appends with a lock, it will still cause problems
because of the cache.  I'm not sure there's any way to reliably append to
such data from multiple threads.
-Steve


Would assumeSafeAppend() do the trick?



No, that does not affect your cache.  I probably should add a function to  
append without using the cache.


-Steve


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> The issue is that if you append to such an array and it adds more pages in
> place, the block length location will move.  Since each thread caches its
> own copy of the block info, one will be wrong and look at array data
> thinking it's a length field.
> Even if you surround the appends with a lock, it will still cause problems
> because of the cache.  I'm not sure there's any way to reliably append to
> such data from multiple threads.
> -Steve

Would assumeSafeAppend() do the trick?



Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread Steven Schveighoffer
On Wed, 17 Nov 2010 11:58:20 -0500, Sean Kelly wrote:



Steven Schveighoffer Wrote:


There is specific code in array appending that locks a global lock when
appending to shared arrays.  Appending to __gshared arrays from multiple
threads likely will not work in some cases though.  I don't know how to
get around this, since the runtime is not made aware that the data is
shared.


The shared attribute will have to become a part of the TypeInfo, much  
like const is now.  Knowing whether data is shared can affect where/how  
the memory block is allocated by the GC, etc.


shared is part of it, but __gshared is not.

Since __gshared is the hack to allow "bare metal" sharing, I don't see how  
it can be part of the type info.


The issue is that if you append to such an array and it adds more pages in  
place, the block length location will move.  Since each thread caches its  
own copy of the block info, one will be wrong and look at array data  
thinking it's a length field.


Even if you surround the appends with a lock, it will still cause problems  
because of the cache.  I'm not sure there's any way to reliably append to  
such data from multiple threads.


-Steve
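In other words, even this apparently careful pattern is broken.  The sketch below is my own illustration of the hazard being described, not code from the runtime:

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

__gshared int[] data;      // "bare metal" sharing, invisible to the runtime
__gshared Mutex dataLock;

void worker()
{
    foreach (i; 0 .. 100_000)
    {
        synchronized (dataLock)
            data ~= i;  // STILL unsafe: each thread caches its own copy
                        // of the block info, and when the block grows in
                        // place the other thread's cached idea of where
                        // the length field lives goes stale -- it may
                        // then read array data as if it were a length
    }
}

void main()
{
    dataLock = new Mutex;
    auto t1 = new Thread(&worker);
    auto t2 = new Thread(&worker);
    t1.start(); t2.start();
    t1.join(); t2.join();
    // May appear to work, silently corrupt data, or crash.  Do not use.
}
```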


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread Sean Kelly
Steven Schveighoffer Wrote:
> 
> There is specific code in array appending that locks a global lock when  
> appending to shared arrays.  Appending to __gshared arrays from multiple  
> threads likely will not work in some cases though.  I don't know how to  
> get around this, since the runtime is not made aware that the data is  
> shared.

The shared attribute will have to become a part of the TypeInfo, much like 
const is now.  Knowing whether data is shared can affect where/how the memory 
block is allocated by the GC, etc.


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread Steven Schveighoffer

On Wed, 17 Nov 2010 10:14:21 -0500, dsimcha  wrote:


== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article

I think that we need a wrapper for containers that implements the shared
methods required and manually locks things in order to use them.  Then you
apply this wrapper to any container type, and it's now a shared container.
There are also certain types of containers that lend themselves to shared
access.  For example, I can see a linked list where each node contains a
lock being a useful type.


This is a good idea to some degree, but the thing is that it forces you
to use shared even when you're going for fine-grained parallelism and
want to just cowboy it.  For fine-grained parallelism use cases, my hunch
is that cowboying is going to be the only game in town for a long time in
all languages, not just D.


There is always the possibility of "cowboy"ing it.  But I don't see that  
the standard lib should be catering to this.




> (For arrays, I'm referring to the appending issue, which is problematic
> when I try
> to append to an array from multiple threads, synchronizing manually.)
I'm interested what you mean here, I tried to make sure cross-thread
appending is possible.
>> dcollections containers would probably all fail if you tried to use them
>> from multiple threads.


Ok, I stand corrected.  It seemed to work in practice, but I always just
assumed that it was a Bad Thing to do and worked for the Wrong Reasons.


There is specific code in array appending that locks a global lock when  
appending to shared arrays.  Appending to __gshared arrays from multiple  
threads likely will not work in some cases though.  I don't know how to  
get around this, since the runtime is not made aware that the data is  
shared.



memory (a relatively abundant resource).


Apparently you've never tried working with multigigabyte datasets using a
conservative garbage collector and 32-bit address space.


Is that supported by out-of-the-box containers?  I would expect you need  
to create a special data structure to deal with such things.


And no, I don't regularly work with such issues.  But my point is,  
reference counting the *container* which uses some sort of memory  
allocation to implement its innards is not coupling a limited resource to  
memory allocation/deallocation.  In other words, I think it's better to  
have the container be a non-reference counted type, even if you  
reference-count the elements.  I prefer class semantics to be quite  
honest, where explicit initialization is required.


-Steve


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> I think that we need a wrapper for containers that implements the shared
> methods required and manually locks things in order to use them.  Then you
> apply this wrapper to any container type, and it's now a shared container.
> There are also certain types of containers that lend themselves to shared
> access.  For example, I can see a linked list where each node contains a
> lock being a useful type.

This is a good idea to some degree, but the thing is that it forces you to use
shared even when you're going for fine-grained parallelism and want to just
cowboy it.  For fine-grained parallelism use cases, my hunch is that cowboying
is going to be the only game in town for a long time in all languages, not just D.

> > (For arrays, I'm referring to the appending issue, which is problematic
> > when I try
> > to append to an array from multiple threads, synchronizing manually.)
> I'm interested what you mean here, I tried to make sure cross-thread
> appending is possible.
> >> dcollections containers would probably all fail if you tried to use them
> >>  from multiple threads.

Ok, I stand corrected.  It seemed to work in practice, but I always just assumed
that it was a Bad Thing to do and worked for the Wrong Reasons.

> memory (a relatively abundant resource).

Apparently you've never tried working with multigigabyte datasets using a
conservative garbage collector and 32-bit address space.




Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread Steven Schveighoffer

On Wed, 17 Nov 2010 09:17:05 -0500, dsimcha  wrote:


== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article

I think in general containers don't work across multiple threads unless
specifically designed to do that.


I'm making the assumption that you'd handle all the synchronization issues
yourself.  When you need to update the container, there's an obvious
issue.  In general I don't like the trend in D of building things into
arrays, containers, etc. such that they can't be shared across threads
due to obscure implementation details even when it looks safe.


I think that we need a wrapper for containers that implements the shared  
methods required and manually locks things in order to use them.  Then you  
apply this wrapper to any container type, and it's now a shared container.


There are also certain types of containers that lend themselves to shared  
access.  For example, I can see a linked list where each node contains a  
lock being a useful type.


(For arrays, I'm referring to the appending issue, which is problematic
when I try to append to an array from multiple threads, synchronizing
manually.)


I'm interested what you mean here, I tried to make sure cross-thread  
appending is possible.



dcollections containers would probably all fail if you tried to use them
 from multiple threads.
That being said, I'm not a huge fan of reference counting period.
Containers have no business being reference counted anyways, since their
resource is memory, and should be handled by the GC.  This doesn't mean
pieces of it shouldn't be reference counted or allocated via malloc or
whatever, but the object's own lifetime should be managed by the GC
IMO.  Not coincidentally, this is how dcollections is set up.


I think reference counting is an elegant solution to a niche problem
(namely, memory management of large containers that won't have circular
references when memory is tight), but given all the baggage it creates, I
don't think it should be the default for any container.  I think we need
to start thinking about custom allocators, and allow reference counting
but make GC the default.


The memory management of a container's innards is open to reference  
counting (or whatever, I agree that allocators should be supported  
somewhere).  I just object to reference counting of the container itself,  
as it's not important to me whether a container gets closed outside the GC  
automatically.


With something like a File it's different since you are coupling closing a  
file (a very limited resource) with cleaning memory (a relatively abundant  
resource).


-Steve
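The wrapper idea from earlier in this message could be roughed out as below.  Everything here is hypothetical (the Locked name, the assumed add/length members of the wrapped container); nothing like this existed in Phobos or dcollections at the time, it just shows the shape of the proposal:

```d
import core.sync.mutex : Mutex;

// Hypothetical: wrap any container type C and serialize every
// operation through a single per-container mutex.
struct Locked(C)
{
    private C _impl;
    private Mutex _lock;

    this(C impl)
    {
        _impl = impl;
        _lock = new Mutex;
    }

    // Forwarded operations; assumes C has add() and length.
    void add(E)(E e)
    {
        synchronized (_lock) _impl.add(e);
    }

    size_t length()
    {
        synchronized (_lock) return _impl.length;
    }
}
```

A real version would forward the full interface (probably via opDispatch or a mixin) rather than hand-writing each method.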


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-17 Thread dsimcha
== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> I think in general containers don't work across multiple threads unless
> specifically designed to do that.

I'm making the assumption that you'd handle all the synchronization issues
yourself.  When you need to update the container, there's an obvious issue.  In
general I don't like the trend in D of building things into arrays, containers,
etc. such that they can't be shared across threads due to obscure implementation
details even when it looks safe.

(For arrays, I'm referring to the appending issue, which is problematic when I
try to append to an array from multiple threads, synchronizing manually.)

> dcollections containers would probably all fail if you tried to use them
>  from multiple threads.
> That being said, I'm not a huge fan of reference counting period.
> Containers have no business being reference counted anyways, since their
> resource is memory, and should be handled by the GC.  This doesn't mean
> pieces of it shouldn't be reference counted or allocated via malloc or
> whatever, but the object's own lifetime should be managed by the GC
> IMO.  Not coincidentally, this is how dcollections is set up.

I think reference counting is an elegant solution to a niche problem (namely,
memory management of large containers that won't have circular references when
memory is tight), but given all the baggage it creates, I don't think it should be
the default for any container.  I think we need to start thinking about custom
allocators, and allow reference counting but make GC the default.


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-16 Thread Steven Schveighoffer

On Tue, 16 Nov 2010 15:47:07 -0500, dsimcha  wrote:


I'm trying to use BinaryHeap in a multithreaded program using
std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
stuff, and when I looked into it in a debugger, I realized the crashes had
to do with reference counting, probably caused by this line in BinaryHeap:

private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
   RefCountedAutoInitialize.no) _payload;

Why the heck are the payload and the length being stored in such a weird
way?  IMHO BinaryHeap shouldn't use reference counting unless Store does.
More generally, anything that uses reference counting, especially where
unexpected, should come with a very strong warning in its ddoc so that
people are aware of the hidden caveats, like copying it using memcpy()
and sharing it across threads.


I think in general containers don't work across multiple threads unless  
specifically designed to do that.


dcollections containers would probably all fail if you tried to use them  
from multiple threads.


That being said, I'm not a huge fan of reference counting period.   
Containers have no business being reference counted anyways, since their  
resource is memory, and should be handled by the GC.  This doesn't mean  
pieces of it shouldn't be reference counted or allocated via malloc or  
whatever, but the object's own lifetime should be managed by the GC
IMO.  Not coincidentally, this is how dcollections is set up.


-Steve


Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-16 Thread Michel Fortin

On 2010-11-16 15:47:07 -0500, dsimcha  said:


I'm trying to use BinaryHeap in a multithreaded program using
std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
stuff, and when I looked into it in a debugger, I realized the crashes had to
do with reference counting, probably caused by this line in BinaryHeap:

private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
   RefCountedAutoInitialize.no) _payload;

Why the heck are the payload and the length being stored in such a weird way?
 IMHO BinaryHeap shouldn't use reference counting unless Store does.  More
generally, anything that uses reference counting, especially where unexpected,
should come with a very strong warning in its ddoc so that people are aware of
the hidden caveats, like copying it using memcpy() and sharing it across threads.


RefCounted, and many other things in Phobos, aren't thread-safe because 
of those reference counters. This problem can occur even if you don't 
share it with other threads, as the GC may destroy objects in another 
thread. See bug 4624:



Also bug 4621 about the compiler not catching the problem at 
compile-time (as it should catch low-level races):



I suggest you try to make RefCounted use an atomic reference counter to 
see if it removes your segfaults.
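A sketch of what an atomic counter might look like (purely illustrative; the AtomicRC name and layout are mine, and RefCounted's real implementation differs):

```d
import core.atomic : atomicOp;

// Illustrative only: a copyable handle whose reference count is bumped
// with atomic read-modify-write operations, so the count stays correct
// even if the GC runs the destructor from another thread.
struct AtomicRC(T)
{
    private static struct Store { T payload; shared uint refs; }
    private Store* _store;

    this(T value) { _store = new Store(value, 1); }

    this(this)  // postblit: another reference now exists
    {
        if (_store !is null)
            atomicOp!"+="(_store.refs, 1);
    }

    ~this()
    {
        // atomicOp returns the new value, so exactly one thread sees 0.
        if (_store !is null && atomicOp!"-="(_store.refs, 1) == 0)
        {
            // last reference gone; release the payload here
            _store = null;
        }
    }
}
```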


Andrei acknowledged the issue recently (Walter too, I think), so I 
trust it'll be fixed at some point. One idea was to change the GC so it 
knows in which thread it should call destructors...


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: std.container.BinaryHeap + refCounted = WTF???

2010-11-16 Thread Andrei Alexandrescu

On 11/16/10 12:47 PM, dsimcha wrote:

I'm trying to use BinaryHeap in a multithreaded program using
std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
stuff, and when I looked into it in a debugger, I realized the crashes had to
do with reference counting, probably caused by this line in BinaryHeap:

 private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
RefCountedAutoInitialize.no) _payload;

Why the heck are the payload and the length being stored in such a weird way?
  IMHO BinaryHeap shouldn't use reference counting unless Store does.  More
generally, anything that uses reference counting, especially where unexpected,
should come with a very strong warning in its ddoc so that people are aware of
the hidden caveats, like copying it using memcpy() and sharing it across threads.


Being a container, BinaryHeap has reference semantics.  In order to
also be sealed, it needs to be reference counted.  The fact that the
store itself uses reference counting is not relevant because BinaryHeap
has two fields.


Andrei