On 12/12/14 7:52 AM, Ruslan Mullakhmetov wrote:
On Thursday, 11 December 2014 at 18:36:59 UTC, Steven Schveighoffer wrote:
My analysis so far:

2. In the array append code, the block attributes are obtained via
GC.query, which has this code for getting the attributes:

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L1792


Quoting from that function:

// reset the offset to the base pointer, otherwise the bits
// are the bits for the pointer, which may be garbage
offset = cast(size_t)(info.base - pool.baseAddr);
info.attr = getBits(pool, cast(size_t)(offset >> pool.shiftBy));

Which should get the correct bits. I suspected there was an issue with
getting the wrong bits, but this code looks correct.

3. The runtime caches the block info for thread local data for append
speed. A potential issue is that the attributes are cached from a
previous use for that block, but the GC (and the runtime itself)
SHOULD clear that cache entry when that block is freed, avoiding this
issue. A potential way to check this is to assert in a debug build of
druntime that the cached block info always equals the actual block
info. Are you able to build a debug version of druntime to test this?
I can give you the changes you should make. This would explain the
great difficulty in reproducing the issue.

I will try to build debug version of dmd compiler and check the issue.

A debug version of compiler is not necessary, not even a debug version of phobos, just druntime. But it's not going to matter yet, because I need to give you the asserts to put in there. I just wanted to know if you needed help doing it.



4. If your code is multi-threaded, but using __gshared, it can make
the cache incorrect. Are you doing this?


the app is multi-threaded via std.concurrency.

This should be OK, you should not be able to share data that is not marked as shared.

there is only one known to me place where __gshared is used: logging
library (checked by searching through whole source tree). make stub for
this lib and try, so identify whether cache invalidated by _gshared or not.

Here is where it might occur:

1. Due to shared data having typeinfo attached to it that it is actually shared, the runtime takes advantage of that. We can use a lock-free cache that is thread-local for anything not marked as shared, because nothing outside the thread can access that data. 2. __gshared gets around this because it is not marked as shared by the compiler. This means, if you, for instance, appended to a __gshared array, the runtime would treat it like a thread-local array. If you did this from multiple threads, the cache may be invalid in one or more of them. 3. Actual 'shared' arrays are not permitted to use the cache, so they should not have this issue.

I see that you removed the only instance of __gshared and it did not help. That at least rules that out.

But the cache is really the only possible place I can see where the
bits are set incorrectly, given that you just verified the bits are
correct before the append.

Can you just list the version of the compiler you are using? I want to
make sure this isn't an issue that has already been fixed.

the last. first of all i updated whole toolchain (dmd, dub).

$ dmd
DMD64 D Compiler v2.066.1

Thanks, this at least gives me a baseline to know what to test and debug with. I do not believe the code has had any significant fixes that would help with this issue since then.


I started looking druntime and dmd source code myself before i checked
the thread (thsnks for your help and feedback) and i have some
questions. could you explain to me something?

i_m looking here
https://github.com/D-Programming-Language/druntime/blob/v2.066.1/src/rt/lifetime.d#L591


-------
line #603
auto size = ti.next.tsize;

why `next`? it can be even null if this is last TypeInfo in the linked
list.

This is the way the compiler constructs the type info. The first TypeInfo is always TypeInfo_Array (or TypeInfo_Shared, or const or whatever), and the .next is the typeinfo for the element type. all this does is get the size of an element. Since we know we are dealing with an array, we know next is always valid.


btw, i used suggested trackallocs.d and GC defenetely receives NO_SCAN

before tag: 1 len: 2 ptr: 103A78058 root: 103A77000:8192 attr: APPENDABLE
gc_qalloc(41, NO_SCAN APPENDABLE ) cc: 29106 asz: 10152603, ti: null
ret: BlkInfo_(104423800, 64, 10)
after tag: 1 len: 3 ptr: 104423810 root: 104423800:64 attr: NO_SCAN
APPENDABLE

This is good information, thanks. I will get back to you with a druntime branch to try. Can I email you at this address? If not, email me at the address from my post to let me know your contact, no reason to work through building issues on the public forum :)

-Steve

Reply via email to