Re: How to debug (potential) GC bugs?

2016-10-07 Thread Johannes Pfau via Digitalmars-d-learn
On Sun, 25 Sep 2016 16:23:11 UTC,
Matthias Klumpp wrote:

> Hello!
> I am working together with others on the D-based 
> appstream-generator[1] project, which is generating software 
> metadata for "software centers" and other package-manager 
> functionality on Linux distributions, and is used by default on 
> Debian, Ubuntu and Arch Linux.
> 
> For Ubuntu, some modifications on the code were needed, and 
> apparently for them the code is currently crashing in the GC 
> collection thread: http://paste.debian.net/840490/
> 
> The project is running a lot of stuff in parallel and is using 
> the GC (if the extraction is a few seconds slower due to the GC 
> being active, it doesn't matter much).
> 
> [...]
> 
> 2) How can one debug issues like the one mentioned above 
> properly? Since it seems to happen in the GC and doesn't give me 
> information on where to start searching for the issue, I am a bit 
> lost.
> 

Can you find out which phobos versions your GDC and LDC builds use?

We added shared library support in 2.068, which replaced much of the
GDC-specific backported GC/TLS code with the standard upstream
implementation. So using a recent GDC based on 2.068 could help.

Judging from the stack trace you're probably using a 2.067 phobos:
https://github.com/D-Programming-GDC/GDC/blob/722cf5670d927ef6182bf1b72765a64ca0fde693/libphobos/libdruntime/rt/lifetime.d#L1423



Here's some advice for debugging such a problem:
The memory layout is usually deterministic when restarting the app in
gdb with the run command. So you can do this:

gdb app
# run
# SIGSEGV in 
# bt
Then get the value of p at the time of the crash; in the posted stack
trace it is 0x7fdfae368000:
# break rt_finalize2 if p == 0x7fdfae368000
# run
This should now break whenever the object is collected, so you can check
whether it is collected twice. You can also use next to step until the
classinfo has been loaded into c and then print the classinfo contents:
print c
You can also use write watchpoints to find data corruption.
First find the value of pc:
# break lifetime.d:1418 if p == 0x7fdfae368000
# run
# print ppv
# watch -l pc
# or watch * (value of ppv)

Then disable the old breakpoint and run from the start:
# disable 1
# run

This should now break when data is written to the location.

(The commands might not be 100% correct ;-)


Re: How to debug (potential) GC bugs?

2016-10-07 Thread Martin Nowak via Digitalmars-d-learn
On Saturday, 1 October 2016 at 00:06:05 UTC, Matthias Klumpp 
wrote:

> So, this problem is:
>  A) A compiler / DRuntime bug, or
>  B) A bug in my code (not) triggered by a certain compiler /
>     DRuntime


We actually did change druntime recently so that it no longer fails
when GC.free is called from a finalizer (the call is now ignored).
Maybe that's what fixed it for you with a newer version, but at a
quick glance I haven't seen any freeing code in your destructors.


Re: How to debug (potential) GC bugs?

2016-10-07 Thread Martin Nowak via Digitalmars-d-learn

On Tuesday, 4 October 2016 at 08:14:37 UTC, Ilya Yaroshenko wrote:
> Probably related issue:
> https://issues.dlang.org/show_bug.cgi?id=15939

This crashes in a finalizer, so it is likely not related to the
deadlock bug.



Re: How to debug (potential) GC bugs?

2016-10-04 Thread Ilya Yaroshenko via Digitalmars-d-learn
On Sunday, 25 September 2016 at 16:23:11 UTC, Matthias Klumpp 
wrote:

> Hello!
> I am working together with others on the D-based
> appstream-generator[1] project, which is generating software
> metadata for "software centers" and other package-manager
> functionality on Linux distributions, and is used by default on
> Debian, Ubuntu and Arch Linux.
>
> [...]


Probably related issue: 
https://issues.dlang.org/show_bug.cgi?id=15939


Re: How to debug (potential) GC bugs?

2016-10-03 Thread Kagamin via Digitalmars-d-learn
If it's heap corruption, the GC has a debugging option,
-debug=SENTINEL, for buffer overrun checks. Also, that particular
stack trace shows that the object being destroyed is allocated in
bin 512, i.e. its size is between 256 and 512 bytes.


Re: How to debug (potential) GC bugs?

2016-10-03 Thread Kagamin via Digitalmars-d-learn
On Sunday, 25 September 2016 at 16:23:11 UTC, Matthias Klumpp 
wrote:
> For Ubuntu, some modifications on the code were needed, and
> apparently for them the code is currently crashing in the GC
> collection thread: http://paste.debian.net/840490/


Oh, wait, what do you mean by crashing?


Re: How to debug (potential) GC bugs?

2016-10-03 Thread Kagamin via Digitalmars-d-learn
On Saturday, 1 October 2016 at 00:06:05 UTC, Matthias Klumpp 
wrote:

> I do none of those things in my code though...


Does `grep "~this" *.d` give nothing? It could be a struct with a
destructor stored in a class. Can you observe the error? Try setting
a breakpoint at onInvalidMemoryOperationError
(https://github.com/dlang/druntime/blob/master/src/core/exception.d#L559)
and see which stack leads to it.


> Unfortunately for having deterministic memory management, I
> would essentially need to develop GC-less, and would lose
> classes. This means many nice features of D aren't available,
> e.g. I couldn't use interfaces (AFAIK they don't work on
> structs) or constraints.


Not necessarily. You only need to dispose of the resources in time,
like in C#. But if you don't have destructors, you have nothing
to dispose of.
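
Disposing in time could look roughly like the sketch below. It is not
code from appstream-generator: TempFile, Extractor and the released
counter are hypothetical names used only for illustration. It shows a
struct with a destructor stored in a class, released at a known point
with destroy() instead of waiting for a collection.

```d
import std.stdio : writeln;

// Hypothetical resource wrapper; the counter only makes the effect visible.
struct TempFile
{
    static int released;
    ~this() { ++released; }
}

// The class has no ~this of its own, but finalizing it still runs the
// destructor of its struct field.
class Extractor
{
    TempFile file;
}

int processOne()
{
    auto ex = new Extractor;
    scope (exit) destroy(ex); // dispose here, not during a GC collection
    // ... work with ex ...
    return TempFile.released; // evaluated before the scope(exit) runs
}

void main()
{
    immutable before = TempFile.released;
    processOne();
    writeln("destructors run: ", TempFile.released - before);
}
```

With this pattern the GC still reclaims the memory later, but the
destructor has already run on the owning thread by then.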


> Strangely after switching from the GDC compiler to the LDC
> compiler, all crashes observed at Ubuntu are gone.

That doesn't sound good.


Re: How to debug (potential) GC bugs?

2016-09-29 Thread Kagamin via Digitalmars-d-learn
Does it crash only in rt_finalize2? That function calls the class
destructor, and the destructor must not allocate from or otherwise
touch the GC, because the GC doesn't yet support allocation during
a collection.
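
One way to guard against accidental allocations is to mark the
destructor @nogc, so the compiler rejects them. This is only a sketch;
CBuffer is a made-up class, and @nogc catches GC allocations but not
every possible interaction with the GC.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical class owning a buffer from the C heap.
class CBuffer
{
    private void* data;
    private size_t size;

    this(size_t n)
    {
        data = malloc(n);
        size = data is null ? 0 : n;
    }

    // @nogc makes the compiler reject GC allocations in here, e.g. `new`,
    // string concatenation or appending to an array.
    ~this() @nogc nothrow
    {
        free(data); // C heap only; free(null) is a no-op
        data = null;
        size = 0;
    }

    size_t length() const @nogc nothrow { return size; }
}

void main()
{
    auto buf = new CBuffer(256);
    assert(buf.length == 256);
    destroy(buf); // do not use buf after this point
}
```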


Re: How to debug (potential) GC bugs?

2016-09-27 Thread Kapps via Digitalmars-d-learn
On Sunday, 25 September 2016 at 16:23:11 UTC, Matthias Klumpp 
wrote:

> Hello!
> I am working together with others on the D-based
> appstream-generator[1] project, which is generating software
> metadata for "software centers" and other package-manager
> functionality on Linux distributions, and is used by default on
> Debian, Ubuntu and Arch Linux.
>
> For Ubuntu, some modifications on the code were needed, and
> apparently for them the code is currently crashing in the GC
> collection thread: http://paste.debian.net/840490/
>
> The project is running a lot of stuff in parallel and is using
> the GC (if the extraction is a few seconds slower due to the GC
> being active, it doesn't matter much).
>
> We also link against a lot of 3rd-party libraries and use a big
> amount of existing C code in the project.
>
> So, I would like to know the following things:
>
> 1) Is there any caveat when linking to C libraries and using
> the GC in a project? So far, it seems to be working well, but
> there have been a few cases where I was suspicious about the GC
> actually doing something to malloc'ed stuff or C structs
> present in the bindings.
>
> 2) How can one debug issues like the one mentioned above
> properly? Since it seems to happen in the GC and doesn't give
> me information on where to start searching for the issue, I am
> a bit lost.
>
> 3) The tool seems to leak memory somewhere and OOMs pretty
> quickly on some machines. All the stuff using C code frees
> resources properly though, and using Valgrind on the project is
> a pain due to large amounts of data being mmapped. I worked
> around this a while back, but then the GC interfered with
> Valgrind, making information less useful. Is there any
> information on how to find memory leaks, or e.g. large structs
> the GC cannot free because something is still having a needless
> reference on it?
>
> Unfortunately I can't reproduce the crash from 2) myself, it
> only seems to happen at Ubuntu (but Ubuntu is using some
> different codepaths too).
>
> Any insights would be highly appreciated!
> Cheers,
>    Matthias
>
> [1]: https://github.com/ximion/appstream-generator


First, make sure any threads created by C code that call into D
are registered with the runtime via thread_attachThis (from
core.thread). Otherwise the GC will not suspend those threads
during a collection, which will cause crashes. I'd guess this is
your issue.
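
A rough sketch of what attaching such a thread looks like, assuming a
POSIX system. The pthread created here only stands in for a thread
started by a C library; foreignWorker and runForeignThread are made-up
names.

```d
import core.memory : GC;
import core.sys.posix.pthread;
import core.thread : thread_attachThis, thread_detachThis;

// Entry point of a thread that druntime did not create itself.
extern (C) void* foreignWorker(void* arg)
{
    thread_attachThis();              // make this thread known to the GC
    scope (exit) thread_detachThis(); // unregister before the thread ends

    auto data = new int[](1024);      // GC allocation from this thread
    data[] = 42;
    GC.collect();                     // the GC now knows this thread's stack
    *cast(int*) arg = data[0];
    return null;
}

int runForeignThread()
{
    int result;
    pthread_t tid;
    pthread_create(&tid, null, &foreignWorker, &result);
    pthread_join(tid, null);
    return result;
}

void main()
{
    assert(runForeignThread() == 42);
}
```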


Second, tell the GC about non-GC memory that holds pointers to GC
memory by using GC.addRange / GC.addRoot as needed. Make sure to
remove them again once the non-GC memory is deallocated,
otherwise you'll get memory leaks. The GC is also conservative,
not precise, so false positives (memory kept alive by values that
merely look like pointers) are possible. If you're building 64-bit
programs, this shouldn't be much of an issue though.
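
As a minimal sketch of the addRange / removeRange pairing (Node and
keepAliveViaRange are hypothetical names, not taken from the project):

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Lives on the C heap, but refers to GC memory through its slice.
struct Node
{
    int[] payload;
}

int keepAliveViaRange()
{
    auto node = cast(Node*) malloc(Node.sizeof);
    if (node is null) return -1;
    *node = Node.init;

    GC.addRange(node, Node.sizeof); // the GC now scans this block
    node.payload = new int[](4);    // only referenced from malloc'ed memory
    node.payload[] = 7;

    GC.collect();                   // payload survives because of the range
    immutable value = node.payload[3];

    GC.removeRange(node);           // unregister before freeing the block
    free(node);
    return value;
}

void main()
{
    assert(keepAliveViaRange() == 7);
}
```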


Finally, make sure you're not doing any GC allocations in dtors.


Re: How to debug (potential) GC bugs?

2016-09-27 Thread Guillaume Piolat via Digitalmars-d-learn
On Sunday, 25 September 2016 at 16:23:11 UTC, Matthias Klumpp 
wrote:

> Hello!
> I am working together with others on the D-based
> appstream-generator[1] project, which is generating software
> metadata for "software centers" and other package-manager
> functionality on Linux distributions, and is used by default on
> Debian, Ubuntu and Arch Linux.
>
> For Ubuntu, some modifications on the code were needed, and
> apparently for them the code is currently crashing in the GC
> collection thread: http://paste.debian.net/840490/
>
> The project is running a lot of stuff in parallel and is using
> the GC (if the extraction is a few seconds slower due to the GC
> being active, it doesn't matter much).
>
> We also link against a lot of 3rd-party libraries and use a big
> amount of existing C code in the project.
>
> So, I would like to know the following things:
>
> 1) Is there any caveat when linking to C libraries and using
> the GC in a project? So far, it seems to be working well, but
> there have been a few cases where I was suspicious about the GC
> actually doing something to malloc'ed stuff or C structs
> present in the bindings.


The GC never scans memory allocated with malloc (unless you tell
it to) or memory used by the bindings.
A caveat is that if you are called from C (not your case), you
must initialize the runtime, and attach/detach threads.

The GC could well stop threads that are currently in the C code
if they were registered with the runtime.
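
For the called-from-C case, initializing the runtime could look like
this sketch. The entry point names mylib_init and mylib_term are made
up; only Runtime.initialize and Runtime.terminate are druntime calls.

```d
import core.runtime : Runtime;

// Hypothetical entry points a C host would call before and after using
// the D code. They return 0 on success, following the usual C convention.
extern (C) int mylib_init()
{
    return Runtime.initialize() ? 0 : -1;
}

extern (C) int mylib_term()
{
    return Runtime.terminate() ? 0 : -1;
}

void main()
{
    // In a D program the runtime is already up. The calls are reference
    // counted, so this extra pair is harmless and only exercises the code.
    assert(mylib_init() == 0);
    assert(mylib_term() == 0);
}
```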



> 2) How can one debug issues like the one mentioned above
> properly? Since it seems to happen in the GC and doesn't give
> me information on where to start searching for the issue, I am
> a bit lost.


There can be multiple reasons.
  - The GC is collecting some object that is unreachable from its
POV while you are actually still using it.
  - The GC is calling destructors that should not be called by
the GC, and they perform illegal operations. Usually this is
solved by using deterministic destruction instead and never
relying on a destructor called by the GC.
  - The GC tries to stop threads that don't exist anymore or are
not interruptible.

My advice is to have a fully deterministic tree of objects, like a
C++ program, and Google for "GC-proof resource class" in case you
are using classes.
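
One possible shape of such a class is sketched below. It assumes a
druntime recent enough to provide GC.inFinalizer, which may not match
the idiom you find when searching; Archive and finalizedByGC are
hypothetical names.

```d
import core.memory : GC;

// Hypothetical resource class: it is meant to be released with destroy().
// If the GC ends up finalizing it instead, the destructor records that.
class Archive
{
    __gshared int finalizedByGC; // how many instances were left to the GC

    ~this() @nogc nothrow
    {
        if (GC.inFinalizer)
        {
            // Reached from a collection: some owner forgot to call destroy().
            ++finalizedByGC;
        }
        // Release the underlying resource here, without touching the GC.
    }
}

void main()
{
    auto a = new Archive;
    destroy(a); // deterministic release, so nothing is counted as leaked
    assert(Archive.finalizedByGC == 0);
}
```

Checking the counter at the end of a run (or asserting in debug
builds) points at owners that do not release their objects themselves.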





Re: How to debug (potential) GC bugs?

2016-09-27 Thread Marco Leise via Digitalmars-d-learn
On Sun, 25 Sep 2016 16:23:11 UTC,
Matthias Klumpp wrote:

> So, I would like to know the following things:
> 
> 1) Is there any caveat when linking to C libraries and using the 
> GC in a project? So far, it seems to be working well, but there 
> have been a few cases where I was suspicious about the GC 
> actually doing something to malloc'ed stuff or C structs present 
> in the bindings.

If you pass callbacks into the C code, make sure they never
throw. Stack unwinding and exception handling generally
don't work across language boundaries.
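
A small sketch of a callback that cannot leak an Exception into C.
The "C side" is faked by a D function here so the example is
self-contained; Callback, c_style_apply and doubleIfValid are all
made-up names.

```d
import std.exception : enforce;

// Signature a C library might expect for a callback (hypothetical).
alias Callback = extern (C) int function(int) nothrow;

// Stand-in for the C side. A real project would use the C function
// declared in its bindings instead.
extern (C) int c_style_apply(Callback cb, int value) nothrow
{
    return cb(value);
}

// nothrow lets the compiler verify that no Exception escapes into C.
extern (C) int doubleIfValid(int value) nothrow
{
    try
    {
        enforce(value >= 0, "negative input");
        return value * 2;
    }
    catch (Exception)
    {
        return -1; // report the failure through the return code instead
    }
}

void main()
{
    assert(c_style_apply(&doubleIfValid, 21) == 42);
    assert(c_style_apply(&doubleIfValid, -5) == -1);
}
```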

A tracing garbage collector starts with the assumption that
all the memory that it allocated is no longer reachable and
then scans the known memory for any pointers to allocations
that falsify this assumption.
What you malloc'ed is unknown to the GC and won't be scanned.
Should you ever have GC memory pointers in your malloc'ed
stuff, then you need to call GC.addRange() to make those
pointers keep the allocations alive. Otherwise you will get a
use-after-free error: data corruption or access violations.
A simple case would be a string that you constructed in D and
store in C as a pointer. The GC can automatically scan the
stack and any globals/statics on the D side, but that's about
it.
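
For the single-pointer case (a D string stored on the C side),
GC.addRoot is an alternative to registering a whole range. The sketch
below fakes the C side with a malloc'ed slot; storeNameInCSlot is a
hypothetical name.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : strlen;
import std.conv : text;
import std.string : toStringz;

size_t storeNameInCSlot()
{
    // One pointer-sized slot on the C heap, standing in for a field inside
    // a struct owned by a C library. The GC does not scan it.
    auto slot = cast(const(char)**) malloc((const(char)*).sizeof);
    if (slot is null) return 0;

    const(char)* name = toStringz(text("pkg-", 42)); // string built in D
    GC.addRoot(name); // keep the allocation alive while C holds the pointer
    *slot = name;
    name = null;      // drop the reference held on the D side

    GC.collect();
    immutable len = strlen(*slot); // still valid thanks to the root

    GC.removeRoot(*slot); // unpin once the C side is done with it
    free(slot);
    return len;
}

void main()
{
    assert(storeNameInCSlot() == 6); // "pkg-42"
}
```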

I know of no tools similar to valgrind that are specifically
designed to debug the D GC. You can plug into the GC API and keep
track of the allocation sizes, i.e. write a proxy GC.
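
Writing a proxy GC is more than fits here. As a much cruder
substitute (a different technique, not a proxy), heap usage can be
sampled around suspicious code, assuming a druntime that provides
GC.stats. usedAfterAllocating is a made-up helper.

```d
import core.memory : GC;
import std.stdio : writeln;

// Allocates a block and reports how much GC memory is in use while the
// block is still alive.
size_t usedAfterAllocating(size_t bytes)
{
    auto block = new ubyte[](bytes);
    block[0] = 1;
    immutable used = GC.stats().usedSize;
    return block[0] ? used : 0; // reading block keeps it referenced
}

void main()
{
    immutable before = GC.stats().usedSize;
    immutable during = usedAfterAllocating(1 << 20);
    writeln("used before: ", before, " bytes, with 1 MiB block: ", during);
}
```

Logging these numbers at a few points in a long run at least shows
where the heap grows, even though it does not say which references
keep the memory alive.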

-- 
Marco