Re: Forked GC explained

2022-09-06 Thread Steven Schveighoffer via Digitalmars-d-learn

On 9/6/22 6:31 PM, frame wrote:

Well, of course it would be the programmer's fault. I asked because I 
wanted to know whether there is a catch if some third-party lib commits 
this violation (probably unintentionally, or not yet noticed). I don't 
want to debug this :D


You can be confident that if it breaks in the forking GC, it breaks in 
the regular GC as well (and vice versa).


-Steve
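
If you want to check this against your own code or a third-party lib, one option is to run the same program under both collectors. Below is a minimal sketch, assuming a POSIX system and druntime 2.098 or later, where the forking collector is opt-in through `gcopt=fork:1`. The helper `sumAfterCollect` is made up for illustration; it only checks that referenced data survives a collection.

```d
// Opt in to the forking mark phase for this binary (POSIX only).
// Remove this line to run the same code with the regular collector.
extern(C) __gshared string[] rt_options = ["gcopt=fork:1"];

import core.memory : GC;

/// Hypothetical helper: keeps `n` small arrays alive, produces some
/// garbage, forces a collection, and sums what is still referenced.
int sumAfterCollect(int n)
{
    int[][] live;
    foreach (i; 0 .. n)
    {
        live ~= [i, i + 1];                     // stays referenced
        auto garbage = new ubyte[](64 * 1024);  // dropped each iteration
        garbage[0] = 1;
    }
    GC.collect();

    int total = 0;
    foreach (pair; live)
        total += pair[0] + pair[1];
    return total;
}

void main()
{
    // Referenced memory must survive under either collector.
    assert(sumAfterCollect(4) == 16);
}
```

The same switch can also be passed at run time as `--DRT-gcopt=fork:1` instead of being compiled in, which makes it easy to compare both modes with one binary.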


Re: Forked GC explained

2022-09-06 Thread frame via Digitalmars-d-learn
On Monday, 5 September 2022 at 18:35:02 UTC, Steven Schveighoffer 
wrote:

On 9/5/22 7:12 AM, frame wrote:
And what if the programmer has no actual reference but wrongly 
forced a `free()` through a pointer cast?


https://dlang.org/spec/garbage.html#pointers_and_gc

* Do not store pointers into non-pointer variables using casts 
and other tricks.

```d
void* p;
...
int x = cast(int)p;   // error: undefined behavior
```
The garbage collector does not scan non-pointer fields for GC 
pointers.


Note that this does not require the forked GC to cause this 
problem.


-Steve


Well, of course it would be the programmer's fault. I asked 
because I wanted to know whether there is a catch if some 
third-party lib commits this violation (probably unintentionally, 
or not yet noticed). I don't want to debug this :D


Re: Forked GC explained

2022-09-05 Thread Steven Schveighoffer via Digitalmars-d-learn

On 9/5/22 7:12 AM, frame wrote:
And what if the programmer has no actual reference but wrongly forced a 
`free()` through a pointer cast?


https://dlang.org/spec/garbage.html#pointers_and_gc

* Do not store pointers into non-pointer variables using casts and other 
tricks.

```d
void* p;
...
int x = cast(int)p;   // error: undefined behavior
```
The garbage collector does not scan non-pointer fields for GC pointers.

Note that this does not require the forked GC to cause this problem.

-Steve
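
To make the spec rule a bit more concrete, here is a small sketch of why a pointer stored in an integer field is invisible to the collector. It assumes current druntime behaviour, where a `new`-allocated struct without pointer fields gets the `NO_SCAN` block attribute. `Hidden`, `Visible` and `isScanned` are names invented for this example.

```d
import core.memory : GC;

struct Hidden  { size_t bits; }  // integer field only: block is not scanned
struct Visible { void*  ptr;  }  // real pointer field: block is scanned

/// Hypothetical helper: true if the GC will scan this block for pointers.
bool isScanned(const void* block)
{
    return (GC.getAttr(block) & GC.BlkAttr.NO_SCAN) == 0;
}

void main()
{
    auto payload = new int[](4);

    // The cast compiles, but the GC no longer sees a reference here.
    auto h = new Hidden(cast(size_t) payload.ptr);
    // This one keeps `payload` reachable through `v`.
    auto v = new Visible(payload.ptr);

    assert(!isScanned(h));
    assert(isScanned(v));
}
```

Nothing in this depends on the forking collector: if `h.bits` were the only remaining copy of the address, either collector would be free to reclaim `payload`.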


Re: Forked GC explained

2022-09-05 Thread frame via Digitalmars-d-learn
On Saturday, 3 September 2022 at 14:31:31 UTC, Steven 
Schveighoffer wrote:

On 9/3/22 9:35 AM, frame wrote:


What happens if `GC.free()` is called manually while the forked 
process also marks the memory as free, the GC immediately reuses 
the memory, and then gets the notification from the forked child 
to free it? Can this happen?


No, because if you can free it, you should have had a reference 
to it when you forked, which should mean it's not garbage.


And what if the programmer has no actual reference but wrongly 
forced a `free()` through a pointer cast?


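```
        | OP      | Memory M
---------------------------------------------------------------------------------
Parent: | -       | Unreferenced, marked in use
---------------------------------------------------------------------------------
Parent: | fork    |
---------------------------------------------------------------------------------
Parent: | -       | Unreferenced, marked in use
Child:  |         | Unreferenced, marked in use
---------------------------------------------------------------------------------
Parent: | -       | Unreferenced, marked in use
Child:  |         | Unreferenced, found M
---------------------------------------------------------------------------------
Parent: | free    | Unreferenced, marked not in use  <- error forced by programmer
Child:  |         | Unreferenced, found M
---------------------------------------------------------------------------------
Parent: | new     | Referenced, re-used because it was marked free
Child:  |         | Unreferenced, found M
---------------------------------------------------------------------------------
Parent: | -       | Referenced, used
Child:  |         | Done scanning. Please collect: M
---------------------------------------------------------------------------------
Parent: | collect | M
Child:  |         | exit
---------------------------------------------------------------------------------
```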
@wjoe: is the GC aware of this, so that it excludes M from the 
child's result set because M changed while the child was running?


There's a talk on it from the 2013 dconf by the inventor: 
https://dconf.org/2013/talks/lucarella.html


-Steve


Thanks for the link. The slides mention shared memory.


Re: Forked GC explained

2022-09-03 Thread wjoe via Digitalmars-d-learn

On Saturday, 3 September 2022 at 13:35:39 UTC, frame wrote:
I'm not sure I fully understand how it works. I know that the 
OS creates read-only memory pages for both and if a memory 
section is about to be written, the OS will issue a copy of the 
pages so any write operation will be done in its own copy and 
cannot mess up things.


But then the question is: how can memory be marked as free? The 
forked process cannot do it, since it writes into a copy. How is 
it synchronized then?


Is the GC address root somehow shared between the processes? Or 
does the forked process communicate the memory addresses back 
to the parent?


If so, does the GC just rely on this?

Are freeing GC operations just locked while the forked process 
is running?


What happens if `GC.free()` is called manually while the forked 
process also marks the memory as free, the GC immediately reuses 
the memory, and then gets the notification from the forked child 
to free it? Can this happen?


The OS creates a clone of the process. The original process which 
called fork() is called parent and the clone is called child.
The parent resumes normally after the call to fork returns and 
the child starts the mark phase.
The virtual memory maps of both processes are identical at this 
point.
If either process writes to a page, the OS copies the page and 
writes the changes to the copy (Copy On Write).
Hence, modified pages in the parent process can't be considered 
during the current collection cycle in the child.
At the end of the mark phase the child communicates the result to 
the parent, then exits.
The remaining work can then be completed by the parent in 
parallel as the pause is only required for the mark phase.


This works because every chunk of memory which is unreferenced in 
the parent at fork time is unreferenced in the child, too: the 
child is a clone which doesn't mutate state except for the 
allocation required to hold the marked memory.
There is no need to do anything about the GC in the parent, it 
can allocate/free memory at will.
This doesn't interfere because the chunks that have been marked 
by the child are still considered in use by the parent, but 
unreferenced and ready to be collected.
After the child communicated its result to the parent, the GC 
thread in the parent can complete the collection cycle as if it 
had done the mark phase itself.
Anything that happened in the parent after the call to fork() 
will be considered in the next collection cycle.
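
The snapshot behaviour described above can be tried out without the GC at all. This is only a sketch, assuming a POSIX system. A pipe stands in for the child-to-parent channel, which is my assumption for illustration and not necessarily what druntime uses. `snapshotDemo` is a made-up name.

```d
import core.sys.posix.unistd : fork, pipe, read, write, close, _exit;
import core.sys.posix.sys.wait : waitpid;

/// Forks, lets the child report the value it sees in `cell`, and has the
/// parent overwrite its own copy in the meantime.
/// Returns [value the child saw, value the parent ends up with].
int[2] snapshotDemo()
{
    int cell = 1;                  // state at fork time
    int[2] fds;
    if (pipe(fds) != 0)
        return [-1, -1];

    auto pid = fork();
    if (pid < 0)
        return [-1, -1];

    if (pid == 0)
    {
        // Child: works on the snapshot taken by fork().
        close(fds[0]);
        int seen = cell;
        write(fds[1], &seen, seen.sizeof);
        _exit(0);
    }

    // Parent: this write touches only the parent's copy of the page.
    close(fds[1]);
    cell = 2;

    int childSaw = -1;
    auto n = read(fds[0], &childSaw, childSaw.sizeof);
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);

    if (n != cast(ptrdiff_t) int.sizeof)
        childSaw = -1;
    return [childSaw, cell];
}

void main()
{
    // The child still sees the fork-time value, the parent sees its own write.
    assert(snapshotDemo() == [1, 2]);
}
```

The child never observes the parent's later write, which is the same reason changes made in the parent after `fork()` are left for the next collection cycle.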


Re: Forked GC explained

2022-09-03 Thread Steven Schveighoffer via Digitalmars-d-learn

On 9/3/22 9:35 AM, frame wrote:
I'm not sure I fully understand how it works. I know that the OS creates 
read-only memory pages for both and if a memory section is about to be 
written, the OS will issue a copy of the pages so any write operation 
will be done in its own copy and cannot mess up things.


But then the question is: how can memory be marked as free? The forked 
process cannot do it, since it writes into a copy. How is it synchronized then?


Is the GC address root somehow shared between the processes? Or does the 
forked process communicate the memory addresses back to the parent?


It definitely communicates back to the parent. I'm not sure of the 
mechanism; it is either shared memory or a pipe.


The information communicated back is which blocks can be marked as 
unreferenced, then the sweep is done in the original process.



Are freeing GC operations just locked while the forked process is running?


I'm not sure, but I would think it's possible not to. Only during the 
freeing of the blocks does it need to lock the GC.




What happens if `GC.free()` is called manually while the forked 
process also marks the memory as free, the GC immediately reuses the 
memory, and then gets the notification from the forked child to free 
it? Can this happen?


No, because if you can free it, you should have had a reference to it 
when you forked, which should mean it's not garbage.


There's a talk on it from the 2013 dconf by the inventor: 
https://dconf.org/2013/talks/lucarella.html


-Steve
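
For reference, the legitimate case described above (freeing a block you still hold a pointer to) looks roughly like this. It is a minimal sketch; `useThenFree` is a name made up for the example.

```d
import core.memory : GC;

/// Hypothetical helper: allocates a block, uses it, then frees it
/// explicitly while a real pointer to it still exists.
int useThenFree()
{
    auto p = cast(int*) GC.malloc(4 * int.sizeof, GC.BlkAttr.NO_SCAN);
    p[0] = 42;
    int value = p[0];

    GC.free(p);  // fine: `p` is a live reference to the block's base address
    p = null;    // drop the dangling pointer right away
    return value;
}

void main()
{
    assert(useThenFree() == 42);
}
```

Because `p` exists at the time of a fork, the block counts as referenced in the child's snapshot, so the child would not report it as garbage.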


Forked GC explained

2022-09-03 Thread frame via Digitalmars-d-learn
I'm not sure I fully understand how it works. I know that the OS 
creates read-only memory pages for both and if a memory section 
is about to be written, the OS will issue a copy of the pages so 
any write operation will be done in its own copy and cannot mess 
up things.


But then the question is: how can memory be marked as free? The 
forked process cannot do it, since it writes into a copy. How is 
it synchronized then?


Is the GC address root somehow shared between the processes? Or 
does the forked process communicate the memory addresses back to 
the parent?


If so, does the GC just rely on this?

Are freeing GC operations just locked while the forked process is 
running?


What happens if `GC.free()` is called manually while the forked 
process also marks the memory as free, the GC immediately reuses 
the memory, and then gets the notification from the forked child 
to free it? Can this happen?