On 2019-04-08 3:29 a.m., Richard Biener wrote:
> On Sun, 7 Apr 2019, nick wrote:
> 
>>
>>
>> On 2019-04-07 5:31 a.m., Richard Biener wrote:
>>> On April 5, 2019 6:11:15 PM GMT+02:00, nick <xerofo...@gmail.com> wrote:
>>>>
>>>>
>>>> On 2019-04-05 6:25 a.m., Richard Biener wrote:
>>>>> On Wed, 3 Apr 2019, nick wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 2019-04-03 7:30 a.m., Richard Biener wrote:
>>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019-04-01 9:47 a.m., Richard Biener wrote:
>>>>>>>>> On Mon, 1 Apr 2019, nick wrote:
>>>>>>>>>
>>>>>>>>>> Well, I'm talking about the shared roots of the garbage collector's
>>>>>>>>>> core state data structure, i.e. struct ggc_root_tab.
>>>>>>>>>>
>>>>>>>>>> It also seems that this is no longer shared globally, if I'm not
>>>>>>>>>> mistaken, or perhaps it is this:
>>>>>>>>>> static vec<const_ggc_root_tab_t> extra_root_vec;
>>>>>>>>>>
>>>>>>>>>> After reading the code I'm not sure which is the bigger deal,
>>>>>>>>>> though, so I wrote my proposal instead of just asking which is the
>>>>>>>>>> better issue for thread safety. Sorry about that.
>>>>>>>>>>
>>>>>>>>>> As for the second question, injection seems not to be the issue,
>>>>>>>>>> nor outside callers, but just internal ones, so phase 3 (or step 3)
>>>>>>>>>> would now be: find internal callers or users of x, where x is one
>>>>>>>>>> of the above, rather than injecting outside callers. That answers
>>>>>>>>>> my second question about external callers still being an issue.
>>>>>>>>>>
>>>>>>>>>> Let me know which of the two is the better issue:
>>>>>>>>>> 1. struct ggc_root_tab being shared
>>>>>>>>>> 2. static vec<const_ggc_root_tab_t> extra_root_vec; as a shared
>>>>>>>>>>    heap or vector of root nodes for each type of allocation
>>>>>>>>>>
>>>>>>>>>> and I will gladly rewrite my proposal sections for whichever needs
>>>>>>>>>> to be re-edited.
>>>>>>>>>
>>>>>>>>> I don't think working on the garbage collector as a separate
>>>>>>>>> GSoC project is useful at this point.  Doing locking around
>>>>>>>>> allocation seems like a good short-term solution and if that
>>>>>>>>> turns out to be a performance issue for the threaded part
>>>>>>>>> using per-thread freelists is likely an easy to deploy
>>>>>>>>> solution.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>> I agree, but we were discussing this: "Or maybe a project to be more
>>>>>>>> explicit about regions of the code that assume that the garbage
>>>>>>>> collector can't run within them? [3] (since the GC is state that
>>>>>>>> would be shared by the threads)."
>>>>>>>
>>>>>>> The process of collecting garbage is not the only issue (and that
>>>>>>> very issue is most easily mitigated by collecting only at specific
>>>>>>> points - which is what we do - and by making those serializing
>>>>>>> points).  The main issue is the underlying memory allocator (GCC uses
>>>>>>> memory that is garbage collected plus regular heap memory).
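
To make sure I follow the "collect only at specific, serializing points"
idea, here is the rough sketch I have in mind for a threaded pass loop.
Only ggc_collect below is real; the worker loop, the per-function driver
and the barrier setup are invented purely for illustration, and the
sketch assumes every worker iterates the same number of times:

  #include <pthread.h>

  extern void ggc_collect ();           /* the real collector entry point */
  extern void run_passes_on (int fn);   /* hypothetical per-function pass driver */

  /* Initialized elsewhere for the number of worker threads.  */
  static pthread_barrier_t collect_barrier;

  void
  worker_loop (const int *functions, int count)
  {
    for (int i = 0; i < count; i++)
      {
        run_passes_on (functions[i]);  /* may allocate GC memory, never collects */

        /* Serializing point: all workers meet here, exactly one of them
           runs the collector while the others wait, then all continue.  */
        if (pthread_barrier_wait (&collect_barrier)
            == PTHREAD_BARRIER_SERIAL_THREAD)
          ggc_collect ();
        pthread_barrier_wait (&collect_barrier);
      }
  }

The double barrier makes the collection a full stop for every worker, so
no thread can be in the middle of an allocation while mark & sweep runs.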
>>>>>>>
>>>>>>>> In addition, I moved my paper back to our discussion about garbage
>>>>>>>> collector state with outside callers. It seems we really need to do
>>>>>>>> something about my wording, as the idea of my project in a nutshell
>>>>>>>> was to figure out how to mark shared state by callers and inject it
>>>>>>>> into the garbage collector, letting it know whether the state was
>>>>>>>> shared between threads or not. It seems that was on the GSoC page,
>>>>>>>> and in our discussions the issue is marking outside code for shared
>>>>>>>> state. If that's correct, then my wording of "outside callers" is
>>>>>>>> incorrect; it should have been shared state between threads on
>>>>>>>> outside callers to the garbage collector. If that is the state in
>>>>>>>> your wording above, then great, as I understand where we are going
>>>>>>>> and will gladly change my wording.
>>>>>>>
>>>>>>> I'm still not sure what you are shooting at, the above sentences do
>>>>>>> not make any sense to me.
>>>>>>>
>>>>>>>> Also, freelists don't work here, as the state is shared at the
>>>>>>>> caller's end, which would raise two major issues:
>>>>>>>> 1. Locking on nodes of the freelists when two threads allocate the
>>>>>>>>    same thing, which can be a problem if the state is shared a lot
>>>>>>>> 2. Locking allocation with large numbers of callers can starve
>>>>>>>>    threads
>>>>>>>
>>>>>>> First of all, allocating memory from the GC pool is not the main
>>>>>>> work of GIMPLE passes, so simply serializing at allocation time
>>>>>>> might work out.  Second, free lists of course do work.  What you'd
>>>>>>> do is have a fast path in allocation using a thread-local "free
>>>>>>> list" which you can allocate from without taking any lock.  Maybe I
>>>>>>> should explain "free list" since that term doesn't make too much
>>>>>>> sense in a garbage collector world.  What I'd do is, when a client
>>>>>>> thread asks for memory of size N, allocate M objects of that size
>>>>>>> but put M - 1 on the client thread's local "free list" to be
>>>>>>> allocated lock-free from for the next M - 1 calls.  Note that
>>>>>>> garbage collected memory objects are only handed out in fixed
>>>>>>> chunks (powers of two plus a few special sizes) so you'd have one
>>>>>>> "free list" per chunk size per thread.
>>>>>>>
>>>>>>> The collection itself (mark & sweep) would be fully serialized still
>>>>>>> (and not return to any thread's local "free list").
>>>>>>>
>>>>>>> ggc_free'd objects _might_ go to the threads' "free list"s (yeah, we
>>>>>>> _do_ have ggc_free ...).
>>>>>>>
>>>>>>> As said, I don't see the GC or the memory allocator as something
>>>>>>> interesting to work on for parallelization until the basic setup
>>>>>>> works and it proves to be a bottleneck.
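
To check my understanding of the per-thread "free list" fast path, here
is a minimal sketch. Everything below (the function, the bucket count,
the batch size M and the locked page allocator) is invented for
illustration and is not GCC's actual allocator API:

  #include <cstddef>
  #include <mutex>

  const size_t N_BUCKETS = 32;  /* one bucket per fixed chunk size */
  const size_t BATCH = 16;      /* M: objects fetched per locked refill */

  struct free_object { free_object *next; };

  /* One lock-free "free list" per chunk size, per thread.  */
  static thread_local free_object *local_free_list[N_BUCKETS];

  static std::mutex allocator_lock;

  /* Stand-in for the real, serialized GC page allocator.  */
  extern void *alloc_from_gc_pages_locked (size_t bucket);

  void *
  ggc_alloc_fast (size_t bucket)
  {
    /* Fast path: pop from this thread's list without taking any lock.  */
    if (free_object *obj = local_free_list[bucket])
      {
        local_free_list[bucket] = obj->next;
        return obj;
      }

    /* Slow path: take the lock once, allocate BATCH objects, return one
       and keep BATCH - 1 on the thread-local list for later calls.  */
    std::lock_guard<std::mutex> guard (allocator_lock);
    void *result = alloc_from_gc_pages_locked (bucket);
    for (size_t i = 1; i < BATCH; i++)
      {
        free_object *obj
          = static_cast<free_object *> (alloc_from_gc_pages_locked (bucket));
        obj->next = local_free_list[bucket];
        local_free_list[bucket] = obj;
      }
    return result;
  }

If I read it right, only one call in every BATCH allocations of a given
chunk size ever touches the lock, which is what keeps the serialized
allocator from becoming a bottleneck.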
>>>>>>>
>>>>>>>> It seems that working on the garbage collector itself isn't the
>>>>>>>> issue but rather the callers, as I just figured out in relation to
>>>>>>>> your state idea. Let me know if that's correct and if the wording
>>>>>>>> change I mentioned is fine with you, as that's the state that seems
>>>>>>>> to need changing.
>>>>>>>> Nick
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>
>>>>>> That's fine, and it's my fault for not understanding you better. I
>>>>>> was aware of expand_all_functions being taken for passes.c. However,
>>>>>> it seems two other issues relate to threads:
>>>>>> 1. finalize_compilation_unit
>>>>>> 2. the IPA set of pass functions
>>>>>>
>>>>>> If I'm understanding it correctly, number 1 seems to be an early
>>>>>> version of expand_all_functions for the GENERIC representation; if
>>>>>> that's the case it really should be fixed. I'm not sure which is the
>>>>>> better issue, as both seem to have shared-state problems, either at
>>>>>> the GENERIC level or the GIMPLE level.
>>>>>>
>>>>>> Let me know if this is better, as now that I really think about it,
>>>>>> the GIMPLE or GENERIC functions in passes.c are the main issue.
>>>>>>
>>>>>> Sorry for the misunderstanding, and hopefully one of the functions
>>>>>> listed is better for moving forward with my proposal,
>>>>>
>>>>> Sorry, but guessing at useful projects by skimming through GCC code
>>>>> at this point isn't the way to go forward - this new "idea" lacks
>>>>> both detail and understanding.  Please try to stick to one of the
>>>>> suggested projects or do more thorough research in case you want
>>>>> to work on a new project idea next year.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>
>>>> I was talking about cgraphunit.c, and it seems that, according to this:
>>>>
>>>> "Parallelize compilation using threads. GCC currently has an awful lot
>>>> of truly global state and even more per-pass global state which makes
>>>> this somewhat hard. The idea is to avoid issues with global state as
>>>> much as possible by partitioning the compilation pipeline in pieces
>>>> that share as little global state as possible and ensure each thread
>>>> only works in one of those partitions. The biggest roadblock will be
>>>> the non-thread-safe memory allocator of GCC's garbage collector. The
>>>> goal of this project is to have a compilation pipeline driven by a
>>>> scheduler assigning functions to be optimized to the partitions in the
>>>> pipeline. This project would be mentored by Richard Biener. Required
>>>> skills include: C/C++, ability to analyze a big complex code base,
>>>> parallelization."
>>>>
>>>> We are trying to create a compilation pipeline, if I'm correct, and it
>>>> seems that at the GENERIC level finalize_compilation_unit needs to be
>>>> fixed just like expand_all_functions at the GIMPLE level. That's my
>>>> point; it still falls within that project. Here is what I wrote, as I
>>>> figured out that the shared state involved in passing GENERIC to GIMPLE
>>>> is, or would be, a bottleneck in the threaded pipeline.
>>>>
>>>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
>>>
>>> The pre-IPA parts cannot be easily parallelized, and that includes
>>> GENERIC to GIMPLE translation. This is why the project should focus on
>>> the much easier post-IPA and pre-RTL parts of the compilation pipeline,
>>> since there the interaction between functions is minimal.
>>>
>>> Richard. 
>>>
>>>>
>>>> Nick
>>>
>> Richard,
>> Thanks, it seems we finally got somewhere; I rewrote it for that. Please
>> just double-check it tomorrow before I submit later tomorrow, as I want
>> to make sure it's OK with you. Here is the link:
>> https://docs.google.com/document/d/1BKVeh62IpigsQYf_fJqkdu_js0EeGdKtXInkWZ-DtU0/edit
>>
>> The only question is whether you want more detail in the project part; I
>> think it's enough, but let me know if not.
> 
> It seems the project part lacks any specifics and those are to be
> researched and discovered in the mid-May to late May timeframe of
> your Roadmap?
Yes; after looking into the IPA functions, it would take some time to
really understand them, not just at a surface level. It's not the
functions themselves but thinking through how the pipeline would work
across the post-IPA and pre-RTL functions.

> I do not understand what "Create optimized functions ... for avoiding
> sharing of state ..." is about - what kind of functions do you intend
> to write?  Often examples help.
> 
What I actually meant was fixing functions at both the post-IPA and
pre-RTL levels to avoid sharing of state and to increase throughput with
multiple threads in the compiler. There aren't any new functions; it's
about optimizing the existing paths. So it was a small typo. Thanks for
noticing it.

I will submit either later today or early tomorrow, so if there's
anything else, let me know,
Nick
> Richard.
> 
>> Thanks for your patience,
>> Nick
