[PATCH] D96033: [clang-repl] Land initial infrastructure for incremental parsing

Vassil Vassilev via Phabricator via cfe-commits Tue, 23 Feb 2021 04:27:40 -0800

v.g.vassilev marked 3 inline comments as done.
v.g.vassilev added inline comments.



================
Comment at: clang/lib/CodeGen/CodeGenAction.cpp:908
 
+CodeGenerator *CodeGenAction::getCodeGenerator() const {
+  return BEConsumer->getCodeGenerator();
----------------
lhames wrote:
> v.g.vassilev wrote:
> > lhames wrote:
> > > v.g.vassilev wrote:
> > > > sgraenitz wrote:
> > > > > v.g.vassilev wrote:
> > > > > > @rjmccall, we were wondering if there is a better way to ask 
> > > > > > CodeGen to start a new module. The current approach seems to be 
> > > > > > drilling hole in a number of abstraction layers.
> > > > > > 
> > > > > > In the past we have touched that area a little in 
> > > > > > https://reviews.llvm.org/D34444 and the answer may be already there 
> > > > > > but I fail to connect the dots.
> > > > > > 
> > > > > > Recently, we thought about having a new FrontendAction callback for 
> > > > > > beginning a new phase when compiling incremental input. We need to 
> > > > > > keep track of the created objects (needed for error recovery) in 
> > > > > > our Transaction. We can have a map of `Transaction*` to 
> > > > > > `llvm::Module*` in CodeGen. The issue is that new JITs take 
> > > > > > ownership of the `llvm::Module*` which seems to make it impossible 
> > > > > > to support jitted code removal with that model (cc: @lhames, 
> > > > > > @rsmith).
> > > > > When compiling incrementally, doeas a "new phase" mean that all 
> > > > > subsequent code will go into a new module from then on? How will 
> > > > > dependencies to previous symbols be handled? Are all symbols external?
> > > > > 
> > > > > > The issue is that new JITs take ownership of the llvm::Module*
> > > > > 
> > > > > That's true, but you can still keep a raw pointer to it, which will 
> > > > > be valid at least as long as the module wasn't linked. Afterwards it 
> > > > > depends on the linker:
> > > > > * RuntimeDyld can return ownership of the object's memory range via 
> > > > > `NotifyEmittedFunction`
> > > > > * JITLink provides the `ReturnObjectBufferFunction` for the same 
> > > > > purpose
> > > > > 
> > > > > > seems to make it impossible to support jitted code removal with 
> > > > > > that model
> > > > > 
> > > > > Can you figure out what symbols are affected and remove these? A la: 
> > > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020
> > > > > 
> > > > > I think @anarazel has ported a client with code removal to OrcV2 
> > > > > successfully in the past. Maybe there's something we can learn from 
> > > > > it.
> > > > > When compiling incrementally, doeas a "new phase" mean that all 
> > > > > subsequent code will go into a new module from then on? How will 
> > > > > dependencies to previous symbols be handled? Are all symbols external?
> > > > 
> > > > There is some discussion on this here 
> > > > https://reviews.llvm.org/D34444#812418
> > > > 
> > > > I think the relevant bit is that 'we have just one ever growing TU 
> > > > [...] which we send to the RuntimeDyLD allowing only JIT to resolve 
> > > > symbols from it.  We aid the JIT when resolving symbols with internal 
> > > > linkage by changing all internal linkage to external (We haven't seen 
> > > > issues with that approach)'.
> > > > 
> > > > > 
> > > > > > The issue is that new JITs take ownership of the llvm::Module*
> > > > > 
> > > > > That's true, but you can still keep a raw pointer to it, which will 
> > > > > be valid at least as long as the module wasn't linked. 
> > > > 
> > > > That was my first implementation when I upgraded cling to llvm9 where 
> > > > the `shared_ptr`s went to `unique_ptr`s. This was quite problematic for 
> > > > many of the use cases we support as the JIT is somewhat unpredictable 
> > > > to the high-level API user. 
> > > > 
> > > > 
> > > > >Afterwards it depends on the linker:
> > > > > * RuntimeDyld can return ownership of the object's memory range via 
> > > > > `NotifyEmittedFunction`
> > > > > * JITLink provides the `ReturnObjectBufferFunction` for the same 
> > > > > purpose
> > > > > 
> > > > 
> > > > That's exactly what we ended up doing (I would like to thank Lang here 
> > > > who gave a similar advice).
> > > > 
> > > > > > seems to make it impossible to support jitted code removal with 
> > > > > > that model
> > > > > 
> > > > > Can you figure out what symbols are affected and remove these? A la: 
> > > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020
> > > > > 
> > > > > I think @anarazel has ported a client with code removal to OrcV2 
> > > > > successfully in the past. Maybe there's something we can learn from 
> > > > > it.
> > > > 
> > > > Indeed. That's not yet on my radar as seemed somewhat distant in time.
> > > > 
> > > > Recently, we thought about having a new FrontendAction callback for 
> > > > beginning a new phase when compiling incremental input. We need to keep 
> > > > track of the created objects (needed for error recovery) in our 
> > > > Transaction. We can have a map of Transaction* to llvm::Module* in 
> > > > CodeGen. The issue is that new JITs take ownership of the llvm::Module* 
> > > > which seems to make it impossible to support jitted code removal with 
> > > > that model (cc: @lhames, @rsmith).
> > > 
> > > In the new APIs, in order to enable removable code, you can associate 
> > > Modules with ResourceTrackers when they're added to the JIT. The 
> > > ResourceTrackers then allow for removal. Idiomatic usage looks like:
> > > 
> > >   auto Mod = /* create module */;
> > >   auto RT = JD.createResourceTracker();
> > >   J.addModule(RT, std::move(Mod));
> > >   //...
> > >   if (auto Err = RT.remove())
> > >     /* handle Err */;
> > > 
> > > > we have just one ever growing TU [...] which we send to RuntimeDyld...
> > > 
> > > So is a TU the same as an llvm::Module in this context? If so, how do you 
> > > reconcile that with the JIT taking ownership of modules? Are you just 
> > > copying the Module each time before adding it?
> > > 
> > > > We need to keep track of the created objects (needed for error 
> > > > recovery) in our Transaction.
> > > 
> > > Do you need the Module* for error recovery? Or just the Decls?
> > > > Recently, we thought about having a new FrontendAction callback for 
> > > > beginning a new phase when compiling incremental input. We need to keep 
> > > > track of the created objects (needed for error recovery) in our 
> > > > Transaction. We can have a map of Transaction* to llvm::Module* in 
> > > > CodeGen. The issue is that new JITs take ownership of the llvm::Module* 
> > > > which seems to make it impossible to support jitted code removal with 
> > > > that model (cc: @lhames, @rsmith).
> > > 
> > > In the new APIs, in order to enable removable code, you can associate 
> > > Modules with ResourceTrackers when they're added to the JIT. The 
> > > ResourceTrackers then allow for removal. Idiomatic usage looks like:
> > > 
> > >   auto Mod = /* create module */;
> > >   auto RT = JD.createResourceTracker();
> > >   J.addModule(RT, std::move(Mod));
> > >   //...
> > >   if (auto Err = RT.remove())
> > >     /* handle Err */;
> > 
> > Nice, thanks!
> > 
> > > 
> > > > we have just one ever growing TU [...] which we send to RuntimeDyld...
> > > 
> > > So is a TU the same as an llvm::Module in this context? If so, how do you 
> > > reconcile that with the JIT taking ownership of modules? Are you just 
> > > copying the Module each time before adding it?
> > 
> > Each incremental chunk with which the TU grows has a corresponding 
> > `llvm::Module`. Once clang's CodeGen is done for the particular module it 
> > transfers the ownership to the `Transaction` which, in turn, hands it to 
> > the JIT and once the JIT is done it retains the ownership again.
> > 
> > > 
> > > > We need to keep track of the created objects (needed for error 
> > > > recovery) in our Transaction.
> > > 
> > > Do you need the Module* for error recovery? Or just the Decls?
> > 
> > Yes, we need a `llvm::Module` that corresponds to the Decls as sometimes 
> > CodeGen will decide not to emit a Decl.
> > Each incremental chunk with which the TU grows has a corresponding 
> > llvm::Module. Once clang's CodeGen is done for the particular module it 
> > transfers the ownership to the Transaction which, in turn, hands it to the 
> > JIT and once the JIT is done it retains the ownership again.
> 
> > Yes, we need a llvm::Module that corresponds to the Decls as sometimes 
> > CodeGen will decide not to emit a Decl.
> 
> Can you elaborate on this? (Or point me to the relevant discussion / code?)
> 
> Does CodeGen aggregate code into the Module as you CodeGen each incremental 
> chunk? Or do you Link the previously CodeGen'd module into a new one?
> > Each incremental chunk with which the TU grows has a corresponding 
> > llvm::Module. Once clang's CodeGen is done for the particular module it 
> > transfers the ownership to the Transaction which, in turn, hands it to the 
> > JIT and once the JIT is done it retains the ownership again.
> 
> > Yes, we need a llvm::Module that corresponds to the Decls as sometimes 
> > CodeGen will decide not to emit a Decl.
> 
> Can you elaborate on this? (Or point me to the relevant discussion / code?)
> 
> Does CodeGen aggregate code into the Module as you CodeGen each incremental 
> chunk? Or do you Link the previously CodeGen'd module into a new one?

Cling's "code unloading" rolls back the states of the various objects without 
any checkpointing. Consider the two subsequent incremental inputs: `int f() { 
return 12; } ` and `int i = f();`; `undo 1`. 

When we ask CodeGen to generate code for the first input it will not as `f` is 
not being used. Transaction1 will contain the `FunctionDecl*` for `f` but the 
corresponding llvm::Module will be empty. Then when we get the second input 
line, the Transaction2 will contain the `VarDecl*` but the corresponding 
llvm::Module will contain both IR definitions -- of `f` and `i`.

Having the clang::Decl is useful because we can restore the previous state of 
the various internal frontend structures such as lookup tables. However, we 
cannot just drop the llvm::Module as it might contain deferred declarations 
which were emitted due to a use.

That's pretty much the rationale behind this and the design dates back to 
pre-MCJIT times. I am all for making this more robust but that's what we 
currently have. The "code  unloading" is mostly done in cling's 
[DeclUnloader](https://github.com/vgvassilev/cling/blob/856f8e92f82f9037b3dbb27ae7f94add2ed6121f/lib/Interpreter/DeclUnloader.cpp#L815).

There was some useful discussion about the model 
[here](https://reviews.llvm.org/D34444#812418) quite some time ago





CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D96033: [clang-repl] Land initial infrastructure for incremental parsing

Reply via email to