[PATCH] D141215: [clang-repl] Introduce Value to capture expression results

Aaron Ballman via Phabricator via cfe-commits Wed, 03 May 2023 12:32:14 -0700

aaron.ballman added inline comments.


================
Comment at: clang/include/clang/Interpreter/Value.h:46
+
+#define REPL_BUILTIN_TYPES                                                     
\
+  X(bool, Bool)                                                                
\
----------------
junaire wrote:
> aaron.ballman wrote:
> > v.g.vassilev wrote:
> > > aaron.ballman wrote:
> > > > v.g.vassilev wrote:
> > > > > aaron.ballman wrote:
> > > > > > Is this expected to be a complete list of builtin types? e.g., 
> > > > > > should this have `char8_t` and `void` and `wchar_t`, etc? Should 
> > > > > > this be including `clang/include/clang/AST/BuiltinTypes.def` 
> > > > > > instead of manually maintaining the list?
> > > > > This is used for various things including storing the bits into a 
> > > > > big-endian agnostic way. For `void` we have a special case in the 
> > > > > Value and we cannot define a union of a `void` type. We can include 
> > > > > the others that you suggest. All the relevant ones are described in 
> > > > > `clang::BuiltinType::getName`.
> > > > > 
> > > > > We cannot use `BuiltinTypes.def` because we want to have a mapping 
> > > > > between the type as written in the language (eg, `bool`, `unsigned`, 
> > > > > etc) and its underlying type name. That mapping is not available in 
> > > > > `BuiltinTypes.def`. Ideally we should extend `BuiltinTypes.def` 
> > > > > somehow but I'd prefer outside of this patch. One of the challenges 
> > > > > is that some of the types depend on the language options (eg. `_Bool` 
> > > > > vs `bool`) and I am not sure this can be resolved by tablegen.
> > > > > 
> > > > > On a broader perspective, the `Value` class is responsible for two 
> > > > > things: (a) get a value from the interpreter to compiled code (see 
> > > > > test case); (b) set a value from compiled code to the interpreter. 
> > > > > The second case is not yet covered (I can open soon another patch). 
> > > > > The major use-case is calling at runtime functions and taking input 
> > > > > parameters from compiled code.
> > > > > 
> > > > > FWIW, we should probably move all of these entities in a separate 
> > > > > namespace. I'd suggest `caas` (compiler-as-a-service) and possibly 
> > > > > rename the `Value` to `InterpreterValue` since `Value` is very 
> > > > > generic and there are already a couple of classes with that name in 
> > > > > llvm and clang. 
> > > > > We can include the others that you suggest. All the relevant ones are 
> > > > > described in clang::BuiltinType::getName.
> > > > 
> > > > Okay, so the plan is to handle all the builtin types (`_BitInt`, 
> > > > `_Complex`, various floating point formats, etc)? Will that be part of 
> > > > this patch or in follow-up work? (My intuition is that we should 
> > > > consider it up front because some of the builtin types are interesting 
> > > > -- like `_BitInt` because it's parameterized, which makes it novel 
> > > > compared to the other types.)
> > > > 
> > > > > We cannot use BuiltinTypes.def because we want to have a mapping 
> > > > > between the type as written in the language (eg, bool, unsigned, etc) 
> > > > > and its underlying type name. That mapping is not available in 
> > > > > BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow 
> > > > > but I'd prefer outside of this patch. One of the challenges is that 
> > > > > some of the types depend on the language options (eg. _Bool vs bool) 
> > > > > and I am not sure this can be resolved by tablegen.
> > > > 
> > > > Thanks for the explanation! BuiltinTypes.def works well enough for 
> > > > times when we want to use macros and include the file to generate 
> > > > switch cases and the likes, but you're right that it's not well-suited 
> > > > for this. One thing to consider is whether we should change 
> > > > `BuiltinTypes.def` to be `BuiltinTypes.td` instead and use tablegen to 
> > > > generate the macro/include dance form as well as other output (such as 
> > > > for your needs, that can then consider language options or more complex 
> > > > predicates).
> > > > 
> > > > > FWIW, we should probably move all of these entities in a separate 
> > > > > namespace. I'd suggest caas (compiler-as-a-service) and possibly 
> > > > > rename the Value to InterpreterValue since Value is very generic and 
> > > > > there are already a couple of classes with that name in llvm and 
> > > > > clang.
> > > > 
> > > > I'm not in love with the name `caas` because that's not really a common 
> > > > acronym or abbreviation (and it looks like a typo due to `aa`). 
> > > > However, we already have an `interp` namespace in Clang for one of the 
> > > > other interpreters (constant expression evaluation), so that's not 
> > > > available for use. How about `repl` though?
> > > > 
> > > > As for considering changing the name from `Value` because of how many 
> > > > other `Value` types we have already... that's both a reason to rename 
> > > > and reason not to rename. I think I'm fine leaving it as `Value` so 
> > > > long as it's in a novel namespace.
> > > > > We can include the others that you suggest. All the relevant ones are 
> > > > > described in clang::BuiltinType::getName.
> > > > 
> > > > Okay, so the plan is to handle all the builtin types (`_BitInt`, 
> > > > `_Complex`, various floating point formats, etc)? Will that be part of 
> > > > this patch or in follow-up work? (My intuition is that we should 
> > > > consider it up front because some of the builtin types are interesting 
> > > > -- like `_BitInt` because it's parameterized, which makes it novel 
> > > > compared to the other types.)
> > > 
> > > That's a good point. I think the current patch offers a new capability 
> > > and it is probably fine to not address all at once. My concern is that 
> > > @junaire has a month to work on these things and this patch is the first 
> > > out of two patches. The risk here is to drop the ball on that altogether 
> > > if we add more work as a requirement for that patch to go in.
> > > 
> > > 
> > > > 
> > > > > We cannot use BuiltinTypes.def because we want to have a mapping 
> > > > > between the type as written in the language (eg, bool, unsigned, etc) 
> > > > > and its underlying type name. That mapping is not available in 
> > > > > BuiltinTypes.def. Ideally we should extend BuiltinTypes.def somehow 
> > > > > but I'd prefer outside of this patch. One of the challenges is that 
> > > > > some of the types depend on the language options (eg. _Bool vs bool) 
> > > > > and I am not sure this can be resolved by tablegen.
> > > > 
> > > > Thanks for the explanation! BuiltinTypes.def works well enough for 
> > > > times when we want to use macros and include the file to generate 
> > > > switch cases and the likes, but you're right that it's not well-suited 
> > > > for this. One thing to consider is whether we should change 
> > > > `BuiltinTypes.def` to be `BuiltinTypes.td` instead and use tablegen to 
> > > > generate the macro/include dance form as well as other output (such as 
> > > > for your needs, that can then consider language options or more complex 
> > > > predicates).
> > > 
> > > I am totally with you here. I am not sure how to do this since as of now 
> > > some of the types and their form as written are decided with a flag (IIRC 
> > > the various `char` flavors).
> > > 
> > > > 
> > > > > FWIW, we should probably move all of these entities in a separate 
> > > > > namespace. I'd suggest caas (compiler-as-a-service) and possibly 
> > > > > rename the Value to InterpreterValue since Value is very generic and 
> > > > > there are already a couple of classes with that name in llvm and 
> > > > > clang.
> > > > 
> > > > I'm not in love with the name `caas` because that's not really a common 
> > > > acronym or abbreviation (and it looks like a typo due to `aa`). 
> > > > However, we already have an `interp` namespace in Clang for one of the 
> > > > other interpreters (constant expression evaluation), so that's not 
> > > > available for use. How about `repl` though?
> > > 
> > > The only problem I have with `repl` is misleading. `repl` usually means 
> > > to people something that you run at your prompt and can type some 
> > > expressions in and see some results. However, here we are building a 
> > > foundational primitive to allow embedding an interpreter as part of your 
> > > static program and be able to cross the compiler/interpreter boundary. 
> > > The compiler-as-a-service (caas) is probably the closest I could find in 
> > > CS research terminology to describe that. Would adding some solid bits of 
> > > documentation to the entire approach make you more comfortable with that 
> > > naming? 
> > > 
> > > FWIW, the namespace comment should not be a requirement for this 
> > > particular patch. I think it is totally fine to keep it as it is and do 
> > > the change outside of this patch to help the review. Moreover, we will 
> > > probably need to move some of the existing components already.
> > > 
> > > > 
> > > > As for considering changing the name from `Value` because of how many 
> > > > other `Value` types we have already... that's both a reason to rename 
> > > > and reason not to rename. I think I'm fine leaving it as `Value` so 
> > > > long as it's in a novel namespace.
> > > 
> > > Ok, if that's not confusing, then let's keep it the way it was. That way 
> > > it should be easier to adopt that downstream for sure!
> > > 
> > > 
> > >>>We can include the others that you suggest. All the relevant ones are 
> > >>>described in clang::BuiltinType::getName.
> > >> Okay, so the plan is to handle all the builtin types (_BitInt, _Complex, 
> > >> various floating point formats, etc)? Will that be part of this patch or 
> > >> in follow-up work? (My intuition is that we should consider it up front 
> > >> because some of the builtin types are interesting -- like _BitInt 
> > >> because it's parameterized, which makes it novel compared to the other 
> > >> types.)
> > > That's a good point. I think the current patch offers a new capability 
> > > and it is probably fine to not address all at once. My concern is that 
> > > @junaire has a month to work on these things and this patch is the first 
> > > out of two patches. The risk here is to drop the ball on that altogether 
> > > if we add more work as a requirement for that patch to go in.
> > 
> > Yeah, the current patch is incremental progress, so I think it's defensible 
> > to leave this to follow-up work. But I worry that follow-up work is going 
> > to be hampered by the design choices made here, and I mostly want to avoid 
> > a situation where this works inconsistently and that's "good enough" for an 
> > extended period of time. It feels a bit like working with any builtin type 
> > is part of the MVP. However, I don't insist because this is still useful 
> > forward progress for a lot of use cases.
> > 
> > > The only problem I have with repl is misleading. repl usually means to 
> > > people something that you run at your prompt and can type some 
> > > expressions in and see some results. However, here we are building a 
> > > foundational primitive to allow embedding an interpreter as part of your 
> > > static program and be able to cross the compiler/interpreter boundary. 
> > > The compiler-as-a-service (caas) is probably the closest I could find in 
> > > CS research terminology to describe that. Would adding some solid bits of 
> > > documentation to the entire approach make you more comfortable with that 
> > > naming?
> > 
> > On the one hand, if we have to document what the name means, the name isn't 
> > very meaningful. On the other hand, we use `ento` for the static analyzer 
> > (because of "entomologist", one who studies bugs...), so I don't think 
> > users routinely get hung up by not understanding the namespace identifier. 
> > If you think `caas` is the best name for this, then let's go with that 
> > unless someone has a better suggestion.
> > 
> > > FWIW, the namespace comment should not be a requirement for this 
> > > particular patch. I think it is totally fine to keep it as it is and do 
> > > the change outside of this patch to help the review. Moreover, we will 
> > > probably need to move some of the existing components already.
> > 
> > Agreed.
> Thanks for the comments from @aaron.ballman & @v.g.vassilev. So IIUC we'll 
> handle the builtin types in later patches and add a namespace (`caas`) for 
> the entire libclangInterpreter infrastructure.
That matches my understanding.


================
Comment at: clang/include/clang/Interpreter/Value.h:83
+  Value() = default;
+  Value(void /*Interpreter*/ *In, void /*QualType*/ *Ty);
+  Value(const Value &RHS);
----------------
junaire wrote:
> aaron.ballman wrote:
> > junaire wrote:
> > > aaron.ballman wrote:
> > > > v.g.vassilev wrote:
> > > > > junaire wrote:
> > > > > > aaron.ballman wrote:
> > > > > > > Why do these take `void *` instead of the expected type?
> > > > > > Yeah for the first parameter, we could just use `Interpreter*` but 
> > > > > > the second one is an opaque type so I think we should keep it?
> > > > > See my previous comments on performance. We cannot include anything 
> > > > > bulky in the header file.
> > > > I think I understand why the design is the way it is, but it still 
> > > > makes me uneasy. The constructor takes a pointer to some bucket of 
> > > > bytes... no size information, no type information, etc. Just "here's a 
> > > > random pointer". And then later, we hope the user calls `setKind()` in 
> > > > a way that makes sense.
> > > > 
> > > > We need it to be fast, but we also need it to be correct -- the type 
> > > > system is the best tool for helping with that.
> > > Not really... The user doesn't need to call `setKind()` explicitly to 
> > > construct a `Value`, the constructor will handle it automatically. See 
> > > `ConvertQualTypeToKind` in `Value.cpp`. So if the pointer is just some 
> > > garbage data, the constructor should fail before yielding out a valid 
> > > instance.
> > Yeah, that's a fair point, except nothing actually validates that the 
> > opaque pointer you are handed is actually valid for anything because it 
> > eventually just does a reinterpret_cast, so I don't think the constructor 
> > will fail.
> OK, let me try to answer why we're passing `QualType` in an opaque pointer 
> way. So the `Value` is *constructed* in `__clang_Interpreter_SetValue*` 
> variants, and these functions (let's call them runtime interfaces) are 
> synthesized when we want to do value printing (like we have a `x` without 
> semi). So it makes things pretty hard if we use a complete type (How can we 
> synthesize that type? That dramatically complicated the code) In addition, we 
> want `Value.h`, which is part of the runtime, as lightweight as possible, so 
> we can't use the complete type in its constructor, or we have to include the 
> corresponding header file.
Okay, I think I was getting hung up on your second point:

> In addition, we want Value.h, which is part of the runtime, as lightweight as 
> possible, so we can't use the complete type in its constructor, or we have to 
> include the corresponding header file.")

while forgetting your first point:

> So the Value is *constructed* in __clang_Interpreter_SetValue* variants, and 
> these functions (let's call them runtime interfaces) are synthesized when we 
> want to do value printing (like we have a x without semi). So it makes things 
> pretty hard if we use a complete type (How can we synthesize that type? That 
> dramatically complicated the code)

The first point makes sense to me and is pretty reasonable rationale for why we 
pass `void *`, thanks! The second point still isn't particularly compelling to 
me (more on that below).


================
Comment at: clang/include/clang/Interpreter/Value.h:160-162
+  // Interpreter, QualType are stored as void* to reduce dependencies.
+  void *Interp = nullptr;
+  void *OpaqueType = nullptr;
----------------
junaire wrote:
> aaron.ballman wrote:
> > v.g.vassilev wrote:
> > > aaron.ballman wrote:
> > > > v.g.vassilev wrote:
> > > > > aaron.ballman wrote:
> > > > > > Why don't forward declares suffice if we're storing the information 
> > > > > > by pointer?
> > > > > This is a performance-critical class. We literally measure the 
> > > > > instruction count for it. We practically cannot include anything in 
> > > > > this header file because the class needs to included as part of the 
> > > > > interpreter runtime. For example:
> > > > > 
> > > > > ```
> > > > > #include <clang/Interpreter/Value.h>
> > > > > Value ResultV;
> > > > > gInterpreter->evaluate("float i = 12.3; i++", &V);
> > > > > printf("Value is %d\n", ResultV.castAs<int>());
> > > > > ```
> > > > > 
> > > > > This is how you can do things in Cling. This not yet there but that's 
> > > > > our next step.
> > > > > 
> > > > > For performance reasons we have the `ValueKind` optimization which 
> > > > > allows us to perform most of the operations we need very fast. There 
> > > > > are some operations such as printing the concrete type which need the 
> > > > > actual `QualType` and so on but they are outside of the performance 
> > > > > critical paths and it is okay to resort back to the real types 
> > > > > providing the level of accuracy we need.
> > > > > 
> > > > That sounds like it's going to lead to maintenance problems in the long 
> > > > term, right? I can't think of another header file we say "don't touch 
> > > > this because it may impact runtime performance", and so I can easily 
> > > > imagine someone breaking your expectation that this file can't include 
> > > > anything else.
> > > > 
> > > > Is there a long-term plan for addressing this?
> > > We have a few components like the Lexer that are extremely prone to 
> > > performance regressions.
> > > 
> > > In terms for a longer-term plan in addressing this there are some steps 
> > > could help IMO. First, this component is relatively standalone and very 
> > > few changes will be required over time, for these I am hoping to be 
> > > listed as a reviewer. Second, we can add a comment in the include area, 
> > > making a note that including anything here will degrade the performance 
> > > of almost all interpreted code. Third, we will find out about this in our 
> > > downstream use-cases as the things get significantly slower.
> > > 
> > > Neither is silver bullet but that's probably the best we could do at that 
> > > time. Btw, we might be able to add a test that's part of LLVM's 
> > > performance analysis infrastructure.
> > > Neither is silver bullet but that's probably the best we could do at that 
> > > time. Btw, we might be able to add a test that's part of LLVM's 
> > > performance analysis infrastructure.
> > 
> > Yeah, we should probably consider doing that. But to make sure I understand 
> > the performance concerns... when we change functionality in the lexer, we 
> > (potentially) slow down the lexing phase of compilation. That's 
> > straightforward and unsurprising. But in this case, it sounds like the act 
> > of including another header file in this header file will cause a runtime 
> > performance concern, even if no other changes are made. If I'm correct, I 
> > can't think of anything else in the compiler that works like that.
> I believe what @v.g.vassilev means is that the repl itself might include 
> `Value.h` as a part of *runtime*, so if the header is heavy, you can notice 
> the repl is slowed down. (That's obvious) So keep in mind we're breaking the 
> boundary between the compiled code and interpreted code (It's kinda 
> confusing) here it actually impacts interpreted code.
> I believe what @v.g.vassilev means is that the repl itself might include 
> Value.h as a part of *runtime*, so if the header is heavy, you can notice the 
> repl is slowed down. (That's obvious) So keep in mind we're breaking the 
> boundary between the compiled code and interpreted code (It's kinda 
> confusing) here it actually impacts interpreted code.

I'm not certain that's a reasonable design choice to make. Or, stated somewhat 
differently, I'm really uncomfortable with having header files we can't 
maintain because changing them impacts runtime performance in surprising ways. 
That's not a sustainable design even if we think this header will be reasonably 
stable. We need *some* amount of abstraction here so that we can have a clean 
design for the REPL interpreter without NFC code changes impacting performance. 
Otherwise, people will be discouraged from adding comments to this file (those 
take time to lex, after all), or using long identifiers (longer identifiers 
take longer to lex than shorter ones), or including what is used instead of 
using `void *` (as being discussed here), and so on.

This is quite probably something you've already thought about plenty, but... 
could we add an abstraction layer so that the interpreter side of things has a 
"low-token-count" interface that dispatches through to the actual 
implementation?


================
Comment at: clang/lib/Interpreter/InterpreterUtils.cpp:19
+  const llvm::APInt Addr(8 * sizeof(void *), Val);
+  return IntegerLiteral::Create(C, Addr, C.getUIntPtrType(), SourceLocation());
+}
----------------
So now this interface takes a `uint64_t` but it creates a `uintptr_t` literal; 
should this be making a uint64_t literal instead?


================
Comment at: clang/lib/Interpreter/InterpreterUtils.cpp:19
+  const llvm::APInt Addr(8 * sizeof(void *), Ptr);
+  return IntegerLiteral::Create(C, Addr, C.getUIntPtrType(), SourceLocation());
+}
----------------
junaire wrote:
> aaron.ballman wrote:
> > This question applies more generally than just this function, but should we 
> > be requiring these interfaces to supply a `SourceLocation` rather than 
> > hard-coding no location information? That can make it easier to see what's 
> > going on when dumping the AST for debugging purposes, etc even if it's not 
> > necessary for diagnostics or other reasons. (I don't insist, just a 
> > speculative question about the interface.)
> Keep in mind these helpers are actually used for *synthesizing* AST nodes, so 
> there are no SourceLocations available here. If you dumped the AST, you 
> probably will see something like:
> ```
> CallExpr 0x621000093270 'void'
> |-ImplicitCastExpr 0x621000093250 'void (*)(void *, void *, void *, unsigned 
> long long)' <FunctionToPointerDecay>
> | `-DeclRefExpr 0x6210000931f8 'void (void *, void *, void *, unsigned long 
> long)' lvalue Function 0x62100008fd98 '__clang_Interpreter_SetValueNoAlloc' 
> 'void (void *, void *, void *, unsigned long long)'
> |-CStyleCastExpr 0x621000093060 'void *' <IntegralToPointer>
> | `-IntegerLiteral 0x621000093020 'unsigned long' 106515188942880
> |-CStyleCastExpr 0x6210000930d0 'void *' <IntegralToPointer>
> | `-IntegerLiteral 0x621000093090 'unsigned long' 106515188942912
> |-CStyleCastExpr 0x621000093140 'void *' <IntegralToPointer>
> | `-IntegerLiteral 0x621000093100 'unsigned long' 107820859101728
> `-CStyleCastExpr 0x6210000931c8 'unsigned long long' <NoOp>
>   `-ImplicitCastExpr 0x6210000931a8 'unsigned long long' <IntegralCast> 
> part_of_explicit_cast
>     `-ImplicitCastExpr 0x621000093188 'int' <LValueToRValue> 
> part_of_explicit_cast
>       `-DeclRefExpr 0x621000092ec0 'int' lvalue Var 0x621000092d50 'x' 'int'
> ```
> 
> Horrible, but that's all we can do :( 
Ah, that's true, this is all synthesized.


================
Comment at: clang/lib/Interpreter/InterpreterUtils.h:38
+namespace clang {
+IntegerLiteral *IntegerLiteralExpr(ASTContext &C, uintptr_t Ptr);
+
----------------
junaire wrote:
> aaron.ballman wrote:
> > What is `Ptr` pointing to?
> > 
> > Should this be renamed to `UIntPtrTLiteralExpr` with a comment that says it 
> > creates an integer literal whose type is `uinptr_t`?
> I think it should be `uint64_t`. Fixed.
I'm still seeing `IntegerLiteral *IntegerLiteralExpr(ASTContext &C, uintptr_t 
Ptr);` for the declaration (the definition is using `uint64_t` though).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141215/new/

https://reviews.llvm.org/D141215

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D141215: [clang-repl] Introduce Value to capture expression results

Reply via email to