[PATCH] D108464: [clang][CodeGen] Refactor CreateTempAlloca function nest. NFC.

John McCall via Phabricator via cfe-commits Mon, 23 Aug 2021 20:01:19 -0700

rjmccall added a comment.

In D108464#2960791 <https://reviews.llvm.org/D108464#2960791>, @tlively wrote:


> In D108464#2960623 <https://reviews.llvm.org/D108464#2960623>, @rjmccall 
> wrote:
>
>> + JF, who knows something about Web Assembly, or can at least drag in the 
>> right people
>>
>> In D108464#2959591 <https://reviews.llvm.org/D108464#2959591>, @wingo wrote:
>>
>>> In D108464#2957276 <https://reviews.llvm.org/D108464#2957276>, @wingo wrote:
>>>
>>>> Sooooo... besides the refactor, this is getting closer to where I'm going 
>>>> in https://lists.llvm.org/pipermail/cfe-dev/2021-July/068559.html, though 
>>>> still NFC.  I think you can see where I would replace 
>>>> `getASTAllocaAddressSpace` with `getAllocaAddressSpace(QualType Ty)`, and 
>>>> possibly (depending on the source language) avoid casting the resulting 
>>>> alloca to `LangAS::Default`.  WDYT, is this sort of thing OK?
>>>
>>> Taking this patch as perhaps a better generic discussion point, @rjmccall 
>>> graciously gave some general feedback on this approach (thank you!!!):
>>>
>>> In D108360#2957844 <https://reviews.llvm.org/D108360#2957844>, @rjmccall 
>>> wrote:
>>>
>>>> I'm not sure that I agree with your overall plan, though:
>>>>
>>>> - The WebAssembly operand stack is not a good match for an address space 
>>>> at the language level because it's not addressable at all.  If you can't 
>>>> meaningfully have a pointer into the address space, then you don't really 
>>>> need this in the type system; it's more like a declaration modifier at 
>>>> best.
>>>> - Allocating local variables on the operand stack ought to be a very 
>>>> straightforward analysis in the backend.  There's not much optimization 
>>>> value in trying to do it in the frontend, and it's going to be problematic 
>>>> for things like coroutine lowering.
>>>> - The security argument seems pretty weak, not because security isn't 
>>>> important but because this is not really an adequate basis for getting the 
>>>> tamper-proof guarantee you want.  For example, LLVM passes can and do 
>>>> introduce its own allocas and store scalars into them sometimes.  Really 
>>>> you need some sort of "tamper-proof" *type* which the compiler can make an 
>>>> end-to-end guarantee of non-tamper-ability for the values of, and while 
>>>> optimally this would be implemented by just keeping values on the operand 
>>>> stack, in the general case you will need to have some sort of strategy for 
>>>> keeping things in memory.
>>>
>>> Thanks for thinking about this!  Indeed I started out with the goal of not 
>>> going deep into clang and if it's possible to avoid going too deeply, that 
>>> would be better for everyone involved.  I am starting to think however that 
>>> it may be unavoidable for me at least.
>>>
>>> So, I am focusing on WebAssembly global and local variables; the 
>>> WebAssembly operand stack is an artifact of the IR-to-MC lowering and 
>>> AFAICS doesn't have any bearing on what clang does -- though perhaps I am 
>>> misunderstanding what you are getting at here.  The issue is not to 
>>> allocate locals on the operand stack, but rather to allocate them as part 
>>> of the "locals" of a WebAssembly function 
>>> <https://webassembly.github.io/spec/core/syntax/modules.html#functions>.  
>>> Cc @tlively on the WebAssembly side.
>>
>> By "operand stack" I mean the innate, unaddressable stack that the 
>> WebAssembly VM maintains in order to make functions reentrant.  I don't know 
>> what term the VM spec uses for it, but I believe "operand stack" is widely 
>> accepted terminology for the unaddressable stack when you've got this kind 
>> of dual-stack setup.  And yes, VM "locals" would go there.
>
> @wingo, are there cases where it is useful to declare variables as living in 
> WebAssembly locals and not in the VM stack? I'm having trouble coming up with 
> a case where leaving that up to the backend is not enough. We clearly need a 
> way to prevent values from being written to main memory (AS 0), but it's not 
> clear to me that we need a way to specifically allocate locals for them.

Right, I think this is one of the key questions here.  Right now this seems to 
be entirely type-directed: it's a special property of a couple of builtin types 
that you can't take the address of their objects and those objects can only 
live in specific places.  Having it be type-directed, but only to the point of 
saying that certain types *must* use certain address spaces, and then imposing 
all the other restrictions on those types as novel restrictions on those 
address spaces, feels like it's adding complexity to the language concept of 
address spaces without much benefit.

>>> The main motivator is the ability to have "reference type" 
>>> <https://webassembly.github.io/spec/core/syntax/types.html#reference-types> 
>>> (`externref`/`funcref`) locals and globals 
>>> <https://webassembly.github.io/spec/core/syntax/modules.html#globals> at 
>>> all.  Reference-typed values can't be stored to linear memory.  They have 
>>> no size and no byte representation -- they are opaque values from the host. 
>>>  However, WebAssembly locals and globals can define storage locations of 
>>> type `externref` or `funcref`.
>>
>> I see.  I think you need to think carefully about the best way to represent 
>> values of these types in LLVM IR, because it probably cannot just be "treat 
>> them as a normal value, emit code a certain way that we know how to lower, 
>> and hope nothing goes wrong".  It seems to me that you probably need a new 
>> IR type for it, since normal types aren't restricted from memory and tokens 
>> can't be used as parameters or return values.
>>
>> Hopefully, someone had a plan for this when they introduced that WebAssembly 
>> extension.
>
> Yes, we had a plan :) In WebAssembly, reference types are essentially opaque 
> pointers that cannot be dereferenced or stored into main memory. They can, 
> however, be stored in WebAssembly globals and tables, which are modeled as 
> LLVM global pointers and global arrays in other address spaces. At the IR 
> level, reference types are modeled as pointers into a non-integral AS that 
> themselves live in a non-integral AS. If the optimizer ever spills a local 
> reference-typed value to memory, we are able to discover and correct that in 
> the backend. I believe we are currently assuming that the optimizer will 
> never introduce a store of a reference-typed value into a global main memory 
> location, though.

Hmm.  This seems like it's abusing the inner address space to create a 
primitive opaque user-defined type in LLVM IR, but I don't have a compelling 
argument why you shouldn't do it this way.  I guess my only real objection to 
the overall approach is vague hand-wringing about this idea of having strong 
representational invariants and then just working around passes that break them 
in the backend.

>>> But, if we add a generic OpenCL-like address space attribute, that would 
>>> allow the user to declare some variables to be in alternate address spaces. 
>>>  Then we can apply the ACLE SVE semantic restrictions to these values also, 
>>> and add on an additional restriction preventing address-of.  That way users 
>>> get to make off-heap definitions, and if they misuse them, they get 
>>> comprehensible errors.  LLVM IR and WebAssembly lowering is ready for these 
>>> alternate-address-space allocations.
>>
>> Again, I'm not sure you're getting anything at all from the address space 
>> side of this.  The restrictions on these variables prevent any of the 
>> general address-space logic from applying.  In a language sense, it's more 
>> like a storage class than an address space.
>
> Using address spaces lets us model loads and stores of reference-typed values 
> from and to globals and tables. I don't think it makes sense to present these 
> concepts as "address spaces" to C/C++ users, but that's what we're using at 
> the IR level.

Yeah, at some point I'm willing to accept that this is your best option at the 
IR level, but I want to not jam this into the user-facing language unless it's 
really the right design approach.

John.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108464/new/

https://reviews.llvm.org/D108464

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D108464: [clang][CodeGen] Refactor CreateTempAlloca function nest. NFC.

Reply via email to