[PATCH] D89372: [OpenCL] Remove unused extensions

2020-10-16 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D89372#2335027 , @Anastasia wrote:

> In D89372#2334728 , @jvesely wrote:
>
>> In D89372#2334225 , @Anastasia 
>> wrote:
>>
 
>>>
>>> Ok, so the pragma is supposed to control the pointer dereferencing which 
>>> seems like a valid case but however, it is not implemented now. Do we know 
>>> of any immediate plans from anyone to implement this? If not I would prefer 
>>> to remove the pragma for now? If someone decided to implement this 
>>> functionality later fully it is absolutely fine. Pragma will be very easy 
>>> to add. There is no need for everyone to pay the price for something that 
>>> can't be used at the moment.
>>
>> I though the current behaviour of 'you can use #pragma, but the dereferences 
>> are always legal' was intentional for backward compatibility.
>> I don't think there are programs that would set it to 'disabled' and expect 
>> compilation failure.
>
> The initial state of the pragma is disabled, so if disabling the extension 
> isn't supposed to do anything then I don't know why would anyone ever enable 
> it?
>
>> My concern is that legacy apps would set '#pragma enabled' before using 
>> char/short. such behavior would produce warning/error if with this change 
>> applied.
>
> Correct, it will compile with a warning but not fail to compile. I am willing 
> to introduce a -W option (if not present already ) to suppress that warning 
> if it helps with the clean up and backward compatibility. It might also open 
> up opportunities for a wider clean up  - removing pragma in extensions that 
> currently require pragma for no good reason.  I have written more details on 
> this in 
> https://github.com/KhronosGroup/OpenCL-Docs/pull/355#issuecomment-662679499

Adding a new option cannot be a solution as legacy applications were written 
before its introduction.
CL1.x applications need to continue to work without changes, anything else 
breaks backward compatibility.

 The same arguments also apply to `cles_khr_in64`. It's illegal to use 
 int64 types in embedded profile unless you first enable the extensions. 
 Rather than removing it, `cles_khr_2d_image_array_writes` should be added 
 to the extension list.
>>>
>>> I don't think clang currently support embedded profile. Adding extension 
>>> into the OpenCLOptions is pointless if it's not used. Do you plan to add 
>>> any functionality for it? Defining macros can be done easily outside of 
>>> clang source code or if it is preferable to do inside there is 
>>> `MacroBuilder::defineMacro`  available in the target hooks. Currently for 
>>> every extension added into `OpenCLExtensions.def` there is a macro, a 
>>> pragma and a field in `OpenCLOptions` added. This is often more than what 
>>> is necessary. Plus Khronos has very many extensions if we assume that all 
>>> of them are added in clang it will become scalability and maintanance 
>>> nightmare. Most of the extensions only add functions so if we provide a way 
>>> to add macros for those outside of clang code base it will keep the common 
>>> code clean and vendors can be more flexible in adding the extensions 
>>> without the need to modify upstream code if they need to. I see it as an 
>>> opportunity to improve common code and out of tree implementations too. It 
>>> just need a little bit of restructuring.
>>
>> My understanding is that both the macro and working #pragma directive is 
>> necessary.
>
> I don't believe this is the correct interpretation. If you look at the 
> extension spec s9.1 it says:
>
> `Every extension which affects the OpenCL language semantics, syntax or adds 
> built-in functions to the language must create a preprocessor #define that 
> matches the extension name string.  This #define would be available in the 
> language if and only if the extension is supported on a given implementation.`

This is only one part indicating the availability of extension. It's not 
describing what's necessary to observe extended behaviour. The specs state that 
the initial state of the compiler is as if `#pragma OPENCL EXTENSION all : 
disable` was issued.
This means that functions introduced by extensions cannot be present, and the 
symbol names are available for application use unless such symbols were 
previously reserved.

> It does not say that the pragma is needed, it only says that macro is needed. 
> That perfectly makes sense because the macro allows to check that the 
> extension is present to implement certain functionality conditionally.

This cited line only talks about extensions availability, not extension use. 
One example of newly introduced functions is in cl_khr_in64_base_atomics. The 
specs say:

  The optional extensions cl_khr_int64_base_atomics and 
cl_khr_int64_extended_atomics
  implement atomic operations on 64-bit signed and unsigned integers to 
locations in __global
  

[PATCH] D89372: [OpenCL] Remove unused extensions

2020-10-16 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D89372#2334225 , @Anastasia wrote:

>> 
>
> Ok, so the pragma is supposed to control the pointer dereferencing which 
> seems like a valid case but however, it is not implemented now. Do we know of 
> any immediate plans from anyone to implement this? If not I would prefer to 
> remove the pragma for now? If someone decided to implement this functionality 
> later fully it is absolutely fine. Pragma will be very easy to add. There is 
> no need for everyone to pay the price for something that can't be used at the 
> moment.

I though the current behaviour of 'you can use #pragma, but the dereferences 
are always legal' was intentional for backward compatibility.
I don't think there are programs that would set it to 'disabled' and expect 
compilation failure. My concern is that legacy apps would set '#pragma enabled' 
before using char/short. such behavior would produce warning/error if with this 
change applied.

>> The same arguments also apply to `cles_khr_in64`. It's illegal to use int64 
>> types in embedded profile unless you first enable the extensions. Rather 
>> than removing it, `cles_khr_2d_image_array_writes` should be added to the 
>> extension list.
>
> I don't think clang currently support embedded profile. Adding extension into 
> the OpenCLOptions is pointless if it's not used. Do you plan to add any 
> functionality for it? Defining macros can be done easily outside of clang 
> source code or if it is preferable to do inside there is 
> `MacroBuilder::defineMacro`  available in the target hooks. Currently for 
> every extension added into `OpenCLExtensions.def` there is a macro, a pragma 
> and a field in `OpenCLOptions` added. This is often more than what is 
> necessary. Plus Khronos has very many extensions if we assume that all of 
> them are added in clang it will become scalability and maintanance nightmare. 
> Most of the extensions only add functions so if we provide a way to add 
> macros for those outside of clang code base it will keep the common code 
> clean and vendors can be more flexible in adding the extensions without the 
> need to modify upstream code if they need to. I see it as an opportunity to 
> improve common code and out of tree implementations too. It just need a 
> little bit of restructuring.

My understanding is that both the macro and working #pragma directive is 
necessary. The configuration bit is only needed if clang changes behaviour, 
which is a separate question.
I'd also argue that new third party extensions need an API call to register new 
extensions in order to get a working pragma mechanism.

Even if an extension only adds access new functions, pragma should control if 
user functions with conflicting names are legal or not.

for example, a program can implement function `new_func`, which gets later 
added to an extension. since the program doesn't know about the new extension 
it doesn't use `#pragma new_extension:enable` and continues to work correctly.
If the new extension exposes `new_func` unconditionally, the program breaks, 
because it doesn't check for a macro that didn't exist at the time it was 
written.
more recent programs can use ifdef to either use `new_func` provided by the 
extension, or implement a custom version.

I didn't find much about embedded program behavior in full profile 
implementation in the specs.
It only says that "embedded profile is a subset" which imo implies that legal 
embedded profile programs should work correctly in a full profile 
implementation.
This implies that cles_* pragmas should continue to work even if the behavior 
is always supported.

>>> Are you suggesting to leave this out? However I don't see any evidence that 
>>> this require either macro or pragma. I feel this is in rather incomplete 
>>> state. So I don't feel we can accomodate for all of these.
>>
>> "incomplete specification" is imo a good reason to drop support for an 
>> extension. My argument is that explanation of legacy extensions should rely 
>> on the spec that introduced them, rather than the current 2.x/3.x which does 
>> an arguably poor job on backward compatibility.
>
> Ok, the idea is not to break backwards compatibility of course. For the 
> extensions that intended to modify language (but never did) if we look from 
> upstream clang user's perspective those extensions couldn't possibly be used 
> to do anything useful. It is of course possible that in some forks the 
> functionality has been completed but then they can easily update the 
> implementation to include the extension definition back. This is very small 
> change compared to the actual extension functionality.
>
> I am ok to leave the extensions that could hypotetically modify the language 
> out of this patch for now. Perhaps we could add a comment explaining that 
> they are unused and only left for backwards compatibility. In a long term we 
> need to find some ways to remove the legacy 

[PATCH] D89372: [OpenCL] Remove unused extensions

2020-10-15 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D89372#2332997 , @Anastasia wrote:

> In D89372#2332853 , @jvesely wrote:
>
>> `cl_khr_byte_addressable_stores` changes language semantics. without it, 
>> pointer dereferences of types smaller than 32 bits are illegal.
>
> Ok, does it mean that we are missing to diagnose this in the frontend? Btw I 
> do acknowledge that what you say makes sense but I don't think the 
> documentation support that:
> https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/cl_khr_byte_addressable_store.html
>
> Am I just looking in the wrong place?

Since it's only an extension in ocl 1.0 that spec is a better place to look:

  9.9 Byte Addressable Stores 
  Section 6.8.m describes restrictions on built-in types char, uchar, char2, 
uchar2, short,
  and half. The OpenCL extension cl_khr_byte_addressable_store removes these
  restrictions.  An application that wants to be able to write to elements of a 
pointer (or struct) that
  are of type char, uchar, char2, uchar2, short, ushort and half will need to 
include
  the #pragma OPENCL EXTENSION cl_khr_byte_addressable_store :
  enable directive before any code that performs writes that may not be 
supported as per section
  6.8.m.
  
  6.8 Restrictions
  ...
  m. Built-in types that are less than 32-bits in size i.e. char, uchar, char2, 
uchar2,
  short, ushort, and half have the following restriction:
  Writes to a pointer (or arrays) of type char, uchar, char2, uchar2, short,
  ushort, and half or to elements of a struct that are of type char, uchar,
  char2, uchar2, short and ushort are not supported. Refer to section 9.9
  for additional information.



>> Even if all clang targets support this the macro should still be defined for 
>> backward compatibility.
>
> Ok, are you saying that all targets currently support this and we sould 
> always define it? In this case I would be more happy if we move them into the 
> internal header that we add implicitly anyway...

r600/r700/EG/NI targets have a bit wonky support byte-addressable stores 
(basically using 32bit atomic MSKOR), but I don't expect those targets to 
survive for long, and for now, they advertise support.
some out of tree backends might also benefit from the extension (like legup -- 
verilog backend), but I'm not sure if out of tree targets are worth 
considering, or if they indeed make use of this extension.
on a high-level, I could see a restriction to 32bit data paths be useful for 
FPGA targets.

moving the define it to header breaks OCL1.0 programs. Based on the specs 
above. those programs are required to include the following:

  #ifdef cl_khr_byte_addressable_store
  #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
  #endif

Before they can dereference pointers to char/char2/short/... types (including 
array indexing and struct members).
so the `pragma` part needs to work, and the language level checks need to work 
if the extension is not enabled.

The same arguments also apply to `cles_khr_in64`. It's illegal to use int64 
types in embedded profile unless you first enable the extensions. Rather than 
removing it, `cles_khr_2d_image_array_writes` should be added to the extension 
list.

clang can always decide that OCL1.0 programs (even when compiled in cl-1.x 
mode) and embedded profile is unsupported.
I have no clue if there are many OCL programs that rely on those modes out 
there.
However, removing support for them is a significantly bigger change than 
cleaning up a few language-irrelevant extensions (actually, I'm not sure if 
clang ever supported OCL embedded mode).

>> it would be useful if the commit message or the description of this revision 
>> included a justification for each removed extension why it doesn't impact 
>> language semantics with spec references.
>
> Yes, this is a good suggestion in principle and we should try our best. 
> However I am not sure it is feasible for all of those, in particular this 
> documentation doesn't contain anything:
> https://man.opencl.org/cl_khr_context_abort.html
>
> Are you suggesting to leave this out? However I don't see any evidence that 
> this require either macro or pragma. I feel this is in rather incomplete 
> state. So I don't feel we can accomodate for all of these.

"incomplete specification" is imo a good reason to drop support for an 
extension. My argument is that explanation of legacy extensions should rely on 
the spec that introduced them, rather than the current 2.x/3.x which does an 
arguably poor job on backward compatibility.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89372/new/

https://reviews.llvm.org/D89372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89372: [OpenCL] Remove unused extensions

2020-10-15 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

`cl_khr_byte_addressable_stores` changes language semantics. without it, 
pointer dereferences of types smaller than 32 bits are illegal.
Even if all clang targets support this the macro should still be defined for 
backward compatibility.

it would be useful if the commit message or the description of this revision 
included a justification for each removed extension why it doesn't impact 
language semantics with spec references.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89372/new/

https://reviews.llvm.org/D89372

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78979: OpenCL: Include builtin header by default

2020-04-29 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D78979#2006901 , @yaxunl wrote:

> In D78979#2006847 , @arsenm wrote:
>
> > I'm also wondering if using -nogpulib for the corresponding linker purpose 
> > was correct, since in the OpenCL case it's not really an offload target. 
> > Maybe this should switch to -nostdlib?
>
>
> -nogpulib is fine since opencl compiler is in parallel with the device 
> compiler of CUDA/HIP. The library it uses is the device library.


OpenCL can target other devices than GPUs, including CPUs and FPGAs, referring 
to gpulibs wrt opencl is a misnomer.

It would be nice to have some clarity as to how OpenCL is handled wrt clang 
frontend vs. clang driver.
OpenCL options are currently split between the two (e.g. cl-denorms-are-zero is 
only available in the driver and not the frontend)
There are 3 implementations of CL headers, two in clang which might or might 
not be included by default, and the 3rd one in libclc.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78979/new/

https://reviews.llvm.org/D78979



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77923: OpenCL: Fix some missing predefined macros

2020-04-16 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D77923#1983787 , @yaxunl wrote:

> LGTM. The objective of the change is to simplify offline compilation. We want 
> to avoid asking users to add -D if a proper macro can be inferred from triple 
> and cpu. Ideally, users only need to specify triple and cpu with clang 
> driver. In case users want to override the default macros for the triple, 
> they can always override them by -D or -U in command line.


will this trigger extra warnings for the user? 
as long as there's a way to set custom values I'm fine.
I just don't understand why this can' be appended to the command line by the 
offline compiler distributed with the platform (and based on release if needed).
The platform will need to get the same number + any additional information it 
wants to report in `clGetDeviceInfo(CL_DEVICE_VERSION)` anyway so having it in 
one place would be prudent.

> The question is whether the default value of the macro is proper. To me this 
> change maintained the old behavior for spir and other targets and the default 
> value seems fine for amdgcn.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77923/new/

https://reviews.llvm.org/D77923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77923: OpenCL: Fix some missing predefined macros

2020-04-14 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D77923#1981398 , @arsenm wrote:

> In D77923#1979286 , @jvesely wrote:
>
> > > In D77923#1976497 , @jvesely 
> > > wrote:
> > > 
> > >> OPENCL_VERSION is something that should be really set by the opencl 
> > >> driver rather than the compiler.
> > >>  coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so 
> > >> those gpus can get over 1.2, while the compiler can be used in an 
> > >> implementation that doesn't go above 1.2 even for gfx700+
> > > 
> > > 
> > > I don't know why this would imply we can't just have a static 
> > > `__OPENCL_VERSION__` for each device in Clang? I don't see anything in 
> > > the standard that says you are not allowed to have (going with your 
> > > example) `__OPENCL_VERSION__=200` and `__OPENCL_C_VERSION=120` at the 
> > > same time.
> > > 
> > > It seems like we should be able to just have `__OPENCL_VERSION__` defined 
> > > in Clang, and let the runtime control setting `__OPENCL_C_VERSION__` via 
> > > `-cl-std=`. If the current patch sets `__OPENCL_VERSION__` differently 
> > > than the runtime does then it seems like we should make it match, but I'm 
> > > not sure how best to confirm that.
> >
> > The `__OPENCL_VERSION__` macro in the kernel needs to match the version 
> > returned by `clGetDeviceInfo` (both refer to OpenCL version supported by 
> > the device),
> >  which in turn must be <= version returned by `clGetPlatformInfo`.
> >  This part makes it unsuitable for hardcoding in the compiler.
> >
> > `__OPENCL_VERSION__` should indicate device capabilities rather than 
> > language features. specs talk about available resources.
> >  For example, `CL_DEVICE_IMAGE2D_MAX_HEIGHT` is required to be >= 8192 for 
> > OpenCL 1.2 while it's required to be >=16384 for OpenCL 2.1.
>
>
> But why would the driver not report the device values which we do know in the 
> compiler? A discrepancy between these sounds like a driver bug.


because it needs to be capped by the platform opencl version. devices 
supporting opencl x.y need to support opencl > if there's to be a default value hardcoded in the compiler it'd be nice if 
>> platform drivers had an easy way to override (lower) it.
> 
> I found a hacky way to do this. You can -include a header, which happens 
> after the predefines, with undef and redef of the predefined value. This does 
> trigger -Wreserved-id-macro (which I only found through -Weverything), but 
> you should be able to pragma push/pop the warning off inside the special 
> header.

does passing `-U__OPENCL_VERSION__ -D __OPENCL_VERSION__=x.y` on the commadline 
work as well? the header trick was my fallback path.

> For now I could assume the platform is 2.0 based on -amdhsa, and not for 
> -mesa3d to work around clover not currently supporting CL2.0.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77923/new/

https://reviews.llvm.org/D77923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77923: OpenCL: Fix some missing predefined macros

2020-04-13 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

> In D77923#1976497 , @jvesely wrote:
> 
>> OPENCL_VERSION is something that should be really set by the opencl driver 
>> rather than the compiler.
>>  coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those 
>> gpus can get over 1.2, while the compiler can be used in an implementation 
>> that doesn't go above 1.2 even for gfx700+
> 
> 
> I don't know why this would imply we can't just have a static 
> `__OPENCL_VERSION__` for each device in Clang? I don't see anything in the 
> standard that says you are not allowed to have (going with your example) 
> `__OPENCL_VERSION__=200` and `__OPENCL_C_VERSION=120` at the same time.
> 
> It seems like we should be able to just have `__OPENCL_VERSION__` defined in 
> Clang, and let the runtime control setting `__OPENCL_C_VERSION__` via 
> `-cl-std=`. If the current patch sets `__OPENCL_VERSION__` differently than 
> the runtime does then it seems like we should make it match, but I'm not sure 
> how best to confirm that.

The `__OPENCL_VERSION__` macro in the kernel needs to match the version 
returned by `clGetDeviceInfo` (both refer to OpenCL version supported by the 
device),
which in turn must be <= version returned by `clGetPlatformInfo`.
This part makes it unsuitable for hardcoding in the compiler.

`__OPENCL_VERSION__` should indicate device capabilities rather than language 
features. specs talk about available resources.
For example, `CL_DEVICE_IMAGE2D_MAX_HEIGHT` is required to be >= 8192 for 
OpenCL 1.2 while it's required to be >=16384 for OpenCL 2.1.

if there's to be a default value hardcoded in the compiler it'd be nice if 
platform drivers had an easy way to override (lower) it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77923/new/

https://reviews.llvm.org/D77923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77923: OpenCL: Fix some missing predefined macros

2020-04-11 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

OPENCL_VERSION sounds like something that should be really set by the opencl 
driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those 
gpus can get over 1.2, while the compiler can be used in an implementation that 
doesn't go above 1.2 even for gfx700+


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77923/new/

https://reviews.llvm.org/D77923



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66068: cmake: Make building clang-shlib optional

2019-08-19 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

Hi,

sorry for the delay. I fully understand the need to reduce the number of 
options. Having always used SHARED_LIBS build I remember weekly shared build 
breakages.
That said, forcing everyone to build one huge library effectively makes debug 
builds unusable in any practical way.
Having a usable debug build would also obsolete the with_asserts option.
What is the use-case of one big shared lib that split libraries can't do?


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66068/new/

https://reviews.llvm.org/D66068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66068: cmake: Make building clang-shlib optional

2019-08-13 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D66068#1627706 , @beanz wrote:

> I want to dissect this a bit.
>
> In D66068#1627451 , @E5ten wrote:
>
> > I am in favour of adding a user-facing option to disable generating this 
> > duplicate library for users that don't need it
>
>
> Why do you call this duplicate? It is unique. There is no other library in 
> the clang build that serves the role of this library.


It duplicates functionality provided by separate/component libraries.
Why can't this be an option the same way I can pick when building llvm?

>> there should be an option to disable linking a library that takes a long 
>> time to link and isn't necessary for a lot of users.
> 
> I think this is nuanced. When you say "takes a long time to link", I'm 
> curious why you say that. For me it takes 45s to link on my laptop in a Linux 
> VM using LLD in a build configuration that also includes all our backends 
> (which is kinda a worst-case scenario), and 10s to link if I only include 
> X86. That doesn't seem like a super long time, and it doesn't rely on any 
> billion-dollar compute farms.

Is this a debug build?

> While it does slow down full-build times (slightly), I think the benefit is 
> less broken bots which benefits the community as a whole.

do you have any numbers to support that claim?

> Going back to @jvesely's original email, I'm not sure why this adds minutes 
> to your build time. I'd be curious if there are other low-hanging fruit that 
> would improve your productivity without the community cost of adding new 
> build configurations that disable building and testing things that we 
> actually ship.

My first guess would be the difference between debug and release build 
(presence of debug info). the size of the library is 1.6GB on my system:

  $ du -h /usr/local/llvm-git/lib/libclang-cpp.so.10svn 
  1.6G  /usr/local/llvm-git/lib/libclang-cpp.so.10svn

just reading the inputs and writing the output library will take 30s on a 
100MB/s hdd (my laptop is slower than that) and it hasn't done any linking yet.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66068/new/

https://reviews.llvm.org/D66068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66068: cmake: Make building clang-shlib optional

2019-08-12 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D66068#1625995 , @beanz wrote:

> I think you and I disagree here. General developer workflows don't need to 
> include building `all` for every minor change. In my normal workflow I just 
> re-run `check-llvm` or `check-clang` over and over again, only building the 
> `all` target before I post a patch. With that workflow I only build the 
> library once per-patch to ensure that it builds. Which is exactly the goal of 
> not having it be included.


That's your workflow. I need to run 'make install' because the modifications 
are used by an external project. If there's an option to skip shlib during 
'make install' I'd be happy to use it.

> If you *really* *really* can't be bothered to ensure that the things we ship 
> actually build with your change you can always use the (undocumented, and 
> hidden) `CLANG_TOOL_CLANG_SHLIB_BUILD=Off` option. I really don't see a 
> reason to add a user-facing option that we don't want people using.

I'm pretty sure more people would appreciate not installing large duplicate 
libraries, but if a hidden option will do, I'll update the patch.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66068/new/

https://reviews.llvm.org/D66068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66068: cmake: Make building clang-shlib optional

2019-08-12 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In D66068#1625478 , @beanz wrote:

> I generally am not a fan of adding more and more options. As long as you're 
> not linking the library it won't be generated by any of the check targets. 
> With the llvm dylib we've had many issues over the years where changes to 
> LLVM break building the dylib and many developers don't build it, so we have 
> to wait for a bot to catch it. Making the clang dylib build as part of `all` 
> in every possible build configuration is a recognition that this is something 
> we ship and we should ensure works.


I've no problem with it staying ON by default. I just don't see a reason to 
make every minor modification rebuild last 5-6mins instead of 30s, for 
something I don't want, need or use.
Not everyone is a billion-dollar corporation with a dedicated build farm.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66068/new/

https://reviews.llvm.org/D66068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66068: cmake: Make building clang-shlib optional

2019-08-11 Thread Jan Vesely via Phabricator via cfe-commits
jvesely created this revision.
jvesely added a reviewer: beanz.
Herald added a subscriber: mgorny.
Herald added a project: clang.

It takes ~5min to link, add an option to avoid that.


Repository:
  rC Clang

https://reviews.llvm.org/D66068

Files:
  CMakeLists.txt
  tools/CMakeLists.txt


Index: tools/CMakeLists.txt
===
--- tools/CMakeLists.txt
+++ tools/CMakeLists.txt
@@ -14,7 +14,7 @@
 
 add_clang_subdirectory(clang-rename)
 add_clang_subdirectory(clang-refactor)
-if(UNIX)
+if(UNIX AND CLANG_ENABLE_SHLIB)
   add_clang_subdirectory(clang-shlib)
 endif()
 
Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -451,6 +451,10 @@
 option(CLANG_BUILD_TOOLS
   "Build the Clang tools. If OFF, just generate build targets." ON)
 
+if (UNIX)
+  option(CLANG_ENABLE_SHLIB "Build combined clang dynamic library." ON)
+endif()
+
 option(CLANG_ENABLE_ARCMT "Build ARCMT." ON)
 option(CLANG_ENABLE_STATIC_ANALYZER "Build static analyzer." ON)
 


Index: tools/CMakeLists.txt
===
--- tools/CMakeLists.txt
+++ tools/CMakeLists.txt
@@ -14,7 +14,7 @@
 
 add_clang_subdirectory(clang-rename)
 add_clang_subdirectory(clang-refactor)
-if(UNIX)
+if(UNIX AND CLANG_ENABLE_SHLIB)
   add_clang_subdirectory(clang-shlib)
 endif()
 
Index: CMakeLists.txt
===
--- CMakeLists.txt
+++ CMakeLists.txt
@@ -451,6 +451,10 @@
 option(CLANG_BUILD_TOOLS
   "Build the Clang tools. If OFF, just generate build targets." ON)
 
+if (UNIX)
+  option(CLANG_ENABLE_SHLIB "Build combined clang dynamic library." ON)
+endif()
+
 option(CLANG_ENABLE_ARCMT "Build ARCMT." ON)
 option(CLANG_ENABLE_STATIC_ANALYZER "Build static analyzer." ON)
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49650: Targets/AMDGPU: Don't set fp32-denormals feature for r600

2018-07-27 Thread Jan Vesely via Phabricator via cfe-commits
jvesely abandoned this revision.
jvesely added a comment.

In https://reviews.llvm.org/D49650#1177323, @jvesely wrote:

> In https://reviews.llvm.org/D49650#1176336, @arsenm wrote:
>
> > In https://reviews.llvm.org/D49650#1175461, @jvesely wrote:
> >
> > > In https://reviews.llvm.org/D49650#1175438, @arsenm wrote:
> > >
> > > > According to cayman manual, these registers do exist so we should 
> > > > probably just make the feature accepted on r600 as well
> > >
> > >
> > > sure, that's the way it was before r335942. I assumed the removal was 
> > > intentional.
> >
> >
> > Probably accidental because nothing in r600 was actually using it
>
>
> given the number of warnings it outputs, I find that unlikely.
>  @tstellar what was your intention? It's not like someone is going to work on 
> EG/CM denormals any time soon.
>
> I don't mind either way. I just want to avoid another round of bikeshedding.


https://reviews.llvm.org/D49934


Repository:
  rC Clang

https://reviews.llvm.org/D49650



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49650: Targets/AMDGPU: Don't set fp32-denormals feature for r600

2018-07-26 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In https://reviews.llvm.org/D49650#1176336, @arsenm wrote:

> In https://reviews.llvm.org/D49650#1175461, @jvesely wrote:
>
> > In https://reviews.llvm.org/D49650#1175438, @arsenm wrote:
> >
> > > According to cayman manual, these registers do exist so we should 
> > > probably just make the feature accepted on r600 as well
> >
> >
> > sure, that's the way it was before r335942. I assumed the removal was 
> > intentional.
>
>
> Probably accidental because nothing in r600 was actually using it


given the number of warnings it outputs, I find that unlikely.
@tstellar what was your intention? It's not like someone is going to work on 
EG/CM denormals any time soon.

I don't mind either way. I just want to avoid another round of bikeshedding.


Repository:
  rC Clang

https://reviews.llvm.org/D49650



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49650: Targets/AMDGPU: Don't set fp32-denormals feature for r600

2018-07-25 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added a comment.

In https://reviews.llvm.org/D49650#1175438, @arsenm wrote:

> According to cayman manual, these registers do exist so we should probably 
> just make the feature accepted on r600 as well


sure, that's the way it was before r335942. I assumed the removal was 
intentional.


Repository:
  rC Clang

https://reviews.llvm.org/D49650



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49650: Targets/AMDGPU: Don't set fp32-denormals feature for r600

2018-07-22 Thread Jan Vesely via Phabricator via cfe-commits
jvesely created this revision.
jvesely added reviewers: arsenm, tstellar.
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, 
kzhuravl.

This feature was removed in r335942


Repository:
  rC Clang

https://reviews.llvm.org/D49650

Files:
  lib/Basic/Targets/AMDGPU.cpp


Index: lib/Basic/Targets/AMDGPU.cpp
===
--- lib/Basic/Targets/AMDGPU.cpp
+++ lib/Basic/Targets/AMDGPU.cpp
@@ -205,7 +205,7 @@
 if (I == "+fp64-fp16-denormals" || I == "-fp64-fp16-denormals")
   hasFP64Denormals = true;
   }
-  if (!hasFP32Denormals)
+  if (!hasFP32Denormals && isAMDGCN(getTriple()))
 TargetOpts.Features.push_back(
 (Twine(CGOptsGPU.HasFastFMAF && !CGOpts.FlushDenorm
? '+'


Index: lib/Basic/Targets/AMDGPU.cpp
===
--- lib/Basic/Targets/AMDGPU.cpp
+++ lib/Basic/Targets/AMDGPU.cpp
@@ -205,7 +205,7 @@
 if (I == "+fp64-fp16-denormals" || I == "-fp64-fp16-denormals")
   hasFP64Denormals = true;
   }
-  if (!hasFP32Denormals)
+  if (!hasFP32Denormals && isAMDGCN(getTriple()))
 TargetOpts.Features.push_back(
 (Twine(CGOptsGPU.HasFastFMAF && !CGOpts.FlushDenorm
? '+'
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38667: AMDGPU: Parse r600 CPU name early and expose FMAF capability

2017-10-19 Thread Jan Vesely via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL316181: AMDGPU: Parse r600 CPU name early and expose FMAF 
capability (authored by jvesely).

Changed prior to commit:
  https://reviews.llvm.org/D38667?vs=118136=119614#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D38667

Files:
  cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
  cfe/trunk/test/Preprocessor/predefined-arch-macros.c


Index: cfe/trunk/test/Preprocessor/predefined-arch-macros.c
===
--- cfe/trunk/test/Preprocessor/predefined-arch-macros.c
+++ cfe/trunk/test/Preprocessor/predefined-arch-macros.c
@@ -2378,10 +2378,20 @@
 // RUN: -target amdgcn-unknown-unknown \
 // RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_AMDGCN
 // CHECK_AMDGCN: #define __AMDGCN__ 1
+// CHECK_AMDGCN: #define __HAS_FMAF__ 1
+// CHECK_AMDGCN: #define __HAS_FP64__ 1
+// CHECK_AMDGCN: #define __HAS_LDEXPF__ 1
 
 // Begin r600 tests 
 //
 // RUN: %clang -march=amdgcn -E -dM %s -o - 2>&1 \
 // RUN: -target r600-unknown-unknown \
 // RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_R600
 // CHECK_R600: #define __R600__ 1
+// CHECK_R600-NOT: #define __HAS_FMAF__ 1
+
+// RUN: %clang -march=amdgcn -mcpu=cypress -E -dM %s -o - 2>&1 \
+// RUN: -target r600-unknown-unknown \
+// RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_R600_FP64
+// CHECK_R600_FP64-DAG: #define __R600__ 1
+// CHECK_R600_FP64-DAG: #define __HAS_FMAF__ 1
Index: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
===
--- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
+++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
@@ -308,14 +308,20 @@
 
 AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple ,
const TargetOptions )
-: TargetInfo(Triple), GPU(isAMDGCN(Triple) ? GK_GFX6 : GK_R600),
+: TargetInfo(Triple),
+  GPU(isAMDGCN(Triple) ? GK_GFX6 : parseR600Name(Opts.CPU)),
   hasFP64(false), hasFMAF(false), hasLDEXPF(false),
   AS(isGenericZero(Triple)) {
   if (getTriple().getArch() == llvm::Triple::amdgcn) {
 hasFP64 = true;
 hasFMAF = true;
 hasLDEXPF = true;
   }
+  if (getTriple().getArch() == llvm::Triple::r600) {
+if (GPU == GK_EVERGREEN_DOUBLE_OPS || GPU == GK_CAYMAN) {
+  hasFMAF = true;
+}
+  }
   auto IsGenericZero = isGenericZero(Triple);
   resetDataLayout(getTriple().getArch() == llvm::Triple::amdgcn
   ? (IsGenericZero ? DataLayoutStringSIGenericIsZero


Index: cfe/trunk/test/Preprocessor/predefined-arch-macros.c
===
--- cfe/trunk/test/Preprocessor/predefined-arch-macros.c
+++ cfe/trunk/test/Preprocessor/predefined-arch-macros.c
@@ -2378,10 +2378,20 @@
 // RUN: -target amdgcn-unknown-unknown \
 // RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_AMDGCN
 // CHECK_AMDGCN: #define __AMDGCN__ 1
+// CHECK_AMDGCN: #define __HAS_FMAF__ 1
+// CHECK_AMDGCN: #define __HAS_FP64__ 1
+// CHECK_AMDGCN: #define __HAS_LDEXPF__ 1
 
 // Begin r600 tests 
 //
 // RUN: %clang -march=amdgcn -E -dM %s -o - 2>&1 \
 // RUN: -target r600-unknown-unknown \
 // RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_R600
 // CHECK_R600: #define __R600__ 1
+// CHECK_R600-NOT: #define __HAS_FMAF__ 1
+
+// RUN: %clang -march=amdgcn -mcpu=cypress -E -dM %s -o - 2>&1 \
+// RUN: -target r600-unknown-unknown \
+// RUN:   | FileCheck -match-full-lines %s -check-prefix=CHECK_R600_FP64
+// CHECK_R600_FP64-DAG: #define __R600__ 1
+// CHECK_R600_FP64-DAG: #define __HAS_FMAF__ 1
Index: cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
===
--- cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
+++ cfe/trunk/lib/Basic/Targets/AMDGPU.cpp
@@ -308,14 +308,20 @@
 
 AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple ,
const TargetOptions )
-: TargetInfo(Triple), GPU(isAMDGCN(Triple) ? GK_GFX6 : GK_R600),
+: TargetInfo(Triple),
+  GPU(isAMDGCN(Triple) ? GK_GFX6 : parseR600Name(Opts.CPU)),
   hasFP64(false), hasFMAF(false), hasLDEXPF(false),
   AS(isGenericZero(Triple)) {
   if (getTriple().getArch() == llvm::Triple::amdgcn) {
 hasFP64 = true;
 hasFMAF = true;
 hasLDEXPF = true;
   }
+  if (getTriple().getArch() == llvm::Triple::r600) {
+if (GPU == GK_EVERGREEN_DOUBLE_OPS || GPU == GK_CAYMAN) {
+  hasFMAF = true;
+}
+  }
   auto IsGenericZero = isGenericZero(Triple);
   resetDataLayout(getTriple().getArch() == llvm::Triple::amdgcn
   ? (IsGenericZero ? DataLayoutStringSIGenericIsZero
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D37231: Add half load and store builtins

2017-09-07 Thread Jan Vesely via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL312742: [OpenCL] Add half load and store builtins (authored 
by jvesely).

Changed prior to commit:
  https://reviews.llvm.org/D37231?vs=113624=114240#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D37231

Files:
  cfe/trunk/include/clang/Basic/Builtins.def
  cfe/trunk/include/clang/Basic/Builtins.h
  cfe/trunk/lib/Basic/Builtins.cpp
  cfe/trunk/lib/CodeGen/CGBuiltin.cpp
  cfe/trunk/test/CodeGenOpenCL/no-half.cl

Index: cfe/trunk/lib/CodeGen/CGBuiltin.cpp
===
--- cfe/trunk/lib/CodeGen/CGBuiltin.cpp
+++ cfe/trunk/lib/CodeGen/CGBuiltin.cpp
@@ -2768,6 +2768,24 @@
 Name),
 {NDRange, Block}));
   }
+
+  case Builtin::BI__builtin_store_half:
+  case Builtin::BI__builtin_store_halff: {
+Value *Val = EmitScalarExpr(E->getArg(0));
+Address Address = EmitPointerWithAlignment(E->getArg(1));
+Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
+return RValue::get(Builder.CreateStore(HalfVal, Address));
+  }
+  case Builtin::BI__builtin_load_half: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getDoubleTy()));
+  }
+  case Builtin::BI__builtin_load_halff: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
+  }
   case Builtin::BIprintf:
 if (getTarget().getTriple().isNVPTX())
   return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
Index: cfe/trunk/lib/Basic/Builtins.cpp
===
--- cfe/trunk/lib/Basic/Builtins.cpp
+++ cfe/trunk/lib/Basic/Builtins.cpp
@@ -69,9 +69,14 @@
   bool MSModeUnsupported =
   !LangOpts.MicrosoftExt && (BuiltinInfo.Langs & MS_LANG);
   bool ObjCUnsupported = !LangOpts.ObjC1 && BuiltinInfo.Langs == OBJC_LANG;
-  bool OclCUnsupported = LangOpts.OpenCLVersion != 200 &&
- BuiltinInfo.Langs == OCLC20_LANG;
+  bool OclC1Unsupported = (LangOpts.OpenCLVersion / 100) != 1 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES ) ==  OCLC1X_LANG;
+  bool OclC2Unsupported = LangOpts.OpenCLVersion != 200 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES) == OCLC20_LANG;
+  bool OclCUnsupported = !LangOpts.OpenCL &&
+ (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES);
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported &&
+ !OclC1Unsupported && !OclC2Unsupported &&
  !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
 
Index: cfe/trunk/include/clang/Basic/Builtins.h
===
--- cfe/trunk/include/clang/Basic/Builtins.h
+++ cfe/trunk/include/clang/Basic/Builtins.h
@@ -36,10 +36,12 @@
   CXX_LANG = 0x4, // builtin for cplusplus only.
   OBJC_LANG = 0x8,// builtin for objective-c and objective-c++
   MS_LANG = 0x10, // builtin requires MS mode.
-  OCLC20_LANG = 0x20, // builtin for OpenCL C only.
+  OCLC20_LANG = 0x20, // builtin for OpenCL C 2.0 only.
+  OCLC1X_LANG = 0x40, // builtin for OpenCL C 1.x only.
   ALL_LANGUAGES = C_LANG | CXX_LANG | OBJC_LANG, // builtin for all languages.
   ALL_GNU_LANGUAGES = ALL_LANGUAGES | GNU_LANG,  // builtin requires GNU mode.
-  ALL_MS_LANGUAGES = ALL_LANGUAGES | MS_LANG // builtin requires MS mode.
+  ALL_MS_LANGUAGES = ALL_LANGUAGES | MS_LANG,// builtin requires MS mode.
+  ALL_OCLC_LANGUAGES = OCLC1X_LANG | OCLC20_LANG // builtin for OCLC languages.
 };
 
 namespace Builtin {
Index: cfe/trunk/include/clang/Basic/Builtins.def
===
--- cfe/trunk/include/clang/Basic/Builtins.def
+++ cfe/trunk/include/clang/Basic/Builtins.def
@@ -1424,6 +1424,12 @@
 LANGBUILTIN(to_local, "v*v*", "tn", OCLC20_LANG)
 LANGBUILTIN(to_private, "v*v*", "tn", OCLC20_LANG)
 
+// OpenCL half load/store builtin
+LANGBUILTIN(__builtin_store_half, "vdh*", "n", ALL_OCLC_LANGUAGES)
+LANGBUILTIN(__builtin_store_halff, "vfh*", "n", ALL_OCLC_LANGUAGES)
+LANGBUILTIN(__builtin_load_half, "dhC*", "nc", ALL_OCLC_LANGUAGES)
+LANGBUILTIN(__builtin_load_halff, "fhC*", "nc", ALL_OCLC_LANGUAGES)
+
 // Builtins for os_log/os_trace
 BUILTIN(__builtin_os_log_format_buffer_size, "zcC*.", "p:0:nut")
 BUILTIN(__builtin_os_log_format, "v*v*cC*.", "p:0:nt")
Index: cfe/trunk/test/CodeGenOpenCL/no-half.cl
===
--- cfe/trunk/test/CodeGenOpenCL/no-half.cl
+++ cfe/trunk/test/CodeGenOpenCL/no-half.cl
@@ -0,0 +1,39 @@
+// RUN: %clang_cc1 %s -cl-std=cl2.0 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.2 

[PATCH] D37231: Add half load and store builtins

2017-09-06 Thread Jan Vesely via Phabricator via cfe-commits
jvesely marked 2 inline comments as done.
jvesely added inline comments.



Comment at: test/CodeGenOpenCL/no-half.cl:27
+   foo[0] = __builtin_load_halff(bar);
+// CHECK: [[HALF_VAL:%.*]] = load
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to float

Anastasia wrote:
> jvesely wrote:
> > Anastasia wrote:
> > > Minor thing: any reason you are not checking the load fully?
> > just my laziness, I've added full check.
> Could we do the same for the above examples too?
I don't understand.
if you mean test_store_*, those functions do not generate any load 
instructions. the full generated code is:
test_store_double:
```
entry:
  %0 = fptrunc double %foo to half
  store half %0, half addrspace(1)* %bar, align 2
  ret void
```


Repository:
  rL LLVM

https://reviews.llvm.org/D37231



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D37231: Add half load and store builtins

2017-09-04 Thread Jan Vesely via Phabricator via cfe-commits
jvesely requested review of this revision.
jvesely added a comment.

please let me know if your accept still stands for the modified version.


Repository:
  rL LLVM

https://reviews.llvm.org/D37231



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D37231: Add half load and store builtins

2017-09-01 Thread Jan Vesely via Phabricator via cfe-commits
jvesely updated this revision to Diff 113624.
jvesely added a comment.

mark load pointers const


Repository:
  rL LLVM

https://reviews.llvm.org/D37231

Files:
  include/clang/Basic/Builtins.def
  include/clang/Basic/Builtins.h
  lib/Basic/Builtins.cpp
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGenOpenCL/no-half.cl

Index: test/CodeGenOpenCL/no-half.cl
===
--- /dev/null
+++ test/CodeGenOpenCL/no-half.cl
@@ -0,0 +1,39 @@
+// RUN: %clang_cc1 %s -cl-std=cl2.0 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.2 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.1 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64:enable
+
+// CHECK-LABEL: @test_store_float(float %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_store_float(float foo, __global half* bar)
+{
+	__builtin_store_halff(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc float %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2
+}
+
+// CHECK-LABEL: @test_store_double(double %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_store_double(double foo, __global half* bar)
+{
+	__builtin_store_half(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc double %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2
+}
+
+// CHECK-LABEL: @test_load_float(float addrspace({{.}}){{.*}} %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_load_float(__global float* foo, __global half* bar)
+{
+	foo[0] = __builtin_load_halff(bar);
+// CHECK: [[HALF_VAL:%.*]] = load half, half addrspace({{.}})* %bar
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to float
+// CHECK: store float [[FULL_VAL]], float addrspace({{.}})* %foo
+}
+
+// CHECK-LABEL: @test_load_double(double addrspace({{.}}){{.*}} %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_load_double(__global double* foo, __global half* bar)
+{
+	foo[0] = __builtin_load_half(bar);
+// CHECK: [[HALF_VAL:%.*]] = load half, half addrspace({{.}})* %bar
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to double
+// CHECK: store double [[FULL_VAL]], double addrspace({{.}})* %foo
+}
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -2724,6 +2724,24 @@
 Name),
 {NDRange, Block}));
   }
+
+  case Builtin::BI__builtin_store_half:
+  case Builtin::BI__builtin_store_halff: {
+Value *Val = EmitScalarExpr(E->getArg(0));
+Address Address = EmitPointerWithAlignment(E->getArg(1));
+Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
+return RValue::get(Builder.CreateStore(HalfVal, Address));
+  }
+  case Builtin::BI__builtin_load_half: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getDoubleTy()));
+  }
+  case Builtin::BI__builtin_load_halff: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
+  }
   case Builtin::BIprintf:
 if (getTarget().getTriple().isNVPTX())
   return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
Index: lib/Basic/Builtins.cpp
===
--- lib/Basic/Builtins.cpp
+++ lib/Basic/Builtins.cpp
@@ -69,9 +69,14 @@
   bool MSModeUnsupported =
   !LangOpts.MicrosoftExt && (BuiltinInfo.Langs & MS_LANG);
   bool ObjCUnsupported = !LangOpts.ObjC1 && BuiltinInfo.Langs == OBJC_LANG;
-  bool OclCUnsupported = LangOpts.OpenCLVersion != 200 &&
- BuiltinInfo.Langs == OCLC20_LANG;
+  bool OclC1Unsupported = (LangOpts.OpenCLVersion / 100) != 1 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES ) ==  OCLC1X_LANG;
+  bool OclC2Unsupported = LangOpts.OpenCLVersion != 200 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES) == OCLC20_LANG;
+  bool OclCUnsupported = !LangOpts.OpenCL &&
+ (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES);
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported &&
+ !OclC1Unsupported && !OclC2Unsupported &&
  !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
 
Index: include/clang/Basic/Builtins.h
===
--- include/clang/Basic/Builtins.h
+++ include/clang/Basic/Builtins.h
@@ -36,10 +36,12 @@
   CXX_LANG = 0x4, // builtin for cplusplus only.
   OBJC_LANG = 0x8,// builtin for objective-c and objective-c++
   MS_LANG = 0x10, // builtin requires MS mode.
-  OCLC20_LANG = 0x20, // builtin for OpenCL C only.
+  

[PATCH] D37231: Add half load and store builtins

2017-09-01 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added inline comments.



Comment at: include/clang/Basic/Builtins.def:1427
+// OpenCL half load/store builtin
+BUILTIN(__builtin_store_half, "vdh*", "n")
+BUILTIN(__builtin_store_halff, "vfh*", "n")

Anastasia wrote:
> jvesely wrote:
> > Anastasia wrote:
> > > I think this should be a language builtin (see above) but perhaps we 
> > > might need to extend the language version here. Because I believe we only 
> > > have OpenCL v2.0 currently.
> > > 
> > > Also this should only be available if `cl_khr_fp16` is supported and 
> > > enabled? I think we are doing similar with some subgroups functions (e.g. 
> > > `get_kernel_sub_group_count_for_ndrange`) that are only supported by 
> > > `cl_khr_subgroup` but those have custom diagnostic though. May be we 
> > > could leave this check out since `half` is not available if `cl_khr_fp16` 
> > > is not enabled anyways.
> > This is specifically meant to be used when `cl_khr_fp16` is **not** 
> > available.
> > CLC allows using half as storage format and  half pointers without the 
> > extension,
> > vstore_half/vload_half are used to load/store half values. (CL1.2 CH 
> > 6.1.1.1)
> > 
> > These builtins are not necessary if `cl_khr_fp16` is available (we can use 
> > regular loads/stores).
> > 
> > I'll take stab at making these CLC only, but similarly to device specific 
> > builtins it looked useful beyond that, since these builtins provide access 
> > to half type storage.
> Strange. This is not how I would interpret from the extension spec though: 
> https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/cl_khr_fp16.html
> 
> But I think for this change is probably fine indeed because this doesn't 
> affect half type itself.
I'm not sure I see the conflict here. `cl_khr_fp16` adds support for `half` 
scalar and `halfn` vector types.
without the extension the specs say (`OCL 1.2 Ch. 6.1.1.1`):
> The half data type can only be used to declare a pointer to a buffer that 
> contains half values.
`vload_half` and `vstore_half` used to access those buffers without needing 
`half` type (or the `cl_khr_fp16` extension).

> But I think for this change is probably fine indeed because this doesn't 
> affect half type itself.
exactly. this is needed outside of `cl_khr_fp16`, or the `half` type





Comment at: test/CodeGenOpenCL/no-half.cl:19
+   __builtin_store_half(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc double %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2

Anastasia wrote:
> Would it make sense to add a check for `load` similarly to `store` in the 
> test_load_float/test_load_double tests?
there is no load. `fptrunc double %foo to half` uses the function parameter 
directly



Comment at: test/CodeGenOpenCL/no-half.cl:27
+   foo[0] = __builtin_load_halff(bar);
+// CHECK: [[HALF_VAL:%.*]] = load
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to float

Anastasia wrote:
> Minor thing: any reason you are not checking the load fully?
just my laziness, I've added full check.


Repository:
  rL LLVM

https://reviews.llvm.org/D37231



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D37231: Add half load and store builtins

2017-09-01 Thread Jan Vesely via Phabricator via cfe-commits
jvesely updated this revision to Diff 113588.
jvesely marked 6 inline comments as done.
jvesely edited the summary of this revision.
jvesely added a comment.

fully check loads in tests


Repository:
  rL LLVM

https://reviews.llvm.org/D37231

Files:
  include/clang/Basic/Builtins.def
  include/clang/Basic/Builtins.h
  lib/Basic/Builtins.cpp
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGenOpenCL/no-half.cl

Index: test/CodeGenOpenCL/no-half.cl
===
--- /dev/null
+++ test/CodeGenOpenCL/no-half.cl
@@ -0,0 +1,39 @@
+// RUN: %clang_cc1 %s -cl-std=cl2.0 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.2 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.1 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64:enable
+
+// CHECK-LABEL: @test_store_float(float %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_store_float(float foo, __global half* bar)
+{
+	__builtin_store_halff(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc float %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2
+}
+
+// CHECK-LABEL: @test_store_double(double %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_store_double(double foo, __global half* bar)
+{
+	__builtin_store_half(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc double %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2
+}
+
+// CHECK-LABEL: @test_load_float(float addrspace({{.}}){{.*}} %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_load_float(__global float* foo, __global half* bar)
+{
+	foo[0] = __builtin_load_halff(bar);
+// CHECK: [[HALF_VAL:%.*]] = load half, half addrspace({{.}})* %bar
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to float
+// CHECK: store float [[FULL_VAL]], float addrspace({{.}})* %foo
+}
+
+// CHECK-LABEL: @test_load_double(double addrspace({{.}}){{.*}} %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_load_double(__global double* foo, __global half* bar)
+{
+	foo[0] = __builtin_load_half(bar);
+// CHECK: [[HALF_VAL:%.*]] = load half, half addrspace({{.}})* %bar
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to double
+// CHECK: store double [[FULL_VAL]], double addrspace({{.}})* %foo
+}
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -2724,6 +2724,24 @@
 Name),
 {NDRange, Block}));
   }
+
+  case Builtin::BI__builtin_store_half:
+  case Builtin::BI__builtin_store_halff: {
+Value *Val = EmitScalarExpr(E->getArg(0));
+Address Address = EmitPointerWithAlignment(E->getArg(1));
+Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
+return RValue::get(Builder.CreateStore(HalfVal, Address));
+  }
+  case Builtin::BI__builtin_load_half: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getDoubleTy()));
+  }
+  case Builtin::BI__builtin_load_halff: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
+  }
   case Builtin::BIprintf:
 if (getTarget().getTriple().isNVPTX())
   return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
Index: lib/Basic/Builtins.cpp
===
--- lib/Basic/Builtins.cpp
+++ lib/Basic/Builtins.cpp
@@ -69,9 +69,14 @@
   bool MSModeUnsupported =
   !LangOpts.MicrosoftExt && (BuiltinInfo.Langs & MS_LANG);
   bool ObjCUnsupported = !LangOpts.ObjC1 && BuiltinInfo.Langs == OBJC_LANG;
-  bool OclCUnsupported = LangOpts.OpenCLVersion != 200 &&
- BuiltinInfo.Langs == OCLC20_LANG;
+  bool OclC1Unsupported = (LangOpts.OpenCLVersion / 100) != 1 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES ) ==  OCLC1X_LANG;
+  bool OclC2Unsupported = LangOpts.OpenCLVersion != 200 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES) == OCLC20_LANG;
+  bool OclCUnsupported = !LangOpts.OpenCL &&
+ (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES);
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported &&
+ !OclC1Unsupported && !OclC2Unsupported &&
  !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
 
Index: include/clang/Basic/Builtins.h
===
--- include/clang/Basic/Builtins.h
+++ include/clang/Basic/Builtins.h
@@ -36,10 +36,12 @@
   CXX_LANG = 0x4, // builtin for cplusplus only.
   OBJC_LANG = 0x8,// builtin for objective-c and objective-c++
   MS_LANG = 0x10, 

[PATCH] D37231: Add half load and store builtins

2017-08-29 Thread Jan Vesely via Phabricator via cfe-commits
jvesely updated this revision to Diff 113190.
jvesely added a comment.

restrict builtins to OCLC langauges


Repository:
  rL LLVM

https://reviews.llvm.org/D37231

Files:
  include/clang/Basic/Builtins.def
  include/clang/Basic/Builtins.h
  lib/Basic/Builtins.cpp
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGenOpenCL/no-half.cl

Index: test/CodeGenOpenCL/no-half.cl
===
--- /dev/null
+++ test/CodeGenOpenCL/no-half.cl
@@ -0,0 +1,39 @@
+// RUN: %clang_cc1 %s -cl-std=cl2.0 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.2 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 %s -cl-std=cl1.1 -emit-llvm -o - -triple spir-unknown-unknown | FileCheck %s
+
+#pragma OPENCL EXTENSION cl_khr_fp64:enable
+
+// CHECK-LABEL: @test_store_float(float %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_store_float(float foo, __global half* bar)
+{
+	__builtin_store_halff(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc float %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2
+}
+
+// CHECK-LABEL: @test_store_double(double %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_store_double(double foo, __global half* bar)
+{
+	__builtin_store_half(foo, bar);
+// CHECK: [[HALF_VAL:%.*]] = fptrunc double %foo to half
+// CHECK: store half [[HALF_VAL]], half addrspace({{.}})* %bar, align 2
+}
+
+// CHECK-LABEL: @test_load_float(float addrspace({{.}}){{.*}} %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_load_float(__global float* foo, __global half* bar)
+{
+	foo[0] = __builtin_load_halff(bar);
+// CHECK: [[HALF_VAL:%.*]] = load
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to float
+// CHECK: store float [[FULL_VAL]], float addrspace({{.}})* %foo
+}
+
+// CHECK-LABEL: @test_load_double(double addrspace({{.}}){{.*}} %foo, half addrspace({{.}}){{.*}} %bar)
+__kernel void test_load_double(__global double* foo, __global half* bar)
+{
+	foo[0] = __builtin_load_half(bar);
+// CHECK: [[HALF_VAL:%.*]] = load
+// CHECK: [[FULL_VAL:%.*]] = fpext half [[HALF_VAL]] to double
+// CHECK: store double [[FULL_VAL]], double addrspace({{.}})* %foo
+}
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -2724,6 +2724,24 @@
 Name),
 {NDRange, Block}));
   }
+
+  case Builtin::BI__builtin_store_half:
+  case Builtin::BI__builtin_store_halff: {
+Value *Val = EmitScalarExpr(E->getArg(0));
+Address Address = EmitPointerWithAlignment(E->getArg(1));
+Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
+return RValue::get(Builder.CreateStore(HalfVal, Address));
+  }
+  case Builtin::BI__builtin_load_half: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getDoubleTy()));
+  }
+  case Builtin::BI__builtin_load_halff: {
+Address Address = EmitPointerWithAlignment(E->getArg(0));
+Value *HalfVal = Builder.CreateLoad(Address);
+return RValue::get(Builder.CreateFPExt(HalfVal, Builder.getFloatTy()));
+  }
   case Builtin::BIprintf:
 if (getTarget().getTriple().isNVPTX())
   return EmitNVPTXDevicePrintfCallExpr(E, ReturnValue);
Index: lib/Basic/Builtins.cpp
===
--- lib/Basic/Builtins.cpp
+++ lib/Basic/Builtins.cpp
@@ -69,9 +69,14 @@
   bool MSModeUnsupported =
   !LangOpts.MicrosoftExt && (BuiltinInfo.Langs & MS_LANG);
   bool ObjCUnsupported = !LangOpts.ObjC1 && BuiltinInfo.Langs == OBJC_LANG;
-  bool OclCUnsupported = LangOpts.OpenCLVersion != 200 &&
- BuiltinInfo.Langs == OCLC20_LANG;
+  bool OclC1Unsupported = (LangOpts.OpenCLVersion / 100) != 1 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES ) ==  OCLC1X_LANG;
+  bool OclC2Unsupported = LangOpts.OpenCLVersion != 200 &&
+  (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES) == OCLC20_LANG;
+  bool OclCUnsupported = !LangOpts.OpenCL &&
+ (BuiltinInfo.Langs & ALL_OCLC_LANGUAGES);
   return !BuiltinsUnsupported && !MathBuiltinsUnsupported && !OclCUnsupported &&
+ !OclC1Unsupported && !OclC2Unsupported &&
  !GnuModeUnsupported && !MSModeUnsupported && !ObjCUnsupported;
 }
 
Index: include/clang/Basic/Builtins.h
===
--- include/clang/Basic/Builtins.h
+++ include/clang/Basic/Builtins.h
@@ -36,10 +36,12 @@
   CXX_LANG = 0x4, // builtin for cplusplus only.
   OBJC_LANG = 0x8,// builtin for objective-c and objective-c++
   MS_LANG = 0x10, // builtin requires MS mode.
-  OCLC20_LANG = 0x20, // builtin for OpenCL C only.
+  OCLC20_LANG = 0x20, // builtin for OpenCL C 2.0 only.
+  

[PATCH] D37231: Add half load and store builtins

2017-08-29 Thread Jan Vesely via Phabricator via cfe-commits
jvesely added inline comments.



Comment at: include/clang/Basic/Builtins.def:1427
+// OpenCL half load/store builtin
+BUILTIN(__builtin_store_half, "vdh*", "n")
+BUILTIN(__builtin_store_halff, "vfh*", "n")

Anastasia wrote:
> I think this should be a language builtin (see above) but perhaps we might 
> need to extend the language version here. Because I believe we only have 
> OpenCL v2.0 currently.
> 
> Also this should only be available if `cl_khr_fp16` is supported and enabled? 
> I think we are doing similar with some subgroups functions (e.g. 
> `get_kernel_sub_group_count_for_ndrange`) that are only supported by 
> `cl_khr_subgroup` but those have custom diagnostic though. May be we could 
> leave this check out since `half` is not available if `cl_khr_fp16` is not 
> enabled anyways.
This is specifically meant to be used when `cl_khr_fp16` is **not** available.
CLC allows using half as storage format and  half pointers without the 
extension,
vstore_half/vload_half are used to load/store half values. (CL1.2 CH 6.1.1.1)

These builtins are not necessary if `cl_khr_fp16` is available (we can use 
regular loads/stores).

I'll take stab at making these CLC only, but similarly to device specific 
builtins it looked useful beyond that, since these builtins provide access to 
half type storage.


Repository:
  rL LLVM

https://reviews.llvm.org/D37231



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits