[Resent after subscribing to the mailing list so that the message is not rejected]

On 22/08/2012 10:24, Vincent Danjean wrote:
> On 21/08/2012 22:49, Tomasz Rybak wrote:
>> Added Vincent Danjean (co-author of free OpenCL ICD loader) to CC.
> 
> Adding Brice, the other OpenCL ICD Loader co-author
> 
>> On Tue, 2012-08-21 at 11:25 -0400, Andreas Kloeckner wrote:
>>> Andreas Kloeckner <[email protected]> writes:
>>>> Tomasz Rybak <[email protected]> writes:
>>>>> As I wrote in my email from 2012-08-14, I experienced
>>>>> crashes in image-related functions in test_wrapper.py
>>>>> on NVIDIA hardware. I managed to find the cause of that
>>>>> crash and fix it (patch attached). Below you can find
>>>>> an explanation.
>>>>>
>>>>> Unlike all other vendors, NVIDIA still has not released
>>>>> OpenCL 1.2. Image creation functions changed in
>>>>> OpenCL 1.2: clCreateImage now expects a cl_image_desc
>>>>> instead of a bunch of arguments like height, width, etc.
>>>>> The Image constructor in PyOpenCL (pyopencl/__init__.py,
>>>>> line 200-ish) tests whether it is running on OpenCL 1.2 or 1.1
>>>>> and runs the appropriate code. It uses
>>>>> get_cl_header_version() for this check, which fails in
>>>>> some situations, e.g. on Debian. In Debian we have opencl-headers
>>>>> (currently at 1.2), an ICD loader (1.2), and ICD implementations
>>>>> with different versions. This means that the headers will always
>>>>> have version 1.2 (or higher, since they carry the highest
>>>>> available version), but platforms might have lower versions.
>>>>> This was the cause of this segfault. PyOpenCL expected to have
>>>>> the new clCreateImage, the ICD loader was ready to give PyOpenCL
>>>>> a pointer to this function, but the platform (NVIDIA)
>>>>> was not providing it.
>>>>>
>>>>> I have changed the Image constructor to base its use of clCreateImage
>>>>> on the devices' platform version. I assumed that a Context always has
>>>>> at least one device; if not, please change this code.
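
A minimal sketch of the platform-version check described above, in plain
Python. The names parse_cl_version and can_use_create_image are
hypothetical and are not taken from the attached patch. The only
assumption about OpenCL is that CL_PLATFORM_VERSION strings have the form
"OpenCL <major>.<minor> <platform-specific information>".

```python
import re

# Assumption: CL_PLATFORM_VERSION strings look like
# "OpenCL <major>.<minor> <platform-specific information>".
_VERSION_RE = re.compile(r"^OpenCL (\d+)\.(\d+)(?: |$)")


def parse_cl_version(version_string):
    """Return (major, minor) parsed from a CL_PLATFORM_VERSION string."""
    match = _VERSION_RE.match(version_string)
    if match is None:
        raise ValueError("unrecognized OpenCL version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def can_use_create_image(platform_version_strings):
    """True only if every platform involved reports OpenCL 1.2 or later."""
    versions = [parse_cl_version(s) for s in platform_version_strings]
    # An empty list means we know nothing, so stay on the 1.1 code path.
    return bool(versions) and min(versions) >= (1, 2)
```

With PyOpenCL, the input strings could presumably be collected from
dev.platform.version for each dev in ctx.devices. Note that this trusts
whatever version the platform advertises.
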
>>>>
>>>> I'm wondering--isn't this an issue with the ICD loader? I had sort of
>>>> expected that the header needs to match the loader, and if the loader
>>>> exports a 1.2 interface, then all of those functions are at least safe
>>>> to call--i.e. I as a user don't have to go around checking versions just
>>>> to determine what API I can call.
>>>>
>>>> In this particular instance, I had thought the ICD loader would
>>>> translate the call to the old 1.1 interface, or, if impossible, provide
>>>> an error.
>>>>
>>>> A segfault is most definitely not an appropriate response...
>>>>
>>>> OTOH, it seems both the AMD ICD loader and the open-source ICD loader
>>>> (as you indicate) behave this way, so we might not get a choice in this
>>>> matter.
>>>>
>>>> Does the spec say anything on this? What's your assessment? Are these
>>>> transient bugs in the ICD loader, design flaws in the spec, or something
>>>> completely different?
> 
> I think this is a design flaw in the spec.
> An ICD loader has no information at all (unless we decide to hardcode
> some) about which OpenCL functions the loaded ICD supports.
> 
> An ICD loader merely gets the address of an array of function pointers.
> It does not even know the size of the array it gets. This means there is
> no reliable way to know whether the address we got for a 1.2 function is
> garbage (read beyond the end of the table of a 1.1 implementation) or
> correct.
> Looking at the version advertised by the ICD implementation is not a
> solution: the Intel implementation advertises 1.1 but implements (part of)
> 1.2.
> 
> I'm willing to add/patch anything required in ocl-icd. We can add some
> more functions to the interface (which means that a program using these
> functions will not work with other OpenCL ICD loaders) or provide
> them in an additional library (so that it works with any ICD loader
> implementation).
> But, for now, I see no other way than using hardcoded information.
> If we go down this path, we should think about exactly which information
> we want and how it should be presented (i.e. API/ABI).
> 
> What I can propose is that, for any public symbol, we try to check
> whether the corresponding function exists in the targeted ICD. Some
> sanity checks can be done automatically (non-null pointer, ...),
> but some hard-coded information will be required.
> 
> I also wonder whether/how we can divert the internal function pointer
> structure provided by the implementation in order to fill in
> a complete structure (with error functions for the missing entries).
> I think that it is feasible. But an unusual ICD implementation could
> respect the standard and still break with what I imagine (i.e. my
> implementation would be borderline with respect to the ABI
> specification).
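
To illustrate the idea of filling a whole structure with error functions,
here is a toy model in Python. A real loader would do this in C on the
ICD's function pointer structure. ENTRY_NAMES, VALID_ENTRIES and
complete_table are hypothetical names, and the entry counts are invented
for the example; they stand in for the hard-coded information mentioned
above.

```python
CL_INVALID_OPERATION = -59  # error code value from the OpenCL headers

# Hypothetical, abbreviated list of dispatch-table entries in table order.
ENTRY_NAMES = ["clCreateImage2D", "clCreateImage3D", "clCreateImage"]

# Hypothetical hard-coded knowledge: how many leading entries of the
# table can be trusted for a given advertised platform version.
VALID_ENTRIES = {(1, 1): 2, (1, 2): 3}


def make_error_stub(name):
    """Return a function that reports an error instead of crashing."""
    def stub(*args, **kwargs):
        return CL_INVALID_OPERATION
    stub.__name__ = name + "_unavailable"
    return stub


def complete_table(raw_entries, version):
    """Build a full table, replacing untrusted or null entries by stubs."""
    # Unknown versions get no trusted entries at all.
    trusted = min(VALID_ENTRIES.get(version, 0), len(raw_entries))
    table = {}
    for index, name in enumerate(ENTRY_NAMES):
        entry = raw_entries[index] if index < trusted else None
        # Sanity check: keep only entries that look like real functions.
        table[name] = entry if callable(entry) else make_error_stub(name)
    return table
```

In this model a call to a missing function returns an error code to the
caller, which is the behaviour asked for earlier in the thread instead of
a segfault.
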
> 
> 
> About the specific problem of clCreateImage, I tried to look into
> it. If I understand correctly,
> OpenCL 1.1 defines clCreateImage2D and clCreateImage3D
> OpenCL 1.2 defines clCreateImage and deprecates clCreateImage2D
>   and clCreateImage3D
> I.e., contrary to the initial message, I do not think that a
> prototype changed (that would be a severe bug with respect to the
> ICD loader specification). Rather, we have 2/3 functions with
> similar prototypes, some provided by some implementations,
> others provided by other implementations.
>   But, for now, I do not think there is any way an implementation
> can reliably detect at run time whether a specific platform
> implements these functions or not.
> 
>   Regards,
>     Vincent
> 
>>>> I must admit I'm pretty reluctant to call a bunch of GetInfo functions
>>>> and then do a bunch of string processing just to figure out what
>>>> function is safe to call. Maybe as a temporary workaround, but not as a
>>>> permanent thing.
>>>>
>>>> Any opinions/insights?
>>>
>>> Hi Tomasz,
>>>
>>> any news on this front?
>>
>> Sorry, I did not have time to investigate it deeply.
>> The description of the OpenCL ICD extension does not deal with the
>> case of different platform versions.
>>
>> Vincent, any thoughts about how OpenCL should behave
>> in the current case, when the loader has version 1.2 and
>> tries to use an ICD with version 1.1 (here the NVIDIA one)?
>>
>> Best regards.
>>
> 
> 

-- 
Vincent Danjean          Adresse: Laboratoire d'Informatique de Grenoble
Téléphone:  +33 4 76 61 20 11            ENSIMAG - antenne de Montbonnot
Fax:        +33 4 76 61 20 99            ZIRST 51, avenue Jean Kuntzmann
Email: [email protected]           38330 Montbonnot Saint Martin

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
