Re: [Cython] some diffs to boost::python classes

Stefan Behnel Sun, 29 Aug 2010 00:36:49 -0700

Lisandro Dalcin, 29.08.2010 03:39:
> On 28 August 2010 03:12, Stefan Behnel wrote:
>> Lisandro Dalcin, 27.08.2010 23:38:
>>>> I think the right way to implement this is with custom decriptors for both
>>>> the type and its instance.
>>>
>>> Do you mean adding a descriptor for __dict__ ? If you use cdef
>>> public|readonly __dict__, you get it.
>>
>> Right, I think __dict__ should be "public" by default, just like
>> __weakref__ is always usable from outside.
>
> Do you think that "cdef private __dict__" would have any use?


A usecase might be a type that allows arbitrary attribute changes from C 
code but not from Python code. Not completely unrealistic IMHO.

It might be possible to actually provide such semantics by letting the 
descriptor raise an exception on access and passing through tp_dict* from 
Cython code. However, I'm not sure CPython goes through the descriptor in 
all cases when accessing tp_dict*, so it's left to be seen how well this 
would actually work. But I think the decision can be taken at a later time 
regardless of what we do now, so making __dict__ public by default seems 
the most reasonable thing to do.


> We cannot write "private"...

I know. That's another issue that we should solve one day. There should be 
a way to say: "this is not part of the public API of this module", also for 
extension types.

http://trac.cython.org/cython_trac/ticket/263


>>> But Cython code will make a
>>> direct pointer access, like for any other cdef member... What's wrong
>>> with this approach?
>>
>> Well, it wastes instance memory if __dict__ is not used. I would expect
>> that most use cases for __dict__ do not come from Cython code but from
>> Python code that uses the types. As I said, going through a descriptor
>> would allow instantiating the dict at first request rather than *always*.
>> That's important for library code.
>>
>> Potentially, we could also add a NULL check in the Cython access code
>> wherever __dict__ is used, and go through the descriptor only if it's not
>> set up yet. But that would be an optimisation. I think safe and memory
>> friendly access should go first.
>
> On one hand, I think that who adds a __dict__ to cdef classes do know
> the trade-offs, as __dict__ is not provided unless explicitly
> requested, I do not consider a big deal to unconditionally create the
> __dict__ and have fast access, we can optimize things later. My
> current patch do work already, users needing this could start using
> the feature in 0.13.1

You're right, memory efficiency is also just a kind of optimisation, and 
the new feature doesn't impact existing code.

However, it's already known that the "fast access" may have to be disabled 
later on when applying that optimisation or may at least receive a slight 
performance hit due to the NULL check and the potentially multiplicated 
lazy initialisation code. So code that gets written now that relies on the 
now-to-be-set behaviour may behave slightly different later on.

I think I'm ok with applying your patch to get the feature working now, 
although I still think that the memory hit and the impact on type 
instantiation time is a much more important factor than the access 
performance. If users wanted fast access from their Cython code, they'd use 
static cdef attributes in the first place.


> On the other hand, I do not care too much if this is implemented going
> through a descriptor, but as I'm not going to use this feature anyway.

I think that describes the main use case already: "I won't use it in my 
code, but the users of my code *may* want to have it". That's the main 
reason why I'm advocating a low impact for the unused case.

Note that your patch only solves half of the problem that Neal describes: 
it implements __dict__ for instances but not for types. I wouldn't care so 
much about the memory impact for a type dict, but there are usually a lot 
more instances than types, and if only a couple of instances actively use 
the __dict__, then the memory overhead of the instances that do not use it 
will be important. For example, lxml will definitely not use this for its 
main API type in the current state, although it would be a really nice 
addition. Even adding an empty dict pointer (4/8 bytes) to each of the 
instances is worth some careful consideration when the number of instances 
can become huge. But the real killer argument here is the performance hit 
on instantiation which happens tons of times all over the place in lxml.

BTW, the patch needs to be revised when we implement support for subtyping 
PyVarObject (bytes/tuple).

http://trac.cython.org/cython_trac/ticket/152


> It is  hard follow the criteria of core Cython developers and
> contributors :-)... For some things, they like it fast despite being
> unsafe, for others this it is the other way around, some idiot guys as
> for compiler directives controlling every detail...

I can see that this may appear arbitrary at times. I hope I made it a bit 
clearer above why I think that this particular patch provides an 
implementation trade-off that doesn't match well with the most common use 
case. It just doesn't make sense to think about optimisations without a 
couple of important use cases in mind, especially when there *are* drawbacks.

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Re: [Cython] some diffs to boost::python classes

Reply via email to