[ 
https://issues.apache.org/jira/browse/PROTON-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029815#comment-18029815
 ] 

Pete Fawcett commented on PROTON-2903:
--------------------------------------

In trying to produce a simpler client program to illustrate the problem, I 
found an error in my own code: I was calling one of the proton functions from 
a different thread.  Changing this has stopped the error from occurring.

I still have questions about the reference counting but, in the absence of a 
demonstrable bug, it is probably unreasonable to press the point.

 

> Python Fatal Error in void2py ffi.from_handle
> ---------------------------------------------
>
>                 Key: PROTON-2903
>                 URL: https://issues.apache.org/jira/browse/PROTON-2903
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: python-binding
>    Affects Versions: proton-c-0.39.0, proton-c-future
>         Environment: qpid-proton 0.41.0-dev (current main branch)
> Rocky Linux 9.5
> Python 3.12.5 (CPython)
> cffi (pip package) 1.17.1
>            Reporter: Pete Fawcett
>            Priority: Major
>
> Experiencing sporadic, but not infrequent, "Python Fatal Error" crashes in a 
> proton client. It seems to happen more often when the client is under heavier 
> load.
> I am investigating using the current main branch, but the relevant code seems 
> to be unchanged since v0.39.0.
> The error is happening in the call to {{ffi.from_handle}} within 
> {{cproton.void2py}}.
> {{from_handle()}} is calling {{Py_FatalError()}} because the handle passed 
> in, whilst being a {{<cdata void *>}} object, is not pointing to an object of 
> the expected type ({{{}CDataOwningGC_Type{}}}).
> I made a custom version of {{cffi}} to include, in the error message, what 
> type was being pointed to.  The results varied but they all seemed to be 
> valid types, i.e. not garbage.
> This suggests to me that the original object has been freed and its memory 
> re-used by a new object.
> I am only beginning to learn about the use of CFFI handles, but from what I 
> understand, it is the responsibility of the Python code, when creating a 
> handle to a Python object using {{{}ffi.new_handle(){}}}, to keep both the 
> handle and the original object 'alive' by maintaining a reference to the 
> handle.
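> As a minimal, standalone sketch of that rule, using only the public {{cffi}} 
> API (none of this is cproton code):
> {code:python}
> import cffi
>
> ffi = cffi.FFI()
>
> obj = {"name": "example"}      # any Python object
> handle = ffi.new_handle(obj)   # <cdata 'void *'> carrying a reference to obj
>
> # The handle itself must stay referenced somewhere on the Python side.
> # If 'handle' is garbage collected, the void* becomes dangling and a later
> # ffi.from_handle() on that address is undefined behaviour (in practice,
> # the Py_FatalError seen above).
> back = ffi.from_handle(handle)
> assert back is obj
> {code}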
> Keeping such references appears to be the intended function of 
> {{cproton.retained_objects}}, but I think the current implementation is too 
> simple.
> Some things that I think are contributing to the problem are:
>  * {{retained_objects}} is a set so effectively only provides a reference 
> count of 1
>  * {{retained_objects}} is added to on the 'Python side' following calls to 
> {{new_handle}} but also, indirectly, from the 'C side' via {{pn_pyref_incref}}
>  * Different {{<cdata>}} instances can hash to the same value and so are 
> treated as being the same item within {{{}retained_objects{}}}.  (I think the 
> hash is based on the pointer value, i.e. where the {{<cdata void *>}} is 
> pointing; see the sketch after this list)
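> As a small illustration of that last point (assuming, as observed above, that 
> {{<cdata>}} pointers hash and compare by address), the set-based bookkeeping 
> records at most one reference per address:
> {code:python}
> import cffi
>
> ffi = cffi.FFI()
> retained_objects = set()
>
> obj = object()
> d = ffi.new_handle(obj)        # owning handle, keeps obj alive
> p = ffi.cast("void *", d)      # distinct cdata instance, same address
>                                # (stands in for the pointer coming back from C)
>
> retained_objects.add(d)
> retained_objects.add(p)        # equal to d, so the set still has one element
> print(len(retained_objects))   # 1
>
> retained_objects.discard(p)    # a single discard removes the stored handle d
> print(len(retained_objects))   # 0 -- the set no longer keeps d (or obj) alive
> {code}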
> One potentially problematic call sequence I have seen is in 
> {{cproton.pn_collector_put_pyref}}:
>  # A new handle ({{{}d{}}}) is created and added to {{retained_objects}}
>  # {{d}} is passed to {{lib.pn_collector_put_py}}
>  # within that call a callback is made to {{pn_pyref_incref}} with parameter 
> {{obj}}
>  # {{obj}} is a different instance to {{d}} (the {{id()}} differs) but it has 
> the same hash, so the call to {{retained_objects.add(obj)}} doesn't add a 
> new item.
> So, if at some point {{pn_pyref_decref(obj)}} is called, it will release the 
> original {{d}} object.
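> A rough, Python-only walk-through of that sequence (the names mirror the 
> cproton ones, but the C side is simulated with a plain cast, so this is only 
> illustrative):
> {code:python}
> import cffi
>
> ffi = cffi.FFI()
> retained_objects = set()
>
> def pn_pyref_incref(obj):
>     retained_objects.add(obj)       # no-op if an equal handle is already stored
>
> def pn_pyref_decref(obj):
>     retained_objects.discard(obj)   # removes whichever equal element is stored
>
> context = object()
> d = ffi.new_handle(context)     # step 1: new handle, retained
> retained_objects.add(d)
>
> obj = ffi.cast("void *", d)     # steps 2-3: the C side hands back the raw void*
> pn_pyref_incref(obj)            # step 4: set unchanged, effective count still 1
>
> pn_pyref_decref(obj)            # a later decref drops the original d from the set
> # once nothing else references d, ffi.from_handle() on that address is unsafe
> {code}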
> Whilst I think I can see why the problem might be happening, I am not sure 
> how to fix it.  In particular, I haven't yet worked out what would be the 
> correct point to release the reference in {{{}retained_objects{}}}. Not 
> releasing them at all would stop the fatal error but would effectively be a 
> "memory leak" with the resultant problems.
> It might be that better reference counting is required, rather than just the 
> single level afforded by {{retained_objects}} being a {{set}} - but then 
> there might not be "enough" {{decref}} calls to free up the object at all.
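> For what it's worth, one hypothetical shape for such counting (illustrative 
> only, not a proposed patch; the {{retain}}/{{release}} names are made up) 
> would be to key a dict on the pointer address and keep an explicit count 
> alongside the owning handle:
> {code:python}
> import cffi
>
> ffi = cffi.FFI()
>
> # address -> [owning handle, count]; the stored handle keeps the object alive
> _retained = {}
>
> def retain(handle):
>     addr = int(ffi.cast("uintptr_t", handle))
>     entry = _retained.get(addr)
>     if entry is None:
>         _retained[addr] = [handle, 1]
>     else:
>         entry[1] += 1
>
> def release(handle):
>     addr = int(ffi.cast("uintptr_t", handle))
>     entry = _retained.get(addr)
>     if entry is not None:
>         entry[1] -= 1
>         if entry[1] == 0:
>             del _retained[addr]      # only now can the handle (and object) be freed
> {code}
> Of course, as noted above, this only helps if the incref/decref calls end up 
> balanced; otherwise nothing would ever be released.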
> Perhaps releasing the handle within {{void2py}} would work, but that feels 
> too risky without better understanding of the intended lifetime.
> I would appreciate some guidance on the original design thoughts behind the 
> current implementation, particularly around knowing when the handle is no 
> longer required.
> My current client program is non-trivial. It both receives and sends messages 
> and also makes use of {{{}ApplicationEvent{}}}. I am looking at how to make a 
> simpler client that can still produce the error.


