[
https://issues.apache.org/jira/browse/PROTON-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pete Fawcett closed PROTON-2903.
--------------------------------
Resolution: Resolved
> Python Fatal Error in void2py ffi.from_handle
> ---------------------------------------------
>
> Key: PROTON-2903
> URL: https://issues.apache.org/jira/browse/PROTON-2903
> Project: Qpid Proton
> Issue Type: Bug
> Components: python-binding
> Affects Versions: proton-c-0.39.0, proton-c-future
> Environment: qpid-proton 0.41.0-dev (current main branch)
> Rocky Linux 9.5
> Python 3.12.5 (CPython)
> cffi (pip package) 1.17.1
> Reporter: Pete Fawcett
> Priority: Major
>
> Experiencing sporadic, but not infrequent, "Python Fatal Error" in proton
> client. IT seems to happen more often when the client is under more load,
> I am investigating using the current main branch, but the relevant code seems
> to be unchanged since v 0.39.0
> The error is happening in the call to {{ffi.from_handle}} within
> {{cproton.void2py}}
> {{from_handle()}} is calling {{Py_FatalError()}} because the handle passed
> in, whilst being a {{<cdata void *>}} object, is not pointing to an object of
> the expected type ({{{}CDataOwningGC_Type{}}})
> I made a custom version of {{cffi}} to include, in the error message, what
> type was being pointed to. The results varied but they all seemed to be
> valid types, i.e. not garbage.
> This suggests to me that the original object has been freed and the memory
> used by a new object.
> I am only beginning to learn about the use of CFFI handles. but from what I
> understand, it is the responsibility of the Python code, when creating a
> handle to a Python object using {{{}ffi.new_handle(){}}}, to keep the handle
> and the original 'alive' by maintaining a reference to the handle.
> This appears to be the intended function of {{cproton.retained_objects}} but
> I think the current implementation is too simple.
> Some things that I think are contributing to the problem are:
> * {{retained_objects}} is a set so effectively only provides a reference
> count of 1
> * {{retained_objects}} is added to on the 'Python side' following calls to
> {{new_handle}} but also, indirectly, from the 'C side' via {{pn_pyref_incref}}
> * Different {{<cdata>}} instances can hash to the same value and so are
> treated as being the same within {{{}retained_objects{}}}. (I think the hash
> is based on the content, so where a {{<cdata void *>}} is pointing)
> One potentially problematic call sequence I have seen is in
> {{cproton.pn_collector_put_pyref}}
> # A new handle ({{{}d{}}}) is created an added to {{retained_objects}}
> # {{d}} is passed to {{lib.pn_collector_put_py}}
> # within that call a callback is made to {{pn_pyref_incref}} with parameter
> {{obj}}
> # {{obj}} is a different instance to {{d}} (the {{id()}} differs) but is has
> the same hash and so the call to {{retained_objects.add(obj)}} doesn't add a
> new item.
> So, if at some point {{pn_pyref_decref(obj)}} is called, it will release the
> original {{d}} object.
> Whilst I think I can see why the problem might be happening I am not sure how
> to fix it. In particular, I haven't yet worked out what would be the correct
> point to release the reference in {{{}retained_objects{}}}. Not releasing
> them at all would stop the fatal error but would effectively be a "memory
> leak" with resultant problems.
> It might be that better reference counting is required, rather than just the
> single level afforded by {{retained_objects}} being a {{set}} - but then
> there might not be "enough" {{decref}} calls to free up the object at all.
> Perhaps releasing the handle within {{void2py}} would work, but that feels
> too risky without better understanding of the intended lifetime.
> I would appreciate some guidance on the original design thoughts behind the
> current implementation, particularly around knowing when the handle is no
> longer required.
> My current client program is non-trivial. It both receives and sends messages
> and also makes use of {{{}ApplicationEvent{}}}. I am looking at how to make a
> simpler client that can still produce the error.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]