Pete Fawcett created PROTON-2903:
------------------------------------

             Summary: Python Fatal Error in void2py ffi.from_handle
                 Key: PROTON-2903
                 URL: https://issues.apache.org/jira/browse/PROTON-2903
             Project: Qpid Proton
          Issue Type: Bug
          Components: python-binding
    Affects Versions: proton-c-0.39.0, proton-c-future
         Environment: qpid-proton 0.41.0-dev (current main branch)
Rocky Linux 9.5
Python 3.12.5 (CPython)
cffi (pip package) 1.17.1
            Reporter: Pete Fawcett


Experiencing sporadic, but not infrequent, "Python Fatal Error" in proton 
client. IT seems to happen more often when the client is under more load,

I am investigating using the current main branch, but the relevant code seems 
to be unchanged since v 0.39.0

The error is happening in the call to {{ffi.from_handle}} within 
{{cproton.void2py}}

{{from_handle()}} is calling {{Py_FatalError()}} because the handle passed in, 
whilst being a {{<cdata void *>}} object, is not pointing to an object of the 
expected type ({{{}CDataOwningGC_Type{}}})

I made a custom version of {{cffi}} to include, in the error message, what type 
was being pointed to.  The results varied but they all seemed to be valid 
types, i.e. not garbage.

This suggests to me that the original object has been freed and the memory used 
by a new object.

I am only beginning to learn about the use of CFFI handles. but from what I 
understand, it is the responsibility of the Python code, when creating a handle 
to a Python object using {{{}ffi.new_handle(){}}}, to keep the handle and the 
original 'alive' by maintaining a reference to the handle.

This appears to be the intended function of {{cproton.retained_objects}} but I 
think the current implementation is too simple.

Some things that I think are contributing to the problem are:
 * {{retained_objects}} is a set so effectively only provides a reference count 
of 1
 * {{retained_objects}} is added to on the 'Python side' following calls to 
{{new_handle}} but also, indirectly, from the 'C side' via {{pn_pyref_incref}}
 * Different {{<cdata>}} instances can hash to the same value and so are 
treated as being the same within {{{}retained_objects{}}}.  (I think the hash 
is based on the content, so where a {{<cdata void *>}} is pointing)

One potentially problematic call sequence I have seen is in 
{{cproton.pn_collector_put_pyref}}
 # A new handle ({{{}d{}}}) is created an added to {{retained_objects}}
 # {{d}} is passed to {{lib.pn_collector_put_py}}
 # within that call a callback is made to {{pn_pyref_incref}} with parameter 
{{obj}}
 # {{obj}} is a different instance to {{d}} (the {{id()}} differs) but is has 
the same hash and so the call to {{retained_objects.add(obj)}} doesn't add a 
new item.

So, if at some point {{pn_pyref_decref(obj)}} is called, it will release the 
original {{d}} object.

Whilst I think I can see why the problem might be happening I am not sure how 
to fix it.  In particular, I haven't yet worked out what would be the correct 
point to release the reference in {{{}retained_objects{}}}. Not releasing them 
at all would stop the fatal error but would effectively be a "memory leak" with 
resultant problems.

It might be that better reference counting is required, rather than just the 
single level afforded by {{retained_objects}} being a {{set}} - but then there 
might not be "enough" {{decref}} calls to free up the object at all.

Perhaps releasing the handle within {{void2py}} would work, but that feels too 
risky without better understanding of the intended lifetime.

I would appreciate some guidance on the original design thoughts behind the 
current implementation, particularly around knowing when the handle is no 
longer required.

My current client program is non-trivial. It both receives and sends messages 
and also makes use of {{{}ApplicationEvent{}}}. I am looking at how to make a 
simpler client that can still produce the error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to