I am writing a Julia wrapper for a molecular simulation library, ASE. As 
with previous PyCall experiences, this works really well UNTIL I started to 
benchmark a certain critical portion of the code. For context: I iterate 
over all atoms and for each atom get the interaction neighbourhood from 
ASE's neighbourlist.  The call get_neighbours(n) returns bumpy arrays of 
type Int64.

# [1] Standard call: 0.356330 seconds (89.00 k allocations: 3.258 MB)
@time for n = 1:500 I, off = nlist.po[:get_neighbors](0); end   

# [2] get a tuple of PyArrays to avoid automatic conversion of integer 
arrays: 0.379282 seconds (62.00 k allocations: 2.022 MB)
@time for n = 1:500 ret = pycall(nlist.po[:get_neighbors], Tuple{PyArray, 
PyArray}, 0); end

# [3] just get a PyObject: 0.039529 seconds (9.50 k allocations: 273.438 KB)
@time for n = 1:500 begin 
        ret = pycall(nlist.po[:get_neighbors], PyObject, 0); end
        # ret1 = pycall(ret[:__getitem__], PyObject, 0)
        # ret2 = pycall(ret[:__getitem__], PyObject, 1)
end

# [4]  with lines 2-3 in [3] uncommented: 0.101147 seconds (28.50 k 
allocations: 820.313 KB)

The plain call without conversion is so much faster that I decided to start 
digging around where to get back some time. I am fairly confident that it 
is not related to issue [https://github.com/stevengj/PyCall.jl/issues/90] 
(but who knows?). [4] suggests that a large proportion of the 0.039 seconds 
is still overhead even though no conversion was performed? This poor 
performance makes this unfortunately useless and means I will have to write 
my own neighbourlist in pure Julia (which is perfectly doable in 300-500 
lines; the bigger issue is maintenance and compatibility as ASE evolves). 
So my 

QUESTION: is there a way (ideally staying in pure Julia) to directly access 
the data in the PyObject, with minimal overhead, provided I know exactly 
beforehand what the objects are?

Many thanks,
     Christoph

Reply via email to