I couldn't stop thinking about this, so, keeping a close eye on the benchmarks, I found my solution:

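import Foreign
import System.IO.Unsafe (unsafePerformIO)

-- Copy l elements out of the C-owned buffer into GC-managed memory.
clonePtr ptr l = do
    ret <- mallocForeignPtrArray l
    -- copyArray takes the destination first, then the source.
    withForeignPtr ret $ \ptr' -> copyArray ptr' ptr l
    return ret
-- Lazily read l elements; each list cell peeks only when it is demanded.
peekLazy fp 0 = []
-- plusForeignPtr advances by bytes, so step by the element size.
peekLazy fp l = x:peekLazy (fp `plusForeignPtr` sizeOf x) (pred l)
  where x = unsafePerformIO $ withForeignPtr fp peek
iterateLazy ptr l = do
    fp <- clonePtr ptr l
    return $ peekLazy fp $ fromEnum l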

The clone is in there because I free the C library's data structures as the function call ends.
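
To make the reason for the clone concrete, here is a minimal sketch specialised to `Word32`. It is a variant of the code above that indexes with `peekElemOff` rather than advancing the pointer. The names `cloneWords`, `lazyWords` and `demo` are hypothetical, and the `mallocArray` buffer only stands in for the C library's output, since the real library call is not shown in this thread.

```haskell
import Foreign
import System.IO.Unsafe (unsafePerformIO)

-- Copy n elements into GC-managed memory (copyArray: destination first).
cloneWords :: Ptr Word32 -> Int -> IO (ForeignPtr Word32)
cloneWords src n = do
    dst <- mallocForeignPtrArray n
    withForeignPtr dst $ \p -> copyArray p src n
    return dst

-- Read element i only when the i-th list cell is demanded.
-- This is only safe because nothing writes to the clone afterwards.
lazyWords :: ForeignPtr Word32 -> Int -> [Word32]
lazyWords fp n = go 0
  where
    go i | i >= n    = []
         | otherwise = unsafePerformIO (withForeignPtr fp (`peekElemOff` i)) : go (i + 1)

-- Hypothetical stand-in for the C call: fill a buffer, clone it, free it.
demo :: IO [Word32]
demo = do
    buf <- mallocArray 4
    pokeArray buf [10, 20, 30, 40]
    fp <- cloneWords buf 4
    free buf                 -- the original buffer is gone from here on
    return (lazyWords fp 4)  -- the lazy reads go to the clone, not to buf
```

Without the clone, the deferred reads would touch memory that has already been freed by the time the list is consumed.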
On Thu, Jan 1, 1970 at 12:00 AM, adr...@openwork.nz wrote:
I've got an interesting puzzle I'm pulling my hair out over.

I'm writing (pure functional) Haskell bindings for a C library which returns 2 large arrays of 32-bit words (or rather arrays of structs, all of whose fields are very conveniently aligned). But the bindings I wrote are far too slow for my uses.

A primary performance problem identified by clear-box benchmarking is a couple of calls to `peekArray`. I previously used `forM [0..length - 1] $ peekElemOff arr`, but switching away from that yielded only a minor performance benefit. Commenting out this `peekArray` removes all the overhead I'm seeing.
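
For reference, the two strict approaches mentioned here can be sketched as follows. The function names are hypothetical, and the element type is assumed to be `Word32` as in the question. Both run entirely in `IO` and build the whole list before returning, so every element is read even if the caller only inspects a few.

```haskell
import Control.Monad (forM)
import Foreign

-- Strict: peekArray reads all n elements before returning the list.
readWithPeekArray :: Ptr Word32 -> Int -> IO [Word32]
readWithPeekArray arr n = peekArray n arr

-- Also strict: one peekElemOff per index, sequenced in IO by forM.
readWithForM :: Ptr Word32 -> Int -> IO [Word32]
readWithForM arr n = forM [0 .. n - 1] $ peekElemOff arr
```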

(Strangely, commenting out any usage of the returned list does likewise, even though I've verified that the whole list doesn't get iterated over in WHNF.)

The same benchmarks indicate that the postprocessing I'm applying over those lists has imperceptible overhead.

In short: What's the fastest `peekArray` alternative for `Ptr Word32` you recommend?

P.S. I'm inputting a nearly 1 MB novel (Dracula by Bram Stoker) to get an estimated 20 MB * 2 worth of output.
_______________________________________________
FFI mailing list
FFI@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ffi
