I posted a summary article on "what this lets you do" to https://www.fpcomplete.com/user/edwardk/unlifted-structures
I can see about making a more proposal/feature-oriented summary for the Haskell Wiki. It may have to wait until after ICFP though. -Edward On Fri, Aug 28, 2015 at 5:42 AM, Simon Peyton Jones <[email protected]> wrote: > At the very least I'll take this email and turn it into a short article. > > Yes, please do make it into a wiki page on the GHC Trac, and maybe make a > ticket for it. > > > Thanks > > > > Simon > > > > *From:* Edward Kmett [mailto:[email protected]] > *Sent:* 27 August 2015 16:54 > *To:* Simon Peyton Jones > *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs > *Subject:* Re: ArrayArrays > > > > An ArrayArray# is just an Array# with a modified invariant. It points > directly to other unlifted ArrayArray#'s or ByteArray#'s. > > > > While those live in #, they are garbage collected objects, so this all > lives on the heap. > > > > They were added to make some of the DPH stuff fast when it has to deal > with nested arrays. > > > > I'm currently abusing them as a placeholder for a better thing. > > > > The Problem > > ----------------- > > > > Consider the scenario where you write a classic doubly-linked list in > Haskell. > > > > data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) > > > > Chasing from one DLL to the next requires following 3 pointers on the heap. > > > > DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe DLL ~> > DLL > > > > That is 3 levels of indirection. > > > > We can trim one by simply unpacking the IORef with -funbox-strict-fields > or UNPACK > > > > We can trim another by adding a 'Nil' constructor for DLL and worsening > our representation. > > > > data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil > > > > but now we're still stuck with a level of indirection > > > > DLL ~> MutVar# RealWorld DLL ~> DLL > > > > This means that every operation we perform on this structure will be about > half of the speed of an implementation in most other languages assuming > we're memory bound on loading things into cache! > > > > Making Progress > > ---------------------- > > > > I have been working on a number of data structures where the indirection > of going from something in * out to an object in # which contains the real > pointer to my target and coming back effectively doubles my runtime. > > > > We go out to the MutVar# because we are allowed to put the MutVar# onto > the mutable list when we dirty it. There is a well defined write-barrier. > > > > I could change out the representation to use > > > > data DLL = DLL (MutableArray# RealWorld DLL) | Nil > > > > I can just store two pointers in the MutableArray# every time, but this > doesn't help _much_ directly. It has reduced the amount of distinct > addresses in memory I touch on a walk of the DLL from 3 per object to 2. > > > > I still have to go out to the heap from my DLL and get to the array object > and then chase it to the next DLL and chase that to the next array. I do > get my two pointers together in memory though. I'm paying for a card > marking table as well, which I don't particularly need with just two > pointers, but we can shed that with the "SmallMutableArray#" machinery > added back in 7.10, which is just the old array code a a new data type, > which can speed things up a bit when you don't have very big arrays: > > > > data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil > > > > But what if I wanted my object itself to live in # and have two mutable > fields and be able to share the sme write barrier? > > > > An ArrayArray# points directly to other unlifted array types. What if we > have one # -> * wrapper on the outside to deal with the impedence mismatch > between the imperative world and Haskell, and then just let the > ArrayArray#'s hold other arrayarrays. > > > > data DLL = DLL (MutableArrayArray# RealWorld) > > > > now I need to make up a new Nil, which I can just make be a special > MutableArrayArray# I allocate on program startup. I can even abuse pattern > synonyms. Alternately I can exploit the internals further to make this > cheaper. > > > > Then I can use the readMutableArrayArray# and writeMutableArrayArray# > calls to directly access the preceding and next entry in the linked list. > > > > So now we have one DLL wrapper which just 'bootstraps me' into a strict > world, and everything there lives in #. > > > > next :: DLL -> IO DLL > > next (DLL m) = IO $ \s -> case readMutableArrayArray# s of > > (# s', n #) -> (# s', DLL n #) > > > > It turns out GHC is quite happy to optimize all of that code to keep > things unboxed. The 'DLL' wrappers get removed pretty easily when they are > known strict and you chain operations of this sort! > > > > Cleaning it Up > > ------------------ > > > > Now I have one outermost indirection pointing to an array that points > directly to other arrays. > > > > I'm stuck paying for a card marking table per object, but I can fix that > by duplicating the code for MutableArrayArray# and using a > SmallMutableArray#. I can hack up primops that let me store a mixture of > SmallMutableArray# fields and normal ones in the data structure. > Operationally, I can even do so by just unsafeCoercing the existing > SmallMutableArray# primitives to change the kind of one of the arguments it > takes. > > > > This is almost ideal, but not quite. I often have fields that would be > best left unboxed. > > > > data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil > > > > was able to unpack the Int, but we lost that. We can currently at best > point one of the entries of the SmallMutableArray# at a boxed or at a > MutableByteArray# for all of our misc. data and shove the int in question > in there. > > > > e.g. if I were to implement a hash-array-mapped-trie I need to store masks > and administrivia as I walk down the tree. Having to go off to the side > costs me the entire win from avoiding the first pointer chase. > > > > But, if like Ryan suggested, we had a heap object we could construct that > had n words with unsafe access and m pointers to other heap objects, one > that could put itself on the mutable list when any of those pointers > changed then I could shed this last factor of two in all circumstances. > > > > Prototype > > ------------- > > > > Over the last few days I've put together a small prototype implementation > with a few non-trivial imperative data structures for things like Tarjan's > link-cut trees, the list labeling problem and order-maintenance. > > > > https://github.com/ekmett/structs > > > > Notable bits: > > > > Data.Struct.Internal.LinkCut > <https://github.com/ekmett/structs/blob/9ff2818f888aff4789b7a41077a674a10d15e6ee/src/Data/Struct/Internal/LinkCut.hs> > provides an implementation of link-cut trees in this style. > > > > Data.Struct.Internal > <https://github.com/ekmett/structs/blob/9ff2818f888aff4789b7a41077a674a10d15e6ee/src/Data/Struct/Internal.hs> > provides the rather horrifying guts that make it go fast. > > > > Once compiled with -O or -O2, if you look at the core, almost all the > references to the LinkCut or Object data constructor get optimized away, > and we're left with beautiful strict code directly mutating out underlying > representation. > > > > At the very least I'll take this email and turn it into a short article. > > > > -Edward > > > > On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones <[email protected]> > wrote: > > Just to say that I have no idea what is going on in this thread. What is > ArrayArray? What is the issue in general? Is there a ticket? Is there a > wiki page? > > > > If it’s important, an ab-initio wiki page + ticket would be a good thing. > > > > Simon > > > > *From:* ghc-devs [mailto:[email protected]] *On Behalf Of *Edward > Kmett > *Sent:* 21 August 2015 05:25 > *To:* Manuel M T Chakravarty > *Cc:* Simon Marlow; ghc-devs > *Subject:* Re: ArrayArrays > > > > When (ab)using them for this purpose, SmallArrayArray's would be very > handy as well. > > > > Consider right now if I have something like an order-maintenance structure > I have: > > > > data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} > !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s (Upper s)) > > > > data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} > !(MutableByteArray s) {-# UNPACK #-} !(MutVar s (Lower s)) {-# UNPACK #-} > !(MutVar s (Lower s)) > > > > The former contains, logically, a mutable integer and two pointers, one > for forward and one for backwards. The latter is basically the same thing > with a mutable reference up pointing at the structure above. > > > > On the heap this is an object that points to a structure for the > bytearray, and points to another structure for each mutvar which each point > to the other 'Upper' structure. So there is a level of indirection smeared > over everything. > > > > So this is a pair of doubly linked lists with an upward link from the > structure below to the structure above. > > > > Converted into ArrayArray#s I'd get > > > > data Upper s = Upper (MutableArrayArray# s) > > > > w/ the first slot being a pointer to a MutableByteArray#, and the next 2 > slots pointing to the previous and next previous objects, represented just > as their MutableArrayArray#s. I can use sameMutableArrayArray# on these for > object identity, which lets me check for the ends of the lists by tying > things back on themselves. > > > > and below that > > > > data Lower s = Lower (MutableArrayArray# s) > > > > is similar, with an extra MutableArrayArray slot pointing up to an upper > structure. > > > > I can then write a handful of combinators for getting out the slots in > question, while it has gained a level of indirection between the wrapper to > put it in * and the MutableArrayArray# s in #, that one can be basically > erased by ghc. > > > > Unlike before I don't have several separate objects on the heap for each > thing. I only have 2 now. The MutableArrayArray# for the object itself, and > the MutableByteArray# that it references to carry around the mutable int. > > > > The only pain points are > > > > 1.) the aforementioned limitation that currently prevents me from stuffing > normal boxed data through a SmallArray or Array into an ArrayArray leaving > me in a little ghetto disconnected from the rest of Haskell, > > > > and > > > > 2.) the lack of SmallArrayArray's, which could let us avoid the card > marking overhead. These objects are all small, 3-4 pointers wide. Card > marking doesn't help. > > > > Alternately I could just try to do really evil things and convert the > whole mess to SmallArrays and then figure out how to unsafeCoerce my way to > glory, stuffing the #'d references to the other arrays directly into the > SmallArray as slots, removing the limitation we see here by aping the > MutableArrayArray# s API, but that gets really really dangerous! > > > > I'm pretty much willing to sacrifice almost anything on the altar of speed > here, but I'd like to be able to let the GC move them and collect them > which rules out simpler Ptr and Addr based solutions. > > > > -Edward > > > > On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty < > [email protected]> wrote: > > That’s an interesting idea. > > Manuel > > > Edward Kmett <[email protected]>: > > > > > Would it be possible to add unsafe primops to add Array# and SmallArray# > entries to an ArrayArray#? The fact that the ArrayArray# entries are all > directly unlifted avoiding a level of indirection for the containing > structure is amazing, but I can only currently use it if my leaf level data > can be 100% unboxed and distributed among ByteArray#s. It'd be nice to be > able to have the ability to put SmallArray# a stuff down at the leaves to > hold lifted contents. > > > > I accept fully that if I name the wrong type when I go to access one of > the fields it'll lie to me, but I suppose it'd do that if i tried to use > one of the members that held a nested ArrayArray# as a ByteArray# anyways, > so it isn't like there is a safety story preventing this. > > > > I've been hunting for ways to try to kill the indirection problems I get > with Haskell and mutable structures, and I could shoehorn a number of them > into ArrayArrays if this worked. > > > > Right now I'm stuck paying for 2 or 3 levels of unnecessary indirection > compared to c/java and this could reduce that pain to just 1 level of > unnecessary indirection. > > > > -Edward > > > _______________________________________________ > > ghc-devs mailing list > > [email protected] > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > > >
_______________________________________________ ghc-devs mailing list [email protected] http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
