I completely agree. I would love to spend some time during ICFP and friends talking about what it could look like. My small array for STM changes for the RTS can be seen here [1]. It is on a branch somewhere between 7.8 and 7.10 and includes irrelevant STM bits and some confusing naming choices (sorry), but should cover all the details needed to implement it for a non-STM context. The biggest surprise for me was following small array too closely and having a word/byte offset miss-match [2].
[1]: https://github.com/fryguybob/ghc/compare/ghc-htm-bloom...fryguybob:ghc-htm-mut [2]: https://ghc.haskell.org/trac/ghc/ticket/10413 Ryan On Fri, Aug 28, 2015 at 10:09 PM, Edward Kmett <ekm...@gmail.com> wrote: > I'd love to have that last 10%, but its a lot of work to get there and more > importantly I don't know quite what it should look like. > > On the other hand, I do have a pretty good idea of how the primitives above > could be banged out and tested in a long evening, well in time for 7.12. And > as noted earlier, those remain useful even if a nicer typed version with an > extra level of indirection to the sizes is built up after. > > The rest sounds like a good graduate student project for someone who has > graduate students lying around. Maybe somebody at Indiana University who has > an interest in type theory and parallelism can find us one. =) > > -Edward > > On Fri, Aug 28, 2015 at 8:48 PM, Ryan Yates <fryguy...@gmail.com> wrote: >> >> I think from my perspective, the motivation for getting the type >> checker involved is primarily bringing this to the level where users >> could be expected to build these structures. it is reasonable to >> think that there are people who want to use STM (a context with >> mutation already) to implement a straight forward data structure that >> avoids extra indirection penalty. There should be some places where >> knowing that things are field accesses rather then array indexing >> could be helpful, but I think GHC is good right now about handling >> constant offsets. In my code I don't do any bounds checking as I know >> I will only be accessing my arrays with constant indexes. I make >> wrappers for each field access and leave all the unsafe stuff in >> there. When things go wrong though, the compiler is no help. Maybe >> template Haskell that generates the appropriate wrappers is the right >> direction to go. >> There is another benefit for me when working with these as arrays in >> that it is quite simple and direct (given the hoops already jumped >> through) to play with alignment. I can ensure two pointers are never >> on the same cache-line by just spacing things out in the array. >> >> On Fri, Aug 28, 2015 at 7:33 PM, Edward Kmett <ekm...@gmail.com> wrote: >> > They just segfault at this level. ;) >> > >> > Sent from my iPhone >> > >> > On Aug 28, 2015, at 7:25 PM, Ryan Newton <rrnew...@gmail.com> wrote: >> > >> > You presumably also save a bounds check on reads by hard-coding the >> > sizes? >> > >> > On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett <ekm...@gmail.com> wrote: >> >> >> >> Also there are 4 different "things" here, basically depending on two >> >> independent questions: >> >> >> >> a.) if you want to shove the sizes into the info table, and >> >> b.) if you want cardmarking. >> >> >> >> Versions with/without cardmarking for different sizes can be done >> >> pretty >> >> easily, but as noted, the infotable variants are pretty invasive. >> >> >> >> -Edward >> >> >> >> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett <ekm...@gmail.com> wrote: >> >>> >> >>> Well, on the plus side you'd save 16 bytes per object, which adds up >> >>> if >> >>> they were small enough and there are enough of them. You get a bit >> >>> better >> >>> locality of reference in terms of what fits in the first cache line of >> >>> them. >> >>> >> >>> -Edward >> >>> >> >>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton <rrnew...@gmail.com> >> >>> wrote: >> >>>> >> >>>> Yes. And for the short term I can imagine places we will settle with >> >>>> arrays even if it means tracking lengths unnecessarily and >> >>>> unsafeCoercing >> >>>> pointers whose types don't actually match their siblings. >> >>>> >> >>>> Is there anything to recommend the hacks mentioned for fixed sized >> >>>> array >> >>>> objects *other* than using them to fake structs? (Much to >> >>>> derecommend, as >> >>>> you mentioned!) >> >>>> >> >>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett <ekm...@gmail.com> >> >>>> wrote: >> >>>>> >> >>>>> I think both are useful, but the one you suggest requires a lot more >> >>>>> plumbing and doesn't subsume all of the usecases of the other. >> >>>>> >> >>>>> -Edward >> >>>>> >> >>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton <rrnew...@gmail.com> >> >>>>> wrote: >> >>>>>> >> >>>>>> So that primitive is an array like thing (Same pointed type, >> >>>>>> unbounded >> >>>>>> length) with extra payload. >> >>>>>> >> >>>>>> I can see how we can do without structs if we have arrays, >> >>>>>> especially >> >>>>>> with the extra payload at front. But wouldn't the general solution >> >>>>>> for >> >>>>>> structs be one that that allows new user data type defs for # >> >>>>>> types? >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett <ekm...@gmail.com> >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Some form of MutableStruct# with a known number of words and a >> >>>>>>> known >> >>>>>>> number of pointers is basically what Ryan Yates was suggesting >> >>>>>>> above, but >> >>>>>>> where the word counts were stored in the objects themselves. >> >>>>>>> >> >>>>>>> Given that it'd have a couple of words for those counts it'd >> >>>>>>> likely >> >>>>>>> want to be something we build in addition to MutVar# rather than a >> >>>>>>> replacement. >> >>>>>>> >> >>>>>>> On the other hand, if we had to fix those numbers and build info >> >>>>>>> tables that knew them, and typechecker support, for instance, it'd >> >>>>>>> get >> >>>>>>> rather invasive. >> >>>>>>> >> >>>>>>> Also, a number of things that we can do with the 'sized' versions >> >>>>>>> above, like working with evil unsized c-style arrays directly >> >>>>>>> inline at the >> >>>>>>> end of the structure cease to be possible, so it isn't even a pure >> >>>>>>> win if we >> >>>>>>> did the engineering effort. >> >>>>>>> >> >>>>>>> I think 90% of the needs I have are covered just by adding the one >> >>>>>>> primitive. The last 10% gets pretty invasive. >> >>>>>>> >> >>>>>>> -Edward >> >>>>>>> >> >>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton <rrnew...@gmail.com> >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> I like the possibility of a general solution for mutable structs >> >>>>>>>> (like Ed said), and I'm trying to fully understand why it's hard. >> >>>>>>>> >> >>>>>>>> So, we can't unpack MutVar into constructors because of object >> >>>>>>>> identity problems. But what about directly supporting an >> >>>>>>>> extensible set of >> >>>>>>>> unlifted MutStruct# objects, generalizing (and even replacing) >> >>>>>>>> MutVar#? That >> >>>>>>>> may be too much work, but is it problematic otherwise? >> >>>>>>>> >> >>>>>>>> Needless to say, this is also critical if we ever want best in >> >>>>>>>> class >> >>>>>>>> lockfree mutable structures, just like their Stm and sequential >> >>>>>>>> counterparts. >> >>>>>>>> >> >>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >> >>>>>>>> <simo...@microsoft.com> wrote: >> >>>>>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a short >> >>>>>>>>> article. >> >>>>>>>>> >> >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and >> >>>>>>>>> maybe >> >>>>>>>>> make a ticket for it. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Thanks >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Simon >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> From: Edward Kmett [mailto:ekm...@gmail.com] >> >>>>>>>>> Sent: 27 August 2015 16:54 >> >>>>>>>>> To: Simon Peyton Jones >> >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It >> >>>>>>>>> points directly to other unlifted ArrayArray#'s or ByteArray#'s. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> While those live in #, they are garbage collected objects, so >> >>>>>>>>> this >> >>>>>>>>> all lives on the heap. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> They were added to make some of the DPH stuff fast when it has >> >>>>>>>>> to >> >>>>>>>>> deal with nested arrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The Problem >> >>>>>>>>> >> >>>>>>>>> ----------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Consider the scenario where you write a classic doubly-linked >> >>>>>>>>> list >> >>>>>>>>> in Haskell. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers >> >>>>>>>>> on >> >>>>>>>>> the heap. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> >> >>>>>>>>> Maybe >> >>>>>>>>> DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> That is 3 levels of indirection. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim one by simply unpacking the IORef with >> >>>>>>>>> -funbox-strict-fields or UNPACK >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and >> >>>>>>>>> worsening our representation. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> but now we're still stuck with a level of indirection >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> This means that every operation we perform on this structure >> >>>>>>>>> will >> >>>>>>>>> be about half of the speed of an implementation in most other >> >>>>>>>>> languages >> >>>>>>>>> assuming we're memory bound on loading things into cache! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Making Progress >> >>>>>>>>> >> >>>>>>>>> ---------------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I have been working on a number of data structures where the >> >>>>>>>>> indirection of going from something in * out to an object in # >> >>>>>>>>> which >> >>>>>>>>> contains the real pointer to my target and coming back >> >>>>>>>>> effectively doubles >> >>>>>>>>> my runtime. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> We go out to the MutVar# because we are allowed to put the >> >>>>>>>>> MutVar# >> >>>>>>>>> onto the mutable list when we dirty it. There is a well defined >> >>>>>>>>> write-barrier. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I could change out the representation to use >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can just store two pointers in the MutableArray# every time, >> >>>>>>>>> but >> >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount of >> >>>>>>>>> distinct >> >>>>>>>>> addresses in memory I touch on a walk of the DLL from 3 per >> >>>>>>>>> object to 2. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I still have to go out to the heap from my DLL and get to the >> >>>>>>>>> array >> >>>>>>>>> object and then chase it to the next DLL and chase that to the >> >>>>>>>>> next array. I >> >>>>>>>>> do get my two pointers together in memory though. I'm paying for >> >>>>>>>>> a card >> >>>>>>>>> marking table as well, which I don't particularly need with just >> >>>>>>>>> two >> >>>>>>>>> pointers, but we can shed that with the "SmallMutableArray#" >> >>>>>>>>> machinery added >> >>>>>>>>> back in 7.10, which is just the old array code a a new data >> >>>>>>>>> type, which can >> >>>>>>>>> speed things up a bit when you don't have very big arrays: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> But what if I wanted my object itself to live in # and have two >> >>>>>>>>> mutable fields and be able to share the sme write barrier? >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> An ArrayArray# points directly to other unlifted array types. >> >>>>>>>>> What >> >>>>>>>>> if we have one # -> * wrapper on the outside to deal with the >> >>>>>>>>> impedence >> >>>>>>>>> mismatch between the imperative world and Haskell, and then just >> >>>>>>>>> let the >> >>>>>>>>> ArrayArray#'s hold other arrayarrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> now I need to make up a new Nil, which I can just make be a >> >>>>>>>>> special >> >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even >> >>>>>>>>> abuse pattern >> >>>>>>>>> synonyms. Alternately I can exploit the internals further to >> >>>>>>>>> make this >> >>>>>>>>> cheaper. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Then I can use the readMutableArrayArray# and >> >>>>>>>>> writeMutableArrayArray# calls to directly access the preceding >> >>>>>>>>> and next >> >>>>>>>>> entry in the linked list. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a >> >>>>>>>>> strict world, and everything there lives in #. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> next :: DLL -> IO DLL >> >>>>>>>>> >> >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# s of >> >>>>>>>>> >> >>>>>>>>> (# s', n #) -> (# s', DLL n #) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to >> >>>>>>>>> keep things unboxed. The 'DLL' wrappers get removed pretty >> >>>>>>>>> easily when they >> >>>>>>>>> are known strict and you chain operations of this sort! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Cleaning it Up >> >>>>>>>>> >> >>>>>>>>> ------------------ >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Now I have one outermost indirection pointing to an array that >> >>>>>>>>> points directly to other arrays. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm stuck paying for a card marking table per object, but I can >> >>>>>>>>> fix >> >>>>>>>>> that by duplicating the code for MutableArrayArray# and using a >> >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a >> >>>>>>>>> mixture of >> >>>>>>>>> SmallMutableArray# fields and normal ones in the data structure. >> >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the >> >>>>>>>>> existing >> >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the >> >>>>>>>>> arguments it >> >>>>>>>>> takes. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> This is almost ideal, but not quite. I often have fields that >> >>>>>>>>> would >> >>>>>>>>> be best left unboxed. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently >> >>>>>>>>> at >> >>>>>>>>> best point one of the entries of the SmallMutableArray# at a >> >>>>>>>>> boxed or at a >> >>>>>>>>> MutableByteArray# for all of our misc. data and shove the int in >> >>>>>>>>> question in >> >>>>>>>>> there. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to >> >>>>>>>>> store masks and administrivia as I walk down the tree. Having to >> >>>>>>>>> go off to >> >>>>>>>>> the side costs me the entire win from avoiding the first pointer >> >>>>>>>>> chase. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could >> >>>>>>>>> construct that had n words with unsafe access and m pointers to >> >>>>>>>>> other heap >> >>>>>>>>> objects, one that could put itself on the mutable list when any >> >>>>>>>>> of those >> >>>>>>>>> pointers changed then I could shed this last factor of two in >> >>>>>>>>> all >> >>>>>>>>> circumstances. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Prototype >> >>>>>>>>> >> >>>>>>>>> ------------- >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Over the last few days I've put together a small prototype >> >>>>>>>>> implementation with a few non-trivial imperative data structures >> >>>>>>>>> for things >> >>>>>>>>> like Tarjan's link-cut trees, the list labeling problem and >> >>>>>>>>> order-maintenance. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> https://github.com/ekmett/structs >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Notable bits: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of >> >>>>>>>>> link-cut >> >>>>>>>>> trees in this style. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that >> >>>>>>>>> make >> >>>>>>>>> it go fast. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost >> >>>>>>>>> all >> >>>>>>>>> the references to the LinkCut or Object data constructor get >> >>>>>>>>> optimized away, >> >>>>>>>>> and we're left with beautiful strict code directly mutating out >> >>>>>>>>> underlying >> >>>>>>>>> representation. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> At the very least I'll take this email and turn it into a short >> >>>>>>>>> article. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -Edward >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >> >>>>>>>>> <simo...@microsoft.com> wrote: >> >>>>>>>>> >> >>>>>>>>> Just to say that I have no idea what is going on in this thread. >> >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a >> >>>>>>>>> ticket? Is >> >>>>>>>>> there a wiki page? >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> If it’s important, an ab-initio wiki page + ticket would be a >> >>>>>>>>> good >> >>>>>>>>> thing. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Simon >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf >> >>>>>>>>> Of >> >>>>>>>>> Edward Kmett >> >>>>>>>>> Sent: 21 August 2015 05:25 >> >>>>>>>>> To: Manuel M T Chakravarty >> >>>>>>>>> Cc: Simon Marlow; ghc-devs >> >>>>>>>>> Subject: Re: ArrayArrays >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would be >> >>>>>>>>> very handy as well. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Consider right now if I have something like an order-maintenance >> >>>>>>>>> structure I have: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# >> >>>>>>>>> UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s >> >>>>>>>>> (Upper s)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# >> >>>>>>>>> UNPACK #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s >> >>>>>>>>> (Lower s)) {-# >> >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The former contains, logically, a mutable integer and two >> >>>>>>>>> pointers, >> >>>>>>>>> one for forward and one for backwards. The latter is basically >> >>>>>>>>> the same >> >>>>>>>>> thing with a mutable reference up pointing at the structure >> >>>>>>>>> above. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On the heap this is an object that points to a structure for the >> >>>>>>>>> bytearray, and points to another structure for each mutvar which >> >>>>>>>>> each point >> >>>>>>>>> to the other 'Upper' structure. So there is a level of >> >>>>>>>>> indirection smeared >> >>>>>>>>> over everything. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> So this is a pair of doubly linked lists with an upward link >> >>>>>>>>> from >> >>>>>>>>> the structure below to the structure above. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Converted into ArrayArray#s I'd get >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and >> >>>>>>>>> the >> >>>>>>>>> next 2 slots pointing to the previous and next previous objects, >> >>>>>>>>> represented >> >>>>>>>>> just as their MutableArrayArray#s. I can use >> >>>>>>>>> sameMutableArrayArray# on these >> >>>>>>>>> for object identity, which lets me check for the ends of the >> >>>>>>>>> lists by tying >> >>>>>>>>> things back on themselves. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> and below that >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up to >> >>>>>>>>> an >> >>>>>>>>> upper structure. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I can then write a handful of combinators for getting out the >> >>>>>>>>> slots >> >>>>>>>>> in question, while it has gained a level of indirection between >> >>>>>>>>> the wrapper >> >>>>>>>>> to put it in * and the MutableArrayArray# s in #, that one can >> >>>>>>>>> be basically >> >>>>>>>>> erased by ghc. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Unlike before I don't have several separate objects on the heap >> >>>>>>>>> for >> >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for the >> >>>>>>>>> object itself, >> >>>>>>>>> and the MutableByteArray# that it references to carry around the >> >>>>>>>>> mutable >> >>>>>>>>> int. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> The only pain points are >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me >> >>>>>>>>> from >> >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into an >> >>>>>>>>> ArrayArray >> >>>>>>>>> leaving me in a little ghetto disconnected from the rest of >> >>>>>>>>> Haskell, >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> and >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid the >> >>>>>>>>> card marking overhead. These objects are all small, 3-4 pointers >> >>>>>>>>> wide. Card >> >>>>>>>>> marking doesn't help. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> Alternately I could just try to do really evil things and >> >>>>>>>>> convert >> >>>>>>>>> the whole mess to SmallArrays and then figure out how to >> >>>>>>>>> unsafeCoerce my way >> >>>>>>>>> to glory, stuffing the #'d references to the other arrays >> >>>>>>>>> directly into the >> >>>>>>>>> SmallArray as slots, removing the limitation we see here by >> >>>>>>>>> aping the >> >>>>>>>>> MutableArrayArray# s API, but that gets really really dangerous! >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the >> >>>>>>>>> altar >> >>>>>>>>> of speed here, but I'd like to be able to let the GC move them >> >>>>>>>>> and collect >> >>>>>>>>> them which rules out simpler Ptr and Addr based solutions. >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> -Edward >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >> >>>>>>>>> <c...@cse.unsw.edu.au> wrote: >> >>>>>>>>> >> >>>>>>>>> That’s an interesting idea. >> >>>>>>>>> >> >>>>>>>>> Manuel >> >>>>>>>>> >> >>>>>>>>> > Edward Kmett <ekm...@gmail.com>: >> >>>>>>>>> >> >>>>>>>>> > >> >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and >> >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the >> >>>>>>>>> > ArrayArray# entries >> >>>>>>>>> > are all directly unlifted avoiding a level of indirection for >> >>>>>>>>> > the containing >> >>>>>>>>> > structure is amazing, but I can only currently use it if my >> >>>>>>>>> > leaf level data >> >>>>>>>>> > can be 100% unboxed and distributed among ByteArray#s. It'd be >> >>>>>>>>> > nice to be >> >>>>>>>>> > able to have the ability to put SmallArray# a stuff down at >> >>>>>>>>> > the leaves to >> >>>>>>>>> > hold lifted contents. >> >>>>>>>>> > >> >>>>>>>>> > I accept fully that if I name the wrong type when I go to >> >>>>>>>>> > access >> >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do that >> >>>>>>>>> > if i tried to >> >>>>>>>>> > use one of the members that held a nested ArrayArray# as a >> >>>>>>>>> > ByteArray# >> >>>>>>>>> > anyways, so it isn't like there is a safety story preventing >> >>>>>>>>> > this. >> >>>>>>>>> > >> >>>>>>>>> > I've been hunting for ways to try to kill the indirection >> >>>>>>>>> > problems I get with Haskell and mutable structures, and I >> >>>>>>>>> > could shoehorn a >> >>>>>>>>> > number of them into ArrayArrays if this worked. >> >>>>>>>>> > >> >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary >> >>>>>>>>> > indirection compared to c/java and this could reduce that pain >> >>>>>>>>> > to just 1 >> >>>>>>>>> > level of unnecessary indirection. >> >>>>>>>>> > >> >>>>>>>>> > -Edward >> >>>>>>>>> >> >>>>>>>>> > _______________________________________________ >> >>>>>>>>> > ghc-devs mailing list >> >>>>>>>>> > ghc-devs@haskell.org >> >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> _______________________________________________ >> >>>>>>>>> ghc-devs mailing list >> >>>>>>>>> ghc-devs@haskell.org >> >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >>>>>>> >> >>>>>>> >> >>>>> >> >>> >> >> >> > >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs@haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > > > _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs