I think both are useful, but the one you suggest requires a lot more plumbing and doesn't subsume all of the usecases of the other.
-Edward On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton <rrnew...@gmail.com> wrote: > So that primitive is an array like thing (Same pointed type, unbounded > length) with extra payload. > > I can see how we can do without structs if we have arrays, especially with > the extra payload at front. But wouldn't the general solution for structs > be one that that allows new user data type defs for # types? > > > > On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett <ekm...@gmail.com> wrote: > >> Some form of MutableStruct# with a known number of words and a known >> number of pointers is basically what Ryan Yates was suggesting above, but >> where the word counts were stored in the objects themselves. >> >> Given that it'd have a couple of words for those counts it'd likely want >> to be something we build in addition to MutVar# rather than a replacement. >> >> On the other hand, if we had to fix those numbers and build info tables >> that knew them, and typechecker support, for instance, it'd get rather >> invasive. >> >> Also, a number of things that we can do with the 'sized' versions above, >> like working with evil unsized c-style arrays directly inline at the end of >> the structure cease to be possible, so it isn't even a pure win if we did >> the engineering effort. >> >> I think 90% of the needs I have are covered just by adding the one >> primitive. The last 10% gets pretty invasive. >> >> -Edward >> >> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton <rrnew...@gmail.com> wrote: >> >>> I like the possibility of a general solution for mutable structs (like >>> Ed said), and I'm trying to fully understand why it's hard. >>> >>> So, we can't unpack MutVar into constructors because of object identity >>> problems. But what about directly supporting an extensible set of unlifted >>> MutStruct# objects, generalizing (and even replacing) MutVar#? That may be >>> too much work, but is it problematic otherwise? >>> >>> Needless to say, this is also critical if we ever want best in class >>> lockfree mutable structures, just like their Stm and sequential >>> counterparts. >>> >>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones < >>> simo...@microsoft.com> wrote: >>> >>>> At the very least I'll take this email and turn it into a short article. >>>> >>>> Yes, please do make it into a wiki page on the GHC Trac, and maybe make >>>> a ticket for it. >>>> >>>> >>>> Thanks >>>> >>>> >>>> >>>> Simon >>>> >>>> >>>> >>>> *From:* Edward Kmett [mailto:ekm...@gmail.com] >>>> *Sent:* 27 August 2015 16:54 >>>> *To:* Simon Peyton Jones >>>> *Cc:* Manuel M T Chakravarty; Simon Marlow; ghc-devs >>>> *Subject:* Re: ArrayArrays >>>> >>>> >>>> >>>> An ArrayArray# is just an Array# with a modified invariant. It points >>>> directly to other unlifted ArrayArray#'s or ByteArray#'s. >>>> >>>> >>>> >>>> While those live in #, they are garbage collected objects, so this all >>>> lives on the heap. >>>> >>>> >>>> >>>> They were added to make some of the DPH stuff fast when it has to deal >>>> with nested arrays. >>>> >>>> >>>> >>>> I'm currently abusing them as a placeholder for a better thing. >>>> >>>> >>>> >>>> The Problem >>>> >>>> ----------------- >>>> >>>> >>>> >>>> Consider the scenario where you write a classic doubly-linked list in >>>> Haskell. >>>> >>>> >>>> >>>> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) >>>> >>>> >>>> >>>> Chasing from one DLL to the next requires following 3 pointers on the >>>> heap. >>>> >>>> >>>> >>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe DLL >>>> ~> DLL >>>> >>>> >>>> >>>> That is 3 levels of indirection. >>>> >>>> >>>> >>>> We can trim one by simply unpacking the IORef with >>>> -funbox-strict-fields or UNPACK >>>> >>>> >>>> >>>> We can trim another by adding a 'Nil' constructor for DLL and worsening >>>> our representation. >>>> >>>> >>>> >>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >>>> >>>> >>>> >>>> but now we're still stuck with a level of indirection >>>> >>>> >>>> >>>> DLL ~> MutVar# RealWorld DLL ~> DLL >>>> >>>> >>>> >>>> This means that every operation we perform on this structure will be >>>> about half of the speed of an implementation in most other languages >>>> assuming we're memory bound on loading things into cache! >>>> >>>> >>>> >>>> Making Progress >>>> >>>> ---------------------- >>>> >>>> >>>> >>>> I have been working on a number of data structures where the >>>> indirection of going from something in * out to an object in # which >>>> contains the real pointer to my target and coming back effectively doubles >>>> my runtime. >>>> >>>> >>>> >>>> We go out to the MutVar# because we are allowed to put the MutVar# onto >>>> the mutable list when we dirty it. There is a well defined write-barrier. >>>> >>>> >>>> >>>> I could change out the representation to use >>>> >>>> >>>> >>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >>>> >>>> >>>> >>>> I can just store two pointers in the MutableArray# every time, but this >>>> doesn't help _much_ directly. It has reduced the amount of distinct >>>> addresses in memory I touch on a walk of the DLL from 3 per object to 2. >>>> >>>> >>>> >>>> I still have to go out to the heap from my DLL and get to the array >>>> object and then chase it to the next DLL and chase that to the next array. >>>> I do get my two pointers together in memory though. I'm paying for a card >>>> marking table as well, which I don't particularly need with just two >>>> pointers, but we can shed that with the "SmallMutableArray#" machinery >>>> added back in 7.10, which is just the old array code a a new data type, >>>> which can speed things up a bit when you don't have very big arrays: >>>> >>>> >>>> >>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >>>> >>>> >>>> >>>> But what if I wanted my object itself to live in # and have two mutable >>>> fields and be able to share the sme write barrier? >>>> >>>> >>>> >>>> An ArrayArray# points directly to other unlifted array types. What if >>>> we have one # -> * wrapper on the outside to deal with the impedence >>>> mismatch between the imperative world and Haskell, and then just let the >>>> ArrayArray#'s hold other arrayarrays. >>>> >>>> >>>> >>>> data DLL = DLL (MutableArrayArray# RealWorld) >>>> >>>> >>>> >>>> now I need to make up a new Nil, which I can just make be a special >>>> MutableArrayArray# I allocate on program startup. I can even abuse pattern >>>> synonyms. Alternately I can exploit the internals further to make this >>>> cheaper. >>>> >>>> >>>> >>>> Then I can use the readMutableArrayArray# and writeMutableArrayArray# >>>> calls to directly access the preceding and next entry in the linked list. >>>> >>>> >>>> >>>> So now we have one DLL wrapper which just 'bootstraps me' into a strict >>>> world, and everything there lives in #. >>>> >>>> >>>> >>>> next :: DLL -> IO DLL >>>> >>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# s of >>>> >>>> (# s', n #) -> (# s', DLL n #) >>>> >>>> >>>> >>>> It turns out GHC is quite happy to optimize all of that code to keep >>>> things unboxed. The 'DLL' wrappers get removed pretty easily when they are >>>> known strict and you chain operations of this sort! >>>> >>>> >>>> >>>> Cleaning it Up >>>> >>>> ------------------ >>>> >>>> >>>> >>>> Now I have one outermost indirection pointing to an array that points >>>> directly to other arrays. >>>> >>>> >>>> >>>> I'm stuck paying for a card marking table per object, but I can fix >>>> that by duplicating the code for MutableArrayArray# and using a >>>> SmallMutableArray#. I can hack up primops that let me store a mixture of >>>> SmallMutableArray# fields and normal ones in the data structure. >>>> Operationally, I can even do so by just unsafeCoercing the existing >>>> SmallMutableArray# primitives to change the kind of one of the arguments it >>>> takes. >>>> >>>> >>>> >>>> This is almost ideal, but not quite. I often have fields that would be >>>> best left unboxed. >>>> >>>> >>>> >>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >>>> >>>> >>>> >>>> was able to unpack the Int, but we lost that. We can currently at best >>>> point one of the entries of the SmallMutableArray# at a boxed or at a >>>> MutableByteArray# for all of our misc. data and shove the int in question >>>> in there. >>>> >>>> >>>> >>>> e.g. if I were to implement a hash-array-mapped-trie I need to store >>>> masks and administrivia as I walk down the tree. Having to go off to the >>>> side costs me the entire win from avoiding the first pointer chase. >>>> >>>> >>>> >>>> But, if like Ryan suggested, we had a heap object we could construct >>>> that had n words with unsafe access and m pointers to other heap objects, >>>> one that could put itself on the mutable list when any of those pointers >>>> changed then I could shed this last factor of two in all circumstances. >>>> >>>> >>>> >>>> Prototype >>>> >>>> ------------- >>>> >>>> >>>> >>>> Over the last few days I've put together a small prototype >>>> implementation with a few non-trivial imperative data structures for things >>>> like Tarjan's link-cut trees, the list labeling problem and >>>> order-maintenance. >>>> >>>> >>>> >>>> https://github.com/ekmett/structs >>>> >>>> >>>> >>>> Notable bits: >>>> >>>> >>>> >>>> Data.Struct.Internal.LinkCut >>>> <https://github.com/ekmett/structs/blob/9ff2818f888aff4789b7a41077a674a10d15e6ee/src/Data/Struct/Internal/LinkCut.hs> >>>> provides an implementation of link-cut trees in this style. >>>> >>>> >>>> >>>> Data.Struct.Internal >>>> <https://github.com/ekmett/structs/blob/9ff2818f888aff4789b7a41077a674a10d15e6ee/src/Data/Struct/Internal.hs> >>>> provides the rather horrifying guts that make it go fast. >>>> >>>> >>>> >>>> Once compiled with -O or -O2, if you look at the core, almost all the >>>> references to the LinkCut or Object data constructor get optimized away, >>>> and we're left with beautiful strict code directly mutating out underlying >>>> representation. >>>> >>>> >>>> >>>> At the very least I'll take this email and turn it into a short article. >>>> >>>> >>>> >>>> -Edward >>>> >>>> >>>> >>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones < >>>> simo...@microsoft.com> wrote: >>>> >>>> Just to say that I have no idea what is going on in this thread. What >>>> is ArrayArray? What is the issue in general? Is there a ticket? Is there >>>> a wiki page? >>>> >>>> >>>> >>>> If it’s important, an ab-initio wiki page + ticket would be a good >>>> thing. >>>> >>>> >>>> >>>> Simon >>>> >>>> >>>> >>>> *From:* ghc-devs [mailto:ghc-devs-boun...@haskell.org] *On Behalf Of >>>> *Edward >>>> Kmett >>>> *Sent:* 21 August 2015 05:25 >>>> *To:* Manuel M T Chakravarty >>>> *Cc:* Simon Marlow; ghc-devs >>>> *Subject:* Re: ArrayArrays >>>> >>>> >>>> >>>> When (ab)using them for this purpose, SmallArrayArray's would be very >>>> handy as well. >>>> >>>> >>>> >>>> Consider right now if I have something like an order-maintenance >>>> structure I have: >>>> >>>> >>>> >>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# UNPACK >>>> #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s (Upper s)) >>>> >>>> >>>> >>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK >>>> #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s (Lower s)) {-# UNPACK >>>> #-} !(MutVar s (Lower s)) >>>> >>>> >>>> >>>> The former contains, logically, a mutable integer and two pointers, one >>>> for forward and one for backwards. The latter is basically the same thing >>>> with a mutable reference up pointing at the structure above. >>>> >>>> >>>> >>>> On the heap this is an object that points to a structure for the >>>> bytearray, and points to another structure for each mutvar which each point >>>> to the other 'Upper' structure. So there is a level of indirection smeared >>>> over everything. >>>> >>>> >>>> >>>> So this is a pair of doubly linked lists with an upward link from the >>>> structure below to the structure above. >>>> >>>> >>>> >>>> Converted into ArrayArray#s I'd get >>>> >>>> >>>> >>>> data Upper s = Upper (MutableArrayArray# s) >>>> >>>> >>>> >>>> w/ the first slot being a pointer to a MutableByteArray#, and the next >>>> 2 slots pointing to the previous and next previous objects, represented >>>> just as their MutableArrayArray#s. I can use sameMutableArrayArray# on >>>> these for object identity, which lets me check for the ends of the lists by >>>> tying things back on themselves. >>>> >>>> >>>> >>>> and below that >>>> >>>> >>>> >>>> data Lower s = Lower (MutableArrayArray# s) >>>> >>>> >>>> >>>> is similar, with an extra MutableArrayArray slot pointing up to an >>>> upper structure. >>>> >>>> >>>> >>>> I can then write a handful of combinators for getting out the slots in >>>> question, while it has gained a level of indirection between the wrapper to >>>> put it in * and the MutableArrayArray# s in #, that one can be basically >>>> erased by ghc. >>>> >>>> >>>> >>>> Unlike before I don't have several separate objects on the heap for >>>> each thing. I only have 2 now. The MutableArrayArray# for the object >>>> itself, and the MutableByteArray# that it references to carry around the >>>> mutable int. >>>> >>>> >>>> >>>> The only pain points are >>>> >>>> >>>> >>>> 1.) the aforementioned limitation that currently prevents me from >>>> stuffing normal boxed data through a SmallArray or Array into an ArrayArray >>>> leaving me in a little ghetto disconnected from the rest of Haskell, >>>> >>>> >>>> >>>> and >>>> >>>> >>>> >>>> 2.) the lack of SmallArrayArray's, which could let us avoid the card >>>> marking overhead. These objects are all small, 3-4 pointers wide. Card >>>> marking doesn't help. >>>> >>>> >>>> >>>> Alternately I could just try to do really evil things and convert the >>>> whole mess to SmallArrays and then figure out how to unsafeCoerce my way to >>>> glory, stuffing the #'d references to the other arrays directly into the >>>> SmallArray as slots, removing the limitation we see here by aping the >>>> MutableArrayArray# s API, but that gets really really dangerous! >>>> >>>> >>>> >>>> I'm pretty much willing to sacrifice almost anything on the altar of >>>> speed here, but I'd like to be able to let the GC move them and collect >>>> them which rules out simpler Ptr and Addr based solutions. >>>> >>>> >>>> >>>> -Edward >>>> >>>> >>>> >>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty < >>>> c...@cse.unsw.edu.au> wrote: >>>> >>>> That’s an interesting idea. >>>> >>>> Manuel >>>> >>>> > Edward Kmett <ekm...@gmail.com>: >>>> >>>> > >>>> > Would it be possible to add unsafe primops to add Array# and >>>> SmallArray# entries to an ArrayArray#? The fact that the ArrayArray# >>>> entries are all directly unlifted avoiding a level of indirection for the >>>> containing structure is amazing, but I can only currently use it if my leaf >>>> level data can be 100% unboxed and distributed among ByteArray#s. It'd be >>>> nice to be able to have the ability to put SmallArray# a stuff down at the >>>> leaves to hold lifted contents. >>>> > >>>> > I accept fully that if I name the wrong type when I go to access one >>>> of the fields it'll lie to me, but I suppose it'd do that if i tried to use >>>> one of the members that held a nested ArrayArray# as a ByteArray# anyways, >>>> so it isn't like there is a safety story preventing this. >>>> > >>>> > I've been hunting for ways to try to kill the indirection problems I >>>> get with Haskell and mutable structures, and I could shoehorn a number of >>>> them into ArrayArrays if this worked. >>>> > >>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary >>>> indirection compared to c/java and this could reduce that pain to just 1 >>>> level of unnecessary indirection. >>>> > >>>> > -Edward >>>> >>>> > _______________________________________________ >>>> > ghc-devs mailing list >>>> > ghc-devs@haskell.org >>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> ghc-devs mailing list >>>> ghc-devs@haskell.org >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>> >>> >>
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs