They just segfault at this level. ;) Sent from my iPhone
> On Aug 28, 2015, at 7:25 PM, Ryan Newton <[email protected]> wrote: > > You presumably also save a bounds check on reads by hard-coding the sizes? > >> On Fri, Aug 28, 2015 at 3:39 PM, Edward Kmett <[email protected]> wrote: >> Also there are 4 different "things" here, basically depending on two >> independent questions: >> >> a.) if you want to shove the sizes into the info table, and >> b.) if you want cardmarking. >> >> Versions with/without cardmarking for different sizes can be done pretty >> easily, but as noted, the infotable variants are pretty invasive. >> >> -Edward >> >>> On Fri, Aug 28, 2015 at 6:36 PM, Edward Kmett <[email protected]> wrote: >>> Well, on the plus side you'd save 16 bytes per object, which adds up if >>> they were small enough and there are enough of them. You get a bit better >>> locality of reference in terms of what fits in the first cache line of them. >>> >>> -Edward >>> >>>> On Fri, Aug 28, 2015 at 6:14 PM, Ryan Newton <[email protected]> wrote: >>>> Yes. And for the short term I can imagine places we will settle with >>>> arrays even if it means tracking lengths unnecessarily and unsafeCoercing >>>> pointers whose types don't actually match their siblings. >>>> >>>> Is there anything to recommend the hacks mentioned for fixed sized array >>>> objects *other* than using them to fake structs? (Much to derecommend, as >>>> you mentioned!) >>>> >>>>> On Fri, Aug 28, 2015 at 3:07 PM Edward Kmett <[email protected]> wrote: >>>>> I think both are useful, but the one you suggest requires a lot more >>>>> plumbing and doesn't subsume all of the usecases of the other. >>>>> >>>>> -Edward >>>>> >>>>>> On Fri, Aug 28, 2015 at 5:51 PM, Ryan Newton <[email protected]> wrote: >>>>>> So that primitive is an array like thing (Same pointed type, unbounded >>>>>> length) with extra payload. >>>>>> >>>>>> I can see how we can do without structs if we have arrays, especially >>>>>> with the extra payload at front. But wouldn't the general solution for >>>>>> structs be one that that allows new user data type defs for # types? >>>>>> >>>>>> >>>>>> >>>>>>> On Fri, Aug 28, 2015 at 4:43 PM Edward Kmett <[email protected]> wrote: >>>>>>> Some form of MutableStruct# with a known number of words and a known >>>>>>> number of pointers is basically what Ryan Yates was suggesting above, >>>>>>> but where the word counts were stored in the objects themselves. >>>>>>> >>>>>>> Given that it'd have a couple of words for those counts it'd likely >>>>>>> want to be something we build in addition to MutVar# rather than a >>>>>>> replacement. >>>>>>> >>>>>>> On the other hand, if we had to fix those numbers and build info tables >>>>>>> that knew them, and typechecker support, for instance, it'd get rather >>>>>>> invasive. >>>>>>> >>>>>>> Also, a number of things that we can do with the 'sized' versions >>>>>>> above, like working with evil unsized c-style arrays directly inline at >>>>>>> the end of the structure cease to be possible, so it isn't even a pure >>>>>>> win if we did the engineering effort. >>>>>>> >>>>>>> I think 90% of the needs I have are covered just by adding the one >>>>>>> primitive. The last 10% gets pretty invasive. >>>>>>> >>>>>>> -Edward >>>>>>> >>>>>>>> On Fri, Aug 28, 2015 at 5:30 PM, Ryan Newton <[email protected]> >>>>>>>> wrote: >>>>>>>> I like the possibility of a general solution for mutable structs (like >>>>>>>> Ed said), and I'm trying to fully understand why it's hard. >>>>>>>> >>>>>>>> So, we can't unpack MutVar into constructors because of object >>>>>>>> identity problems. But what about directly supporting an extensible >>>>>>>> set of unlifted MutStruct# objects, generalizing (and even replacing) >>>>>>>> MutVar#? That may be too much work, but is it problematic otherwise? >>>>>>>> >>>>>>>> Needless to say, this is also critical if we ever want best in class >>>>>>>> lockfree mutable structures, just like their Stm and sequential >>>>>>>> counterparts. >>>>>>>> >>>>>>>>> On Fri, Aug 28, 2015 at 4:43 AM Simon Peyton Jones >>>>>>>>> <[email protected]> wrote: >>>>>>>>> At the very least I'll take this email and turn it into a short >>>>>>>>> article. >>>>>>>>> Yes, please do make it into a wiki page on the GHC Trac, and maybe >>>>>>>>> make a ticket for it. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Simon >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> From: Edward Kmett [mailto:[email protected]] >>>>>>>>> Sent: 27 August 2015 16:54 >>>>>>>>> To: Simon Peyton Jones >>>>>>>>> Cc: Manuel M T Chakravarty; Simon Marlow; ghc-devs >>>>>>>>> Subject: Re: ArrayArrays >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> An ArrayArray# is just an Array# with a modified invariant. It points >>>>>>>>> directly to other unlifted ArrayArray#'s or ByteArray#'s. >>>>>>>>> >>>>>>>>> While those live in #, they are garbage collected objects, so this >>>>>>>>> all lives on the heap. >>>>>>>>> >>>>>>>>> They were added to make some of the DPH stuff fast when it has to >>>>>>>>> deal with nested arrays. >>>>>>>>> >>>>>>>>> I'm currently abusing them as a placeholder for a better thing. >>>>>>>>> >>>>>>>>> The Problem >>>>>>>>> ----------------- >>>>>>>>> >>>>>>>>> Consider the scenario where you write a classic doubly-linked list in >>>>>>>>> Haskell. >>>>>>>>> >>>>>>>>> data DLL = DLL (IORef (Maybe DLL) (IORef (Maybe DLL) >>>>>>>>> >>>>>>>>> Chasing from one DLL to the next requires following 3 pointers on the >>>>>>>>> heap. >>>>>>>>> >>>>>>>>> DLL ~> IORef (Maybe DLL) ~> MutVar# RealWorld (Maybe DLL) ~> Maybe >>>>>>>>> DLL ~> DLL >>>>>>>>> >>>>>>>>> That is 3 levels of indirection. >>>>>>>>> >>>>>>>>> We can trim one by simply unpacking the IORef with >>>>>>>>> -funbox-strict-fields or UNPACK >>>>>>>>> >>>>>>>>> We can trim another by adding a 'Nil' constructor for DLL and >>>>>>>>> worsening our representation. >>>>>>>>> >>>>>>>>> data DLL = DLL !(IORef DLL) !(IORef DLL) | Nil >>>>>>>>> >>>>>>>>> but now we're still stuck with a level of indirection >>>>>>>>> >>>>>>>>> DLL ~> MutVar# RealWorld DLL ~> DLL >>>>>>>>> >>>>>>>>> This means that every operation we perform on this structure will be >>>>>>>>> about half of the speed of an implementation in most other languages >>>>>>>>> assuming we're memory bound on loading things into cache! >>>>>>>>> >>>>>>>>> Making Progress >>>>>>>>> ---------------------- >>>>>>>>> >>>>>>>>> I have been working on a number of data structures where the >>>>>>>>> indirection of going from something in * out to an object in # which >>>>>>>>> contains the real pointer to my target and coming back effectively >>>>>>>>> doubles my runtime. >>>>>>>>> >>>>>>>>> We go out to the MutVar# because we are allowed to put the MutVar# >>>>>>>>> onto the mutable list when we dirty it. There is a well defined >>>>>>>>> write-barrier. >>>>>>>>> >>>>>>>>> I could change out the representation to use >>>>>>>>> >>>>>>>>> data DLL = DLL (MutableArray# RealWorld DLL) | Nil >>>>>>>>> >>>>>>>>> I can just store two pointers in the MutableArray# every time, but >>>>>>>>> this doesn't help _much_ directly. It has reduced the amount of >>>>>>>>> distinct addresses in memory I touch on a walk of the DLL from 3 per >>>>>>>>> object to 2. >>>>>>>>> >>>>>>>>> I still have to go out to the heap from my DLL and get to the array >>>>>>>>> object and then chase it to the next DLL and chase that to the next >>>>>>>>> array. I do get my two pointers together in memory though. I'm paying >>>>>>>>> for a card marking table as well, which I don't particularly need >>>>>>>>> with just two pointers, but we can shed that with the >>>>>>>>> "SmallMutableArray#" machinery added back in 7.10, which is just the >>>>>>>>> old array code a a new data type, which can speed things up a bit >>>>>>>>> when you don't have very big arrays: >>>>>>>>> >>>>>>>>> data DLL = DLL (SmallMutableArray# RealWorld DLL) | Nil >>>>>>>>> >>>>>>>>> But what if I wanted my object itself to live in # and have two >>>>>>>>> mutable fields and be able to share the sme write barrier? >>>>>>>>> >>>>>>>>> An ArrayArray# points directly to other unlifted array types. What if >>>>>>>>> we have one # -> * wrapper on the outside to deal with the impedence >>>>>>>>> mismatch between the imperative world and Haskell, and then just let >>>>>>>>> the ArrayArray#'s hold other arrayarrays. >>>>>>>>> >>>>>>>>> data DLL = DLL (MutableArrayArray# RealWorld) >>>>>>>>> >>>>>>>>> now I need to make up a new Nil, which I can just make be a special >>>>>>>>> MutableArrayArray# I allocate on program startup. I can even abuse >>>>>>>>> pattern synonyms. Alternately I can exploit the internals further to >>>>>>>>> make this cheaper. >>>>>>>>> >>>>>>>>> Then I can use the readMutableArrayArray# and writeMutableArrayArray# >>>>>>>>> calls to directly access the preceding and next entry in the linked >>>>>>>>> list. >>>>>>>>> >>>>>>>>> So now we have one DLL wrapper which just 'bootstraps me' into a >>>>>>>>> strict world, and everything there lives in #. >>>>>>>>> >>>>>>>>> next :: DLL -> IO DLL >>>>>>>>> next (DLL m) = IO $ \s -> case readMutableArrayArray# s of >>>>>>>>> (# s', n #) -> (# s', DLL n #) >>>>>>>>> >>>>>>>>> It turns out GHC is quite happy to optimize all of that code to keep >>>>>>>>> things unboxed. The 'DLL' wrappers get removed pretty easily when >>>>>>>>> they are known strict and you chain operations of this sort! >>>>>>>>> >>>>>>>>> Cleaning it Up >>>>>>>>> ------------------ >>>>>>>>> >>>>>>>>> Now I have one outermost indirection pointing to an array that points >>>>>>>>> directly to other arrays. >>>>>>>>> >>>>>>>>> I'm stuck paying for a card marking table per object, but I can fix >>>>>>>>> that by duplicating the code for MutableArrayArray# and using a >>>>>>>>> SmallMutableArray#. I can hack up primops that let me store a mixture >>>>>>>>> of SmallMutableArray# fields and normal ones in the data structure. >>>>>>>>> Operationally, I can even do so by just unsafeCoercing the existing >>>>>>>>> SmallMutableArray# primitives to change the kind of one of the >>>>>>>>> arguments it takes. >>>>>>>>> >>>>>>>>> This is almost ideal, but not quite. I often have fields that would >>>>>>>>> be best left unboxed. >>>>>>>>> >>>>>>>>> data DLLInt = DLL !Int !(IORef DLL) !(IORef DLL) | Nil >>>>>>>>> >>>>>>>>> was able to unpack the Int, but we lost that. We can currently at >>>>>>>>> best point one of the entries of the SmallMutableArray# at a boxed or >>>>>>>>> at a MutableByteArray# for all of our misc. data and shove the int in >>>>>>>>> question in there. >>>>>>>>> >>>>>>>>> e.g. if I were to implement a hash-array-mapped-trie I need to store >>>>>>>>> masks and administrivia as I walk down the tree. Having to go off to >>>>>>>>> the side costs me the entire win from avoiding the first pointer >>>>>>>>> chase. >>>>>>>>> >>>>>>>>> But, if like Ryan suggested, we had a heap object we could construct >>>>>>>>> that had n words with unsafe access and m pointers to other heap >>>>>>>>> objects, one that could put itself on the mutable list when any of >>>>>>>>> those pointers changed then I could shed this last factor of two in >>>>>>>>> all circumstances. >>>>>>>>> >>>>>>>>> Prototype >>>>>>>>> ------------- >>>>>>>>> >>>>>>>>> Over the last few days I've put together a small prototype >>>>>>>>> implementation with a few non-trivial imperative data structures for >>>>>>>>> things like Tarjan's link-cut trees, the list labeling problem and >>>>>>>>> order-maintenance. >>>>>>>>> >>>>>>>>> https://github.com/ekmett/structs >>>>>>>>> >>>>>>>>> Notable bits: >>>>>>>>> >>>>>>>>> Data.Struct.Internal.LinkCut provides an implementation of link-cut >>>>>>>>> trees in this style. >>>>>>>>> >>>>>>>>> Data.Struct.Internal provides the rather horrifying guts that make it >>>>>>>>> go fast. >>>>>>>>> >>>>>>>>> Once compiled with -O or -O2, if you look at the core, almost all the >>>>>>>>> references to the LinkCut or Object data constructor get optimized >>>>>>>>> away, and we're left with beautiful strict code directly mutating out >>>>>>>>> underlying representation. >>>>>>>>> >>>>>>>>> At the very least I'll take this email and turn it into a short >>>>>>>>> article. >>>>>>>>> >>>>>>>>> -Edward >>>>>>>>> >>>>>>>>> On Thu, Aug 27, 2015 at 9:00 AM, Simon Peyton Jones >>>>>>>>> <[email protected]> wrote: >>>>>>>>> Just to say that I have no idea what is going on in this thread. >>>>>>>>> What is ArrayArray? What is the issue in general? Is there a >>>>>>>>> ticket? Is there a wiki page? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> If it’s important, an ab-initio wiki page + ticket would be a good >>>>>>>>> thing. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Simon >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> From: ghc-devs [mailto:[email protected]] On Behalf Of >>>>>>>>> Edward Kmett >>>>>>>>> Sent: 21 August 2015 05:25 >>>>>>>>> To: Manuel M T Chakravarty >>>>>>>>> Cc: Simon Marlow; ghc-devs >>>>>>>>> Subject: Re: ArrayArrays >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> When (ab)using them for this purpose, SmallArrayArray's would be very >>>>>>>>> handy as well. >>>>>>>>> >>>>>>>>> Consider right now if I have something like an order-maintenance >>>>>>>>> structure I have: >>>>>>>>> >>>>>>>>> data Upper s = Upper {-# UNPACK #-} !(MutableByteArray s) {-# UNPACK >>>>>>>>> #-} !(MutVar s (Upper s)) {-# UNPACK #-} !(MutVar s (Upper s)) >>>>>>>>> >>>>>>>>> data Lower s = Lower {-# UNPACK #-} !(MutVar s (Upper s)) {-# UNPACK >>>>>>>>> #-} !(MutableByteArray s) {-# UNPACK #-} !(MutVar s (Lower s)) {-# >>>>>>>>> UNPACK #-} !(MutVar s (Lower s)) >>>>>>>>> >>>>>>>>> The former contains, logically, a mutable integer and two pointers, >>>>>>>>> one for forward and one for backwards. The latter is basically the >>>>>>>>> same thing with a mutable reference up pointing at the structure >>>>>>>>> above. >>>>>>>>> >>>>>>>>> On the heap this is an object that points to a structure for the >>>>>>>>> bytearray, and points to another structure for each mutvar which each >>>>>>>>> point to the other 'Upper' structure. So there is a level of >>>>>>>>> indirection smeared over everything. >>>>>>>>> >>>>>>>>> So this is a pair of doubly linked lists with an upward link from the >>>>>>>>> structure below to the structure above. >>>>>>>>> >>>>>>>>> Converted into ArrayArray#s I'd get >>>>>>>>> >>>>>>>>> data Upper s = Upper (MutableArrayArray# s) >>>>>>>>> >>>>>>>>> w/ the first slot being a pointer to a MutableByteArray#, and the >>>>>>>>> next 2 slots pointing to the previous and next previous objects, >>>>>>>>> represented just as their MutableArrayArray#s. I can use >>>>>>>>> sameMutableArrayArray# on these for object identity, which lets me >>>>>>>>> check for the ends of the lists by tying things back on themselves. >>>>>>>>> >>>>>>>>> and below that >>>>>>>>> >>>>>>>>> data Lower s = Lower (MutableArrayArray# s) >>>>>>>>> >>>>>>>>> is similar, with an extra MutableArrayArray slot pointing up to an >>>>>>>>> upper structure. >>>>>>>>> >>>>>>>>> I can then write a handful of combinators for getting out the slots >>>>>>>>> in question, while it has gained a level of indirection between the >>>>>>>>> wrapper to put it in * and the MutableArrayArray# s in #, that one >>>>>>>>> can be basically erased by ghc. >>>>>>>>> >>>>>>>>> Unlike before I don't have several separate objects on the heap for >>>>>>>>> each thing. I only have 2 now. The MutableArrayArray# for the object >>>>>>>>> itself, and the MutableByteArray# that it references to carry around >>>>>>>>> the mutable int. >>>>>>>>> >>>>>>>>> The only pain points are >>>>>>>>> >>>>>>>>> 1.) the aforementioned limitation that currently prevents me from >>>>>>>>> stuffing normal boxed data through a SmallArray or Array into an >>>>>>>>> ArrayArray leaving me in a little ghetto disconnected from the rest >>>>>>>>> of Haskell, >>>>>>>>> >>>>>>>>> and >>>>>>>>> >>>>>>>>> 2.) the lack of SmallArrayArray's, which could let us avoid the card >>>>>>>>> marking overhead. These objects are all small, 3-4 pointers wide. >>>>>>>>> Card marking doesn't help. >>>>>>>>> >>>>>>>>> Alternately I could just try to do really evil things and convert the >>>>>>>>> whole mess to SmallArrays and then figure out how to unsafeCoerce my >>>>>>>>> way to glory, stuffing the #'d references to the other arrays >>>>>>>>> directly into the SmallArray as slots, removing the limitation we >>>>>>>>> see here by aping the MutableArrayArray# s API, but that gets really >>>>>>>>> really dangerous! >>>>>>>>> >>>>>>>>> I'm pretty much willing to sacrifice almost anything on the altar of >>>>>>>>> speed here, but I'd like to be able to let the GC move them and >>>>>>>>> collect them which rules out simpler Ptr and Addr based solutions. >>>>>>>>> >>>>>>>>> -Edward >>>>>>>>> >>>>>>>>> On Thu, Aug 20, 2015 at 9:01 PM, Manuel M T Chakravarty >>>>>>>>> <[email protected]> wrote: >>>>>>>>> That’s an interesting idea. >>>>>>>>> >>>>>>>>> Manuel >>>>>>>>> >>>>>>>>> > Edward Kmett <[email protected]>: >>>>>>>>> > >>>>>>>>> > Would it be possible to add unsafe primops to add Array# and >>>>>>>>> > SmallArray# entries to an ArrayArray#? The fact that the >>>>>>>>> > ArrayArray# entries are all directly unlifted avoiding a level of >>>>>>>>> > indirection for the containing structure is amazing, but I can only >>>>>>>>> > currently use it if my leaf level data can be 100% unboxed and >>>>>>>>> > distributed among ByteArray#s. It'd be nice to be able to have the >>>>>>>>> > ability to put SmallArray# a stuff down at the leaves to hold >>>>>>>>> > lifted contents. >>>>>>>>> > >>>>>>>>> > I accept fully that if I name the wrong type when I go to access >>>>>>>>> > one of the fields it'll lie to me, but I suppose it'd do that if i >>>>>>>>> > tried to use one of the members that held a nested ArrayArray# as a >>>>>>>>> > ByteArray# anyways, so it isn't like there is a safety story >>>>>>>>> > preventing this. >>>>>>>>> > >>>>>>>>> > I've been hunting for ways to try to kill the indirection problems >>>>>>>>> > I get with Haskell and mutable structures, and I could shoehorn a >>>>>>>>> > number of them into ArrayArrays if this worked. >>>>>>>>> > >>>>>>>>> > Right now I'm stuck paying for 2 or 3 levels of unnecessary >>>>>>>>> > indirection compared to c/java and this could reduce that pain to >>>>>>>>> > just 1 level of unnecessary indirection. >>>>>>>>> > >>>>>>>>> > -Edward >>>>>>>>> > _______________________________________________ >>>>>>>>> > ghc-devs mailing list >>>>>>>>> > [email protected] >>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> ghc-devs mailing list >>>>>>>>> [email protected] >>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >
_______________________________________________ ghc-devs mailing list [email protected] http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
