Upon consideration from a package management perspective this is probably easiest done by building a new small package to provide the functionality you want. That way we don't haphazardly change the transitive dependencies of a big chunk of the ecosystem and it can rest atop the various containers libraries. This also gives you a lot of opportunity to iterate on the API in public without incurring the instant rigidity of the Haskell Platform.
On Sun, Sep 29, 2013 at 11:06 PM, Ryan Newton <rrnew...@gmail.com> wrote: > Thanks Edward. Good point about Brent's 'split' package. That would be a > really nice place to put the class. But it doesn't currently depend on > containers or vector so I suppose the other instances would need to go > somewhere else. (Assuming containers only exported monomorphic versions.) > > Maybe a next step would be proposing some monomorphic variants for the > containers package. > > I think the complicated bit will be describing how "best-efforty" > splitting variants are: > > - Is it guaranteed O(1) time and allocation? > - Is the provided Int an upper bound? Lower(ish) bound? Or just a > hint? > > With some data structures, there will be a trade-off between partition > imbalance and the work required to achieve balance. But with some data > structures it is happily not a problem (e.g. Vector)! > > But whether there's one variant or a few, I'd be happy either way, as long > as I get at least the cheap one (i.e. prefer imbalance to restructuring). > > -Ryan > > > > > On Sun, Sep 29, 2013 at 8:20 AM, Edward Kmett <ekm...@gmail.com> wrote: > >> I don't know that it belongs in the "standard" libraries, but there could >> definitely be a package for something similar. >> >> ConstraintKinds are a pretty hefty extension to throw at it, and the >> signature written there prevents it from being used on ByteString, Text, >> etc. >> >> This can be implemented with much lighter weight types though! >> >> >> class Partitionable t where >> >> >> partition :: Int -> t -> [t] >> >> >> >> Now ByteString, Text etc. can be instances and no real flexibility is >> lost, as with the class associated constraint on the argument, you'd >> already given up polymorphic recursion. >> >> There still remain issues. partition is already established as the >> filterthat returns both the matching and unmatching elements, so the name is >> wrong. >> >> This is a generalization of Data.List.splitEvery, perhaps it is worth >> seeing how many others can be generalized similarly and talk to Brent about >> adding, say, a Data.Split module to his split package in the platform? >> >> -Edward >> >> >> >> >> >> On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnew...@gmail.com> wrote: >> >>> <subject change> >>> >>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <m...@izbicki.me> wrote: >>> >>>> I've got a Partitionable class that I've been using for this purpose: >>>> >>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs >>>> >>> >>> Mike -- Neat, that's a cool library! >>> >>> Edward -- ideally, where in the standard libraries should the >>> Partitionable comonoid go? >>> >>> Btw, I'm not sure what the ideal return type for comappend is, given >>> that it needs to be able to "bottom out". Mike, our partition function's >>> list return type seems more reasonable. Or maybe something simple would be >>> this: >>> >>> *class Partitionable t where* >>> * partition :: t -> Maybe (t,t)* >>> >>> That is, at some point its not worth splitting and returns Nothing, and >>> you'd better be able to deal with the 't' directly. >>> >>> So what I really want is for the *containers package to please get some >>> kind of Partitionable instances! * Johan & others, I would be happy to >>> provide a patch if the class can be agreed on. This is important because >>> currently the balanced tree structure of Data.Set/Map is an *amazing >>> and beneficial property* that is *not* exposed at all through the API. >>> For example, it would be great to have a parallel traverse_ for Maps >>> and Sets in the Par monad. The particular impetus is that our new and >>> enhanced Par monad makes extensive use of Maps and Sets, both the pure, >>> balanced ones, and lockfree/inplace ones based on concurrent skip lists: >>> >>> http://www.cs.indiana.edu/~rrnewton/haddock/lvish/ >>> >>> Alternatively, it would be ok if there were a "Data.Map.Internal" module >>> that exposed the Bin/Tip, but I assume people would rather have a clean >>> Partitionable instance... >>> >>> Best, >>> -Ryan >>> >>> >>> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <m...@izbicki.me> wrote: >>> >>>> I've got a Partitionable class that I've been using for this purpose: >>>> >>>> >>>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs >>>> >>>> The function called "parallel" in the HLearn library will automatically >>>> parallelize any homomorphism from a Partionable to a Monoid. I >>>> specifically use that to parallelize machine learning algorithms. >>>> >>>> I have two thoughts for better abstractions: >>>> >>>> 1) This Partitionable class is essentially a comonoid. By reversing >>>> the arrows of mappend, we get: >>>> >>>> comappend :: a -> (a,a) >>>> >>>> By itself, this works well if the number of processors you have is a >>>> power of two, but it needs some more fanciness to get things balanced >>>> properly for other numbers of processors. I bet there's another algebraic >>>> structure that would capture these other cases, but I'm not sure what it >>>> is. >>>> >>>> 2) I'm working with parallelizing tree structures right now (kd-trees, >>>> cover trees, oct-trees, etc.). The real problem is not splitting the >>>> number of data points equally (this is easy), but splitting the amount of >>>> work equally. Some points take longer to process than others, and this >>>> cannot be determined in advance. Therefore, an equal split of the data >>>> points can result in one processor getting 25% of the work load, and the >>>> second processor getting 75%. Some sort of lazy Partitionable class that >>>> was aware of processor loads and didn't split data points until they were >>>> needed would be ideal for this scenario. >>>> >>>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.a...@gmail.com> wrote: >>>> >>>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnew...@gmail.com> >>>>> wrote: >>>>> > Hi all, >>>>> > >>>>> > We all know and love Data.Foldable and are familiar with left folds >>>>> and >>>>> > right folds. But what you want in a parallel program is a balanced >>>>> fold >>>>> > over a tree. Fortunately, many of our datatypes (Sets, Maps) >>>>> actually ARE >>>>> > balanced trees. Hmm, but how do we expose that? >>>>> >>>>> Hi Ryan, >>>>> >>>>> At least for Data.Map, the Foldable instance seems to have a >>>>> reasonably balanced fold called fold (or foldMap): >>>>> >>>>> > fold t = go t >>>>> > where go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r) >>>>> >>>>> This doesn't seem to be guaranteed though. For example ghc's derived >>>>> instance writes the foldr only, so fold would be right-associated for >>>>> a: >>>>> >>>>> > data T a = B (T a) (T a) | L a deriving (Foldable) >>>>> >>>>> Regards, >>>>> Adam >>>>> _______________________________________________ >>>>> Haskell-Cafe mailing list >>>>> Haskell-Cafe@haskell.org >>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe >>>>> >>>> >>>> >>> >> >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe