Thanks Edward. Good point about Brent's 'split' package. That would be a really nice place to put the class. But it doesn't currently depend on containers or vector so I suppose the other instances would need to go somewhere else. (Assuming containers only exported monomorphic versions.)
Maybe a next step would be proposing some monomorphic variants for the containers package. I think the complicated bit will be describing how "best-efforty" splitting variants are: - Is it guaranteed O(1) time and allocation? - Is the provided Int an upper bound? Lower(ish) bound? Or just a hint? With some data structures, there will be a trade-off between partition imbalance and the work required to achieve balance. But with some data structures it is happily not a problem (e.g. Vector)! But whether there's one variant or a few, I'd be happy either way, as long as I get at least the cheap one (i.e. prefer imbalance to restructuring). -Ryan On Sun, Sep 29, 2013 at 8:20 AM, Edward Kmett <ekm...@gmail.com> wrote: > I don't know that it belongs in the "standard" libraries, but there could > definitely be a package for something similar. > > ConstraintKinds are a pretty hefty extension to throw at it, and the > signature written there prevents it from being used on ByteString, Text, > etc. > > This can be implemented with much lighter weight types though! > > > class Partitionable t where > > partition :: Int -> t -> [t] > > > Now ByteString, Text etc. can be instances and no real flexibility is > lost, as with the class associated constraint on the argument, you'd > already given up polymorphic recursion. > > There still remain issues. partition is already established as the filterthat > returns both the matching and unmatching elements, so the name is > wrong. > > This is a generalization of Data.List.splitEvery, perhaps it is worth > seeing how many others can be generalized similarly and talk to Brent about > adding, say, a Data.Split module to his split package in the platform? > > -Edward > > > > > > On Sun, Sep 29, 2013 at 4:21 AM, Ryan Newton <rrnew...@gmail.com> wrote: > >> <subject change> >> >> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <m...@izbicki.me> wrote: >> >>> I've got a Partitionable class that I've been using for this purpose: >>> >>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs >>> >> >> Mike -- Neat, that's a cool library! >> >> Edward -- ideally, where in the standard libraries should the >> Partitionable comonoid go? >> >> Btw, I'm not sure what the ideal return type for comappend is, given that >> it needs to be able to "bottom out". Mike, our partition function's list >> return type seems more reasonable. Or maybe something simple would be this: >> >> *class Partitionable t where* >> * partition :: t -> Maybe (t,t)* >> >> That is, at some point its not worth splitting and returns Nothing, and >> you'd better be able to deal with the 't' directly. >> >> So what I really want is for the *containers package to please get some >> kind of Partitionable instances! * Johan & others, I would be happy to >> provide a patch if the class can be agreed on. This is important because >> currently the balanced tree structure of Data.Set/Map is an *amazing and >> beneficial property* that is *not* exposed at all through the API. >> For example, it would be great to have a parallel traverse_ for Maps >> and Sets in the Par monad. The particular impetus is that our new and >> enhanced Par monad makes extensive use of Maps and Sets, both the pure, >> balanced ones, and lockfree/inplace ones based on concurrent skip lists: >> >> http://www.cs.indiana.edu/~rrnewton/haddock/lvish/ >> >> Alternatively, it would be ok if there were a "Data.Map.Internal" module >> that exposed the Bin/Tip, but I assume people would rather have a clean >> Partitionable instance... >> >> Best, >> -Ryan >> >> >> On Sun, Sep 29, 2013 at 3:31 AM, Mike Izbicki <m...@izbicki.me> wrote: >> >>> I've got a Partitionable class that I've been using for this purpose: >>> >>> >>> https://github.com/mikeizbicki/ConstraintKinds/blob/master/src/Control/ConstraintKinds/Partitionable.hs >>> >>> The function called "parallel" in the HLearn library will automatically >>> parallelize any homomorphism from a Partionable to a Monoid. I >>> specifically use that to parallelize machine learning algorithms. >>> >>> I have two thoughts for better abstractions: >>> >>> 1) This Partitionable class is essentially a comonoid. By reversing >>> the arrows of mappend, we get: >>> >>> comappend :: a -> (a,a) >>> >>> By itself, this works well if the number of processors you have is a >>> power of two, but it needs some more fanciness to get things balanced >>> properly for other numbers of processors. I bet there's another algebraic >>> structure that would capture these other cases, but I'm not sure what it is. >>> >>> 2) I'm working with parallelizing tree structures right now (kd-trees, >>> cover trees, oct-trees, etc.). The real problem is not splitting the >>> number of data points equally (this is easy), but splitting the amount of >>> work equally. Some points take longer to process than others, and this >>> cannot be determined in advance. Therefore, an equal split of the data >>> points can result in one processor getting 25% of the work load, and the >>> second processor getting 75%. Some sort of lazy Partitionable class that >>> was aware of processor loads and didn't split data points until they were >>> needed would be ideal for this scenario. >>> >>> On Sat, Sep 28, 2013 at 6:46 PM, adam vogt <vogt.a...@gmail.com> wrote: >>> >>>> On Sat, Sep 28, 2013 at 1:09 PM, Ryan Newton <rrnew...@gmail.com> >>>> wrote: >>>> > Hi all, >>>> > >>>> > We all know and love Data.Foldable and are familiar with left folds >>>> and >>>> > right folds. But what you want in a parallel program is a balanced >>>> fold >>>> > over a tree. Fortunately, many of our datatypes (Sets, Maps) >>>> actually ARE >>>> > balanced trees. Hmm, but how do we expose that? >>>> >>>> Hi Ryan, >>>> >>>> At least for Data.Map, the Foldable instance seems to have a >>>> reasonably balanced fold called fold (or foldMap): >>>> >>>> > fold t = go t >>>> > where go (Bin _ _ v l r) = go l `mappend` (v `mappend` go r) >>>> >>>> This doesn't seem to be guaranteed though. For example ghc's derived >>>> instance writes the foldr only, so fold would be right-associated for >>>> a: >>>> >>>> > data T a = B (T a) (T a) | L a deriving (Foldable) >>>> >>>> Regards, >>>> Adam >>>> _______________________________________________ >>>> Haskell-Cafe mailing list >>>> Haskell-Cafe@haskell.org >>>> http://www.haskell.org/mailman/listinfo/haskell-cafe >>>> >>> >>> >> >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe