For completeness, you might then do the actual clustering something like: ------------------------------------------------------------------------ import Data.Tree import Data.List import Data.Function
-- ... code from before ... cluster :: Ord cost => (a -> b) -> (a -> cost) -> Tree a -> Cluster (cost,[b]) cluster proj cost t = -- List can't be empty since Tree can't. let o:os = sortBy (compare `on` fst) . flatten . agglom proj cost $ t in foldl' cons (One o) os data Cluster a = One a | Many [Cluster a] deriving(Eq,Ord,Read,Show) instance Functor Cluster where fmap f (One a) = One (f a) fmap f (Many cs) = Many ((fmap . fmap) f cs) cons :: Cluster a -> a -> Cluster a cons c a = Many [c,One a] {- ghci> let c = cluster fst snd t ghci> :t c c :: Cluster (Cost, [Id]) ghci> c Many [Many [Many [One (0,[4]),One (1,[3,1])],One (3,[2])],One (12,[5])] ghci> :t fmap snd c fmap snd c :: Cluster [Id] ghci> fmap snd c Many [Many [Many [One [4],One [3,1]],One [2]],One [5]] ghci> :t fmap fst c fmap fst c :: Cluster Cost ghci> fmap fst c Many [Many [Many [One 0,One 1],One 3],One 12] -} ------------------------------------------------------------------------------- Matt On 12/23/09, Matt Morrow <moonpa...@gmail.com> wrote: > Hi Nikolas, > > Interesting problem. I'd do something like the following, where > the initial spanning tree from you example (re-tree-ified) is: > > {- > ghci> :t t > t :: Tree (Id, Cost) > g > ghci> ppT t > (4,0) > | > +- (3,1) > | | > | `- (1,1) > | > `- (2,3) > | > `- (5,12) > -} > > and which results in the tree: > > {- > ghci> let s = agglom fst snd t > ghci> :t s > s :: Tree (Cost, [Id]) > ghci> ppT s > (0,[4]) > | > +- (1,[3,1]) > | > `- (3,[2]) > | > `- (12,[5]) > -} > > which can then be flattened/etc as needed by further steps of the algo. > > The code for `agglom': > > ----------------------------------------------------------------------------- > import Data.Tree > import Data.List > > type Id = Int > type Cost = Int > > t :: Tree (Id,Cost) > t = Node (4,0) > [Node (3,1) [Node (1,1) []] > ,Node (2,3) [Node (5,12) []]] > > ppT :: Show a => Tree a -> IO () > ppT = putStrLn . drawTree . fmap show > > -- | Compress the incoming @Tree a@ with @accu...@. > agglom :: Eq cost > => (a -> b) > -> (a -> cost) > -> Tree a -> Tree (cost,[b]) > agglom proj cost = go > where accum = accumEq proj cost > go (Node a []) = Node (cost a,[proj a]) [] > go (Node a ts) = let b = proj a > c = cost a > (bs,ss) = accum c ts > in Node (c,b:bs) (fmap go ss) > > -- | Repeatedly @splitEq@, and return a pair > -- whose /first/ element is a list of the projected > -- @b...@s from those root values along paths from > -- the roots of the trees in the incoming @[Tree a]@ > -- which have @cost@ equal to the third function parameter, > -- and whose /second/ element is the (concatenation of the) > -- list(s) gotten from each of the @splitEq@ calls. > accumEq :: Eq cost > => (a -> b) > -> (a -> cost) -> cost > -> [Tree a] -> ([b],[Tree a]) > accumEq proj cost c = go [] [] > where split ts = splitEq proj cost c ts > go xs ys [] = (xs,ys) > go xs ys ts = let (eqs,neqs) = split ts > in case eqs of > []-> ([],ts) > _ -> let (bs,tss) = unzip eqs > in go (bs++xs) > (neqs++ys) > (concat tss) > > -- | Split the incoming trees into > -- (1) a @[(b,Tree a)]@ of each @b@ is the > -- @p...@ected value from an @a@ where > -- the @cost@ of that @a@ is equal to > -- the third function parameter, and (2) > -- the members of the incoming @[Tree a]@ > -- whose roots' costs are /not/ equal to > -- the third function parameter. > splitEq :: Eq cost > => (a -> b) > -> (a -> cost) -> cost > -> [Tree a] -> ([(b,[Tree a])],[Tree a]) > splitEq proj cost c = foldl' go ([],[]) > where go (!eqs,!neqs) > t@(Node a ts) > | c==cost a = ((proj a,ts):eqs,neqs) > | otherwise = (eqs,t:neqs) > ----------------------------------------------------------------------------- > > Cheers, > Matt > > On 12/23/09, Nikolas Borrel-Jensen <nikolasbor...@gmail.com> wrote: >> Hi! I have some trouble implementing single-linkage clustering algorithm >> by >> using a minimum-spanning tree, so I would appreciate if some of you could >> give me some advise. >> >> I am implementing a single-linkage clustering algorithm, and my approach >> is >> to use minimum spanning trees for that task. I am using the library FGL ( >> http://web.engr.oregonstate.edu/~erwig/fgl/haskell/), and I have managed >> to >> compute a minimum spanning tree from an arbitrary fully connected graph >> with >> 5 nodes. I get [ [(4,0) ] , [ (3,1) , (4,0) ] , [ (1,1) , (3,1) , (4,0) ] >> , >> [ (2,3) , (4,0) ] , [ (5,12) , (2,3) , (4,0) ] ], which is the root path >> tree of the minimum spanning tree created by the function msTreeAt. >> >> >From that I would create a dendrogram. [ (1,1) , (3,1) , (4,0) ] is >> telling >> that node 1,3 and 4 has the same cost, namely cost 1. Therefore these are >> merged at level 1. At level 1 we now have 3 clusters: (1,3,4), 2 and 5. >> Now >> the second lowest should be merged, that is 2 and 4. BUT because 4 is >> already merged in the cluster (1,3,4), we should merge (1,3,4) and 2 at >> level 3 (because the cost is 3). Now at level 3 we have 2 clusters, >> (1,2,3,4) and 5. Now we merge the last one at level 12: (1,2,3,4,5), and >> we >> are finished. >> >> I have very hard to see, how this could be done efficiently without >> pointers >> (as in C). I have thought of just saving the nodes from the start of the >> root path, and traversing it, but a lot of searching should be done all >> the >> time. >> >> Can you please give me some advise on that? >> >> Kind regards >> >> Nikolas Borrel-Jensen >> Computer Science >> University Of Copenhagen >> > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe