Hi Nikolas, Interesting problem. I'd do something like the following, where the initial spanning tree from you example (re-tree-ified) is:
{- ghci> :t t t :: Tree (Id, Cost) g ghci> ppT t (4,0) | +- (3,1) | | | `- (1,1) | `- (2,3) | `- (5,12) -} and which results in the tree: {- ghci> let s = agglom fst snd t ghci> :t s s :: Tree (Cost, [Id]) ghci> ppT s (0,[4]) | +- (1,[3,1]) | `- (3,[2]) | `- (12,[5]) -} which can then be flattened/etc as needed by further steps of the algo. The code for `agglom': ----------------------------------------------------------------------------- import Data.Tree import Data.List type Id = Int type Cost = Int t :: Tree (Id,Cost) t = Node (4,0) [Node (3,1) [Node (1,1) []] ,Node (2,3) [Node (5,12) []]] ppT :: Show a => Tree a -> IO () ppT = putStrLn . drawTree . fmap show -- | Compress the incoming @Tree a@ with @accu...@. agglom :: Eq cost => (a -> b) -> (a -> cost) -> Tree a -> Tree (cost,[b]) agglom proj cost = go where accum = accumEq proj cost go (Node a []) = Node (cost a,[proj a]) [] go (Node a ts) = let b = proj a c = cost a (bs,ss) = accum c ts in Node (c,b:bs) (fmap go ss) -- | Repeatedly @splitEq@, and return a pair -- whose /first/ element is a list of the projected -- @b...@s from those root values along paths from -- the roots of the trees in the incoming @[Tree a]@ -- which have @cost@ equal to the third function parameter, -- and whose /second/ element is the (concatenation of the) -- list(s) gotten from each of the @splitEq@ calls. accumEq :: Eq cost => (a -> b) -> (a -> cost) -> cost -> [Tree a] -> ([b],[Tree a]) accumEq proj cost c = go [] [] where split ts = splitEq proj cost c ts go xs ys [] = (xs,ys) go xs ys ts = let (eqs,neqs) = split ts in case eqs of []-> ([],ts) _ -> let (bs,tss) = unzip eqs in go (bs++xs) (neqs++ys) (concat tss) -- | Split the incoming trees into -- (1) a @[(b,Tree a)]@ of each @b@ is the -- @p...@ected value from an @a@ where -- the @cost@ of that @a@ is equal to -- the third function parameter, and (2) -- the members of the incoming @[Tree a]@ -- whose roots' costs are /not/ equal to -- the third function parameter. splitEq :: Eq cost => (a -> b) -> (a -> cost) -> cost -> [Tree a] -> ([(b,[Tree a])],[Tree a]) splitEq proj cost c = foldl' go ([],[]) where go (!eqs,!neqs) t@(Node a ts) | c==cost a = ((proj a,ts):eqs,neqs) | otherwise = (eqs,t:neqs) ----------------------------------------------------------------------------- Cheers, Matt On 12/23/09, Nikolas Borrel-Jensen <nikolasbor...@gmail.com> wrote: > Hi! I have some trouble implementing single-linkage clustering algorithm by > using a minimum-spanning tree, so I would appreciate if some of you could > give me some advise. > > I am implementing a single-linkage clustering algorithm, and my approach is > to use minimum spanning trees for that task. I am using the library FGL ( > http://web.engr.oregonstate.edu/~erwig/fgl/haskell/), and I have managed to > compute a minimum spanning tree from an arbitrary fully connected graph with > 5 nodes. I get [ [(4,0) ] , [ (3,1) , (4,0) ] , [ (1,1) , (3,1) , (4,0) ] , > [ (2,3) , (4,0) ] , [ (5,12) , (2,3) , (4,0) ] ], which is the root path > tree of the minimum spanning tree created by the function msTreeAt. > > >From that I would create a dendrogram. [ (1,1) , (3,1) , (4,0) ] is > telling > that node 1,3 and 4 has the same cost, namely cost 1. Therefore these are > merged at level 1. At level 1 we now have 3 clusters: (1,3,4), 2 and 5. Now > the second lowest should be merged, that is 2 and 4. BUT because 4 is > already merged in the cluster (1,3,4), we should merge (1,3,4) and 2 at > level 3 (because the cost is 3). Now at level 3 we have 2 clusters, > (1,2,3,4) and 5. Now we merge the last one at level 12: (1,2,3,4,5), and we > are finished. > > I have very hard to see, how this could be done efficiently without pointers > (as in C). I have thought of just saving the nodes from the start of the > root path, and traversing it, but a lot of searching should be done all the > time. > > Can you please give me some advise on that? > > Kind regards > > Nikolas Borrel-Jensen > Computer Science > University Of Copenhagen > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe