[Haskell-cafe] Re: Slower with ByteStrings?
Jason Dagit wrote: > Given a word, find all the words in the dictionary which can be made > from the letters of that word. A letter can be used at most as many > times as it appears in the input word. So, "letter" can only match > words with 0, 1, or 2 t's in them. I don't know about the ByteString thing but how about a general speedup? frequencies = map (\x -> (head x, length x)) . group . sort superset xs = \ys -> let y = frequencies ys in length y == lx && and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y) where x = frequencies xs lx = length x main = interact $ unlines . filter ("ubuntu" `superset`) . lines Regards, apfelmus ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Slower with ByteStrings?
Mirko Rahn wrote: >>> from the letters of that word. A letter can be used at most as many >>> times as it appears in the input word. So, "letter" can only match >>> words with 0, 1, or 2 t's in them. > >>frequencies = map (\x -> (head x, length x)) . group . sort >>superset xs = \ys -> let y = frequencies ys in >> length y == lx && >> and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y) >> where >> x = frequencies xs >> lx = length x > > As far as I understand the spec, this algorithm is not correct: > > superset "ubuntu" "tun" == False > > Is at least one 'b' necessary, yes or no? Oops, you are indeed right, the answer should be "no". I thought I'd came away without primitive recursion, but here's a correct version superset xs = superset' x . sort ys where x = sort xs _ `superset` [] = True [] `superset` _ = False (x:xs) `superset'` (y:ys) | x == y= xs `superset` ys | x < y= xs `superset` (y:ys) | otherwise = False > If the answer is no, the > following algorithm solves the problem and is faster then the one above: > > del y = del_acc [] > where del_acc _ [] = mzero > del_acc v (x:xs) | x == y = return (v++xs) > del_acc v (x:xs) = del_acc (x:v) xs > > super u = not . null . foldM (flip del) u > > main = interact $ unlines . filter ("ubuntu" `super`) . lines The algorithm is correct but it's not faster, xs `super` ys takes O(n*m) time whereas superset takes O(n * log n + m * log m) time given a proper sorting algorithm. Here, n = length xs and m = length ys. Actually, both algorithms are essentially the same except for the sorting that allows to drop some equality tests. (Note that memoizing x = sort xs over different ys speeds things up a bit for the intended application. This way, (sort "ubuntu") is only computed once and the running time over many ys approaches O(n + m*log m).) Regards, apfelmus PS: Some exercises for the interested reader: 1) Still, the algorithm super has an advantage over superset. Which one? 2) Put xs into a good data structure and achieve a O(m * log n) time for multiple ys. 3) Is this running time always better than the aforementioned O(n + m*log m)? What about very large m > n? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Slower with ByteStrings?
from the letters of that word. A letter can be used at most as many times as it appears in the input word. So, "letter" can only match words with 0, 1, or 2 t's in them. frequencies = map (\x -> (head x, length x)) . group . sort superset xs = \ys -> let y = frequencies ys in length y == lx && and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y) where x = frequencies xs lx = length x As far as I understand the spec, this algorithm is not correct: superset "ubuntu" "tun" == False Is at least one 'b' necessary, yes or no? If the answer is no, the following algorithm solves the problem and is faster then the one above: del y = del_acc [] where del_acc _ [] = mzero del_acc v (x:xs) | x == y = return (v++xs) del_acc v (x:xs) = del_acc (x:v) xs super u = not . null . foldM (flip del) u main = interact $ unlines . filter ("ubuntu" `super`) . lines BR, -- -- Mirko Rahn -- Tel +49-721 608 7504 -- --- http://liinwww.ira.uka.de/~rahn/ --- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Slower with ByteStrings?
[fixed some typos, mainly missing primes] superset xs = superset' x . sort where x = sort xs _ `superset'` [] = True [] `superset'` _ = False (x:xs) `superset'` (y:ys) | x == y= xs `superset'` ys | x < y= xs `superset'` (y:ys) | otherwise = False del y = del_acc [] where del_acc _ [] = mzero del_acc v (x:xs) | x == y = return (v++xs) del_acc v (x:xs) = del_acc (x:v) xs The algorithm is correct but it's not faster, xs `super` ys takes O(n*m) time whereas superset takes O(n * log n + m * log m) time given a proper sorting algorithm. Here, n = length xs and m = length ys. Of course, you are right. In worst case super is much slower than superset. In average case (for some assumptions about the inputs) it could perform quite well because of the chance to detect non-subset words early. 2) Put xs into a good data structure and achieve a O(m * log n) time for multiple ys. You mean something along supermap xs = let mx = Map.fromListWith (+) [ (x,1) | x <- xs ] ins _ 1 = Nothing ins _ v = Just (v-1) upd m y = case Map.updateLookupWithKey ins y m of (Nothing,_ ) -> mzero (_ ,m') -> return m' in not . null . foldM upd mx Thanks for your time, BR, -- -- Mirko Rahn -- Tel +49-721 608 7504 -- --- http://liinwww.ira.uka.de/~rahn/ --- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe