[Haskell-cafe] Re: Slower with ByteStrings?

2007-05-26 Thread apfelmus
Jason Dagit wrote:
> Given a word, find all the words in the dictionary which can be made
> from the letters of that word.  A letter can be used at most as many
> times as it appears in the input word.  So, "letter" can only match
> words with 0, 1, or 2 t's in them.

I don't know about the ByteString thing but how about a general speedup?

   frequencies = map (\x -> (head x, length x)) . group . sort
   superset xs = \ys -> let y = frequencies ys in
length y == lx &&
and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
  where
  x  = frequencies xs
  lx = length x

   main = interact $ unlines . filter ("ubuntu" `superset`) . lines

Regards,
apfelmus

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Slower with ByteStrings?

2007-05-29 Thread apfelmus
Mirko Rahn wrote:
>>> from the letters of that word.  A letter can be used at most as many
>>> times as it appears in the input word.  So, "letter" can only match
>>> words with 0, 1, or 2 t's in them.
> 
>>frequencies = map (\x -> (head x, length x)) . group . sort
>>superset xs = \ys -> let y = frequencies ys in
>> length y == lx &&
>> and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
>>   where
>>   x  = frequencies xs
>>   lx = length x
> 
> As far as I understand the spec, this algorithm is not correct:
> 
> superset "ubuntu" "tun" == False
> 
> Is at least one 'b' necessary, yes or no?

Oops, you are indeed right, the answer should be "no". I thought I'd
came away without primitive recursion, but here's a correct version

  superset xs = superset' x . sort ys
where
x = sort xs

_  `superset`  [] = True
[] `superset`  _  = False
(x:xs) `superset'` (y:ys)
| x == y= xs `superset` ys
| x <  y= xs `superset` (y:ys)
| otherwise = False

> If the answer is no, the
> following algorithm solves the problem and is faster then the one above:
> 
> del y = del_acc []
> where del_acc _ []  = mzero
>   del_acc v (x:xs) | x == y = return (v++xs)
>   del_acc v (x:xs)  = del_acc (x:v) xs
> 
> super u = not . null . foldM (flip del) u
> 
> main = interact $ unlines . filter ("ubuntu" `super`) . lines

The algorithm is correct but it's not faster, xs `super` ys  takes
O(n*m) time whereas superset takes O(n * log n + m * log m) time given a
proper sorting algorithm. Here, n = length xs and m = length ys.

Actually, both algorithms are essentially the same except for the
sorting that allows to drop some equality tests.

(Note that memoizing x = sort xs over different ys speeds things up a
bit for the intended application. This way, (sort "ubuntu") is only
computed once and the running time over many ys approaches O(n + m*log m).)

Regards,
apfelmus

PS: Some exercises for the interested reader:
1) Still, the algorithm super has an advantage over superset. Which one?
2) Put xs into a good data structure and achieve a O(m * log n) time for
multiple ys.
3) Is this running time always better than the aforementioned O(n +
m*log m)? What about very large m > n?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Slower with ByteStrings?

2007-05-29 Thread Mirko Rahn



from the letters of that word.  A letter can be used at most as many
times as it appears in the input word.  So, "letter" can only match
words with 0, 1, or 2 t's in them.



   frequencies = map (\x -> (head x, length x)) . group . sort
   superset xs = \ys -> let y = frequencies ys in
length y == lx &&
and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
  where
  x  = frequencies xs
  lx = length x


As far as I understand the spec, this algorithm is not correct:

superset "ubuntu" "tun" == False

Is at least one 'b' necessary, yes or no? If the answer is no, the 
following algorithm solves the problem and is faster then the one above:


del y = del_acc []
where del_acc _ []  = mzero
  del_acc v (x:xs) | x == y = return (v++xs)
  del_acc v (x:xs)  = del_acc (x:v) xs

super u = not . null . foldM (flip del) u

main = interact $ unlines . filter ("ubuntu" `super`) . lines

BR,

--
-- Mirko Rahn -- Tel +49-721 608 7504 --
--- http://liinwww.ira.uka.de/~rahn/ ---
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Slower with ByteStrings?

2007-05-29 Thread Mirko Rahn


[fixed some typos, mainly missing primes]


  superset xs = superset' x . sort
where
x = sort xs

_  `superset'`  [] = True
[] `superset'`  _  = False
(x:xs) `superset'` (y:ys)
| x == y= xs `superset'` ys
| x <  y= xs `superset'` (y:ys)
| otherwise = False



del y = del_acc []
   where del_acc _ []  = mzero
 del_acc v (x:xs) | x == y = return (v++xs)
 del_acc v (x:xs)  = del_acc (x:v) xs



The algorithm is correct but it's not faster, xs `super` ys  takes
O(n*m) time whereas superset takes O(n * log n + m * log m) time given a
proper sorting algorithm. Here, n = length xs and m = length ys.


Of course, you are right. In worst case super is much slower than 
superset. In average case (for some assumptions about the inputs) it 
could perform quite well because of the chance to detect non-subset 
words early.



2) Put xs into a good data structure and achieve a O(m * log n) time for
multiple ys.


You mean something along

supermap xs =
let mx  = Map.fromListWith (+) [ (x,1) | x <- xs ]
ins _ 1 = Nothing
ins _ v = Just (v-1)
upd m y = case Map.updateLookupWithKey ins y m of
   (Nothing,_ ) -> mzero
   (_  ,m') -> return m'
in not . null . foldM upd mx

Thanks for your time,

BR,

--
-- Mirko Rahn -- Tel +49-721 608 7504 --
--- http://liinwww.ira.uka.de/~rahn/ ---
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe