Just out of curiosity, did you try "wc -l"?
Robby
On Jun 29, 2006, at 1:18 PM, Chad Scherrer wrote:
I have a bunch of data files where each line represents a data
point. It's nice to be able to quickly tell how many data points I
have. I had been using wc, like this:
% cat *.txt | /usr/bin/time wc
2350570 4701140 49149973
5.81user 0.03system 0:06.08elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (152major+18minor)pagefaults 0swaps
I only really care about the line count and the time it takes. For
larger data sets, I was getting tired of waiting for wc, and I
wondered whether ByteString.Lazy could help me do better. So I
wrote a 2-liner:
import qualified Data.ByteString.Lazy.Char8 as L
main = L.getContents >>= print . L.count '\n'
... and compiled this as "lc". It doesn't get much simpler than
that. How does it perform?
% cat *.txt | /usr/bin/time lc
2350570
0.09user 0.13system 0:00.24elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (199major+211minor)pagefaults 0swaps
Wow. About 64 times as fast in user CPU time for this run (roughly 25
times in elapsed time), with almost no effort on my part. Granted, wc
is doing more work, but the word and character counts aren't
interesting to me in this case, anyway. I can't imagine
(implementation time)*(execution time) being much shorter.
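For a like-for-like comparison with plain wc, the extra work mentioned
above (words and characters) can also be done in a single pass over the
lazy ByteString. What follows is only a sketch and not part of the
original lc program: the names Counts, step and countAll are made up
for illustration, bytes stand in for characters, and words are split
with Data.Char.isSpace, which may not match wc's locale-dependent rules.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Char (isSpace)

-- Hypothetical accumulator: lines, words, bytes, and whether the
-- previous byte was part of a word. Strict fields keep the fold from
-- building up thunks.
data Counts = Counts !Int !Int !Int !Bool

-- Update the counts for one input byte (viewed as a Char).
step :: Counts -> Char -> Counts
step (Counts l w b inWord) c = Counts l' w' (b + 1) (not sp)
  where
    sp = isSpace c
    l' = if c == '\n' then l + 1 else l
    -- A word starts at a non-space byte that follows a space (or the start).
    w' = if not sp && not inWord then w + 1 else w

-- One strict left fold over the lazy input; chunks are consumed as they arrive.
countAll :: L.ByteString -> (Int, Int, Int)
countAll s =
  case L.foldl' step (Counts 0 0 0 False) s of
    Counts l w b _ -> (l, w, b)

main :: IO ()
main = do
  (l, w, b) <- fmap countAll L.getContents
  putStrLn (unwords (map show [l, w, b]))
```

Compiled the same way as lc, it would read from stdin just like the
commands above. No timings are claimed for it; it inspects every byte
itself, so it should not be expected to match the L.count version.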
Thanks, Don!
--
Chad Scherrer
"Time flies like an arrow; fruit flies like a banana" -- Groucho Marx
_______________________________________________
Haskell mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell