On 29 Sep 2013, at 22:58, Paul Mooser <[email protected]> wrote:

> Paul, is there any easy way to get the (small) dataset you're working with,
> so we can run your actual code against the same data?

The dataset I'm using is a Wikipedia dump, which hardly counts as "small" :-)

Having said that, the first couple of million lines are all you need to reproduce the results I'm getting, which you can download with:

    curl http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 | bunzip2 | head -n 2000000 > enwiki-short.xml

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: [email protected]
AIM: paulrabutcher
Skype: paulrabutcher

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
