If you're looking for tools to build corpora from miscellaneous websites,
I'd recommend taking a look at pattern.web[1]. It makes it easy to write
narrowly targeted spiders and to convert the pages they fetch to plaintext.
I've used it for exactly this task several times, so feel free to ask if
you have any questions about it.
[1]: http://www.clips.ua.ac.be/pages/pattern-web
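
To give a rough idea of what the "convert to plaintext" step involves,
here is a minimal sketch that uses only Python's standard library. It is
not pattern.web's API (see [1] for the real calls); the names
TextExtractor and html_to_text are made up for this illustration, and it
assumes all you want is the visible text with script and style contents
dropped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text from an HTML page."""

    # Tags whose contents are never visible text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

For a real corpus you would fetch each page first (for instance with
urllib.request.urlopen) and pass the decoded HTML to html_to_text; a
dedicated library will handle malformed markup and encodings far better
than this sketch does.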
On Sun, Dec 9, 2012 at 2:25 PM, Francis Tyers <fty...@prompsit.com> wrote:
> On Sun, 09 Dec 2012 at 21:04 +0100, Per Tunedal wrote:
> > Hi,
> >
> > What's scraper.py ?
> >
> > Some tool to build a corpus from the internet? Or from other sources? Or
> > what? Might be useful for me.
> >
> > How to use it?
>
> Jonathan knows, he's on IRC.
>
> F.
>
>
>
> ------------------------------------------------------------------------------
> LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
> Remotely access PCs and mobile devices and provide instant support
> Improve your efficiency, and focus on delivering more value-add services
> Discover what IT Professionals Know. Rescue delivers
> http://p.sf.net/sfu/logmein_12329d2d
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>