Re: [l10n-dev] [Fwd: [Translate-devel] Introducing CorpusCatcher 0.1]
On Thu, 2008-07-17 at 16:48 +0200, F Wolff wrote:
> Hello everybody
>
> I think this announcement can be quite interesting for some people on
> the list, so I'm forwarding this here.
>
> Translate.org.za developed CorpusCatcher to help in building web corpora
> specifically for applications in spell checker building. The idea is
> that this is something that can easily be extended for specific
> applications.
>

My apologies. I meant to write to [EMAIL PROTECTED] (*subscribed to too many lists*).
Still, I hope it was interesting to some of you. Sorry for the mistake.

Friedel

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[l10n-dev] [Fwd: [Translate-devel] Introducing CorpusCatcher 0.1]
Hello everybody

I think this announcement can be quite interesting for some people on
the list, so I'm forwarding this here.

Translate.org.za developed CorpusCatcher to help in building web corpora
specifically for applications in spell checker building. The idea is
that this is something that can easily be extended for specific
applications.

For any comments or to contribute improvements, please join the
translate-devel mailing list here:
https://lists.sourceforge.net/lists/listinfo/translate-devel

Keep well
Friedel

-------- Forwarded Message --------
From: Walter Leibbrandt
To: [EMAIL PROTECTED]
Subject: [Translate-devel] Introducing CorpusCatcher 0.1
Date: Thu, 17 Jul 2008 16:24:49 +0200

The first version of CorpusCatcher was released recently. CorpusCatcher
is a toolset for creating language corpora by crawling the Web. It was
originally based on BootCaT
(http://sslmit.unibo.it/~baroni/tools_and_resources.html), but evolved
into a stand-alone project. Thanks to Kevin Scannell for his advice in
this regard.

Its main features are:
- Querying Yahoo! for pages containing specific seed words.
- Crawling the web for relevant pages.
- Extracting the text from found pages.
- Filtering results based on positive and/or negative word lists.

The release is available for download at
https://sourceforge.net/project/showfiles.php?group_id=91920&package_id=284333

The live documentation is available on the wiki at
http://translate.sourceforge.net/wiki/corpuscatcher/index

Dependencies for using CorpusCatcher:
- Python >= 2.4
- mechanize 0.1.7b
- pYsearch 3.0

See http://translate.sourceforge.net/wiki/corpuscatcher/readme#installation
for installation details.

Please report any bugs found at http://bugs.locamotion.org
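To illustrate the last feature in the list, here is a minimal sketch of filtering extracted page text against a positive and a negative word list. It is a hypothetical illustration in modern Python 3, not CorpusCatcher's own code: the names `keep_page` and `filter_pages`, the ratio-based rule and the threshold values are all assumptions made for this example.

```python
import re

# Hypothetical sketch, NOT CorpusCatcher's API: keep a page only if enough
# of its words come from the positive list and few come from the negative list.
WORD_RE = re.compile(r"\w+")  # str patterns match Unicode word characters


def tokens(text):
    """Return the lower-cased word tokens of a page's extracted text."""
    return [t.lower() for t in WORD_RE.findall(text)]


def keep_page(text, positive, negative,
              min_positive_ratio=0.3, max_negative_ratio=0.05):
    """Decide whether a page looks like it is in the target language.

    positive: words expected in the target language (e.g. function words).
    negative: words that signal some other language.
    Either list may be empty ("positive and/or negative").
    The thresholds are illustrative assumptions, not CorpusCatcher defaults.
    """
    words = tokens(text)
    if not words:
        return False
    pos_hits = sum(1 for w in words if w in positive)
    neg_hits = sum(1 for w in words if w in negative)
    if positive and pos_hits / len(words) < min_positive_ratio:
        return False
    if negative and neg_hits / len(words) > max_negative_ratio:
        return False
    return True


def filter_pages(pages, positive, negative, **thresholds):
    """Return only the pages that pass keep_page."""
    return [p for p in pages if keep_page(p, positive, negative, **thresholds)]


if __name__ == "__main__":
    # Toy word lists for an Afrikaans corpus (illustrative only).
    positive = {"die", "en", "van", "is", "nie", "het"}
    negative = {"the", "and", "of"}
    pages = [
        "Die kat is op die mat en het geslaap.",
        "The cat is on the mat and it slept.",
    ]
    for page in filter_pages(pages, positive, negative):
        print(page)
```

With these toy lists, the Afrikaans sentence passes (5 of its 9 words are on the positive list) and the English one is rejected. A ratio rather than a raw count keeps the decision independent of page length; a real tool would likely tune the lists and thresholds per language.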