Hi,
The script reads from stdin and writes to stdout, so it needs an input
redirect ("<") from a file and an output redirect (">") to a file. The
following will do the job:
python cleanHTML.py < svwiktionary.text > svwiktionary.filter.text
By the way, I have only tested it on Wikipedia, so I'm not sure whether it
works for Wiktionary too.
It would be great if you could let me know whether it does, and I'll update
the wiki page accordingly.
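
For what it's worth, the reason the redirects are needed can be seen in a
minimal sketch of a stdin-to-stdout tag filter. This is not the actual
cleanHTML.py; the file name strip_tags.py, the regular expression and the
function names are assumptions made purely for illustration, and the real
script may well do more than this.

```python
import re
import sys

# Assumption: a "tag" is anything between "<" and ">" on a single line.
# The real cleanHTML.py may use a different rule.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line):
    """Return the line with every <...> span removed."""
    return TAG_RE.sub("", line)


def main(src=sys.stdin, dst=sys.stdout):
    # Read line by line so that a large dump is never held in memory at once.
    for line in src:
        dst.write(strip_tags(line))


if __name__ == "__main__":
    if sys.stdin.isatty():
        # No input redirect was given, so there is nothing to filter.
        sys.stderr.write("usage: python strip_tags.py < input > output\n")
    else:
        main()
```

A filter written like this never opens a file by name, so a file name given
as an argument is simply ignored. It would be run the same way as above,
for example: python strip_tags.py < svwiktionary.text > svwiktionary.filter.text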
Thanks!
2013/9/14 Per Tunedal <per.tune...@operamail.com>
> Hi again,
> the extractor is already finished.
>
> I overlooked a line in your instructions (maybe I'm too tired):
>
> cat output/*/* > svwiktionary.text
>
> Now I'm running the cleaning script:
>
> python cleanHTML.py svwiktionary.text
>
> I will give you a report when it has finished.
>
> Yours,
> Per Tunedal
>
>
>
> On Fri, Sep 13, 2013, at 18:11, Per Tunedal wrote:
>
> Hi,
> Thank you! Your Wikipedia Extractor is running right now. I will look for
> the result in an hour.
>
> How do I use the script for filtering out "<>" tags? I've saved it as a
> Python file. Do I have to run it separately for every single file in the
> output directory, or can I process every file in the directory in one go,
> for example by just pointing it at the directory?
>
> Yours,
> Per Tunedal
>
>
>
> On Fri, Sep 13, 2013, at 2:54, Gang Chen wrote:
>
> Hi,
>
> 1) Is it possible to make some kind of Wikipedia dump?
>
> This tool works fine for extracting the main text from Wikipedia:
> http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor
>
>
> Best wishes,
> Gang
>
>
> 2013/9/13 Per Tunedal <per.tune...@operamail.com>
>
> Hi,
> I'm planning to try to train the tagger for the pair sv-da, starting
> with Swedish.
>
> What's an appropriate corpus? Europarl is available but doesn't provide
> much of the everyday language.
>
> 1) Is it possible to make some kind of Wikipedia dump?
> 2) Lars, maybe you could suggest some free books from the Runeberg
> project that are written in suitable language. I've noticed that some old
> books have old word forms or very odd spelling (e.g. August Strindberg has
> a very peculiar spelling).
>
> Yours,
> Per Tunedal
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. Consolidate legacy IT systems to a single system of record for IT
> 2. Standardize and globalize service processes across IT
> 3. Implement zero-touch automation to replace manual, redundant tasks
> http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff