Hi,

The script reads from stdin and writes to stdout, so it needs an input
redirect ("<") from a file and an output redirect (">") to a file. The
following will do the job:

python cleanHTML.py < svwiktionary.text > svwiktionary.filter.text
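
In case it helps to see why the redirects are needed, here is a minimal
sketch of a stdin-to-stdout filter of this kind. It is not the actual
cleanHTML.py: the names strip_tags and TAG_RE, and the rule that a tag is
anything between "<" and the next ">" on the same line, are assumptions made
only for illustration.

```python
import re
import sys

# Assumption: a "tag" is any "<...>" span that opens and closes on one line.
TAG_RE = re.compile(r"<[^<>]*>")


def strip_tags(line):
    """Return the line with every <...> span removed."""
    return TAG_RE.sub("", line)


def main(src=None, dst=None):
    # Default to stdin/stdout, which is why the shell redirects are needed.
    src = sys.stdin if src is None else src
    dst = sys.stdout if dst is None else dst
    # Stream line by line so a large dump does not have to fit in memory.
    for line in src:
        dst.write(strip_tags(line))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as strip_tags.py, it would be called
the same way as above, with "<" supplying the input file and ">" naming the
output file.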

Btw, I have only tested it on Wikipedia, so I'm not sure whether it works
for Wiktionary too.
It would be great if you could let me know whether it does, and I'll update
the wiki page accordingly.
Thanks!


2013/9/14 Per Tunedal <per.tune...@operamail.com>

>   Hi again,
>  the extractor is already finished.
>
>  I overlooked a line in your instructions (maybe I'm too tired):
>
>  cat output/*/* > svwiktionary.text
>
>  Now I'm running the cleaning script:
>
>  python cleanHTML.py svwiktionary.text
>
>  I will give you a report when it has finished.
>
>  Yours,
>  Per Tunedal
>
>
>
>  On Fri, Sep 13, 2013, at 18:11, Per Tunedal wrote:
>
>   Hi,
>  Thank you! Your Wikipedia Extractor is running right now. I will look for
> the result in an hour.
>
>  How do I use the script for filtering out "<>" tags? I've saved it as a
> Python file. Do I have to run it separately for every single file in the
> output directory? Can't I just process every file in the directory in one
> go? Or just point it at the directory?
>
>  Yours,
>  Per Tunedal
>
>
>
>  On Fri, Sep 13, 2013, at 2:54, Gang Chen wrote:
>
>  Hi,
>
> 1) Is it possible to make some kind of Wikipedia dump?
>
> This tool works fine for extracting the main text from Wikipedia,
> http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor
>
>
>  Best wishes,
> Gang
>
>
>  2013/9/13 Per Tunedal <per.tune...@operamail.com>
>
> Hi,
> I'm planning to try to train the tagger for the pair sv-da, starting
> with Swedish.
>
> What's an appropriate corpus? Europarl is available but doesn't provide
> much of the everyday language.
>
> 1) Is it possible to make some kind of Wikipedia dump?
> 2) Lars, maybe you could suggest some free books from the Runeberg
> project that have a suitable language. I've noticed that some old books
> have old word forms or very odd spelling (e.g. August Strindberg has a
> very peculiar spelling).
>
> Yours,
> Per Tunedal
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. Consolidate legacy IT systems to a single system of record for IT
> 2. Standardize and globalize service processes across IT
> 3. Implement zero-touch automation to replace manual, redundant tasks
> http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff