Hi,
Thank you! Your Wikipedia Extractor is running right now. I will look
for the result in an hour.
How do I use the script for filtering out "<>" tags? I've saved it as a
Python file. Do I have to run it separately on every single file in the
output directory, or can I just point it at the directory and have it
process all the files in one go?
Yours,
Per Tunedal
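
A minimal sketch of handling a whole directory in one run, assuming the
extractor output is a directory tree of UTF-8 text files that still
contain "<...>" tags. The names strip_tags and clean_directory and the
regular expression are hypothetical illustrations, not the filter script
discussed in this thread.

```python
import re
import sys
from pathlib import Path

# Assumption: a "tag" is anything between "<" and the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Return text with every <...> span removed."""
    return TAG_RE.sub("", text)


def clean_directory(in_dir, out_dir):
    """Strip tags from every file under in_dir, mirroring the tree in out_dir.

    out_dir should lie outside in_dir so cleaned files are not re-read.
    Returns the number of files processed.
    """
    in_path = Path(in_dir)
    out_path = Path(out_dir)
    count = 0
    # sorted() collects the file list before anything is written.
    for src in sorted(in_path.rglob("*")):
        if not src.is_file():
            continue
        dst = out_path / src.relative_to(in_path)
        dst.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_text(encoding="utf-8", errors="replace")
        dst.write_text(strip_tags(text), encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file name): python strip_tags.py INPUT_DIR OUTPUT_DIR
    n = clean_directory(sys.argv[1], sys.argv[2])
    print(f"cleaned {n} file(s)")
```

With something like this, one command covers the whole output directory,
and the originals are left untouched because the cleaned copies go to a
separate directory.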
On Fri, Sep 13, 2013, at 2:54, Gang Chen wrote:
Hi,
1) Is it possible to make some kind of Wikipedia dump?
This tool works fine for extracting the main text from Wikipedia,
[1]http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor
Best wishes,
Gang
2013/9/13 Per Tunedal <[2]per.tune...@operamail.com>
Hi,
I'm planning to try to train the tagger for the pair sv-da, starting
with Swedish.
What's an appropriate corpus? Europarl is available but doesn't provide
much everyday language.
1) Is it possible to make some kind of Wikipedia dump?
2) Lars, maybe you could suggest some free books from the Runeberg
project with suitable language. I've noticed that some old books have
old word forms or very odd spelling (e.g. August Strindberg has a very
peculiar spelling).
Yours,
Per Tunedal
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. Consolidate legacy IT systems to a single system of record for IT
2. Standardize and globalize service processes across IT
3. Implement zero-touch automation to replace manual, redundant tasks
[3]http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
_______________________________________________
Apertium-stuff mailing list
[4]Apertium-stuff@lists.sourceforge.net
[5]https://lists.sourceforge.net/lists/listinfo/apertium-stuff
References
1. http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor
2. mailto:per.tune...@operamail.com
3. http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
4. mailto:Apertium-stuff@lists.sourceforge.net
5. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
6. http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
7. mailto:Apertium-stuff@lists.sourceforge.net
8. https://lists.sourceforge.net/lists/listinfo/apertium-stuff