Hi,
Thank you! Your Wikipedia Extractor is running right now. I will look
for the result in an hour.

How do I use the script for filtering out "<>" tags? I've saved it as a
Python file. Do I have to run it separately for every single file in the
output directory, or can I process all the files in the directory in one
go, just by pointing the script at the directory?
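
To illustrate the kind of directory-wide run I mean, here is a minimal
sketch. It is not the filtering script from the wiki page: the names
strip_tags and strip_directory are made up for this example, and it
assumes that a "tag" is any run of text between "<" and ">" and that the
files are UTF-8.

```python
import re
from pathlib import Path

# Assumption: a "tag" is any text enclosed in "<" and ">", e.g. <doc id="1"> or </doc>.
# Ordinary text that happens to contain both characters would be stripped as well.
TAG_RE = re.compile(r"<[^<>]*>")


def strip_tags(text):
    """Return text with every <...> span removed."""
    return TAG_RE.sub("", text)


def strip_directory(in_dir, out_dir):
    """Strip tags from every file under in_dir, mirroring the tree into out_dir."""
    in_path = Path(in_dir)
    out_path = Path(out_dir)
    written = []
    # sorted() builds the full file list before anything is written.
    for src in sorted(in_path.rglob("*")):
        if not src.is_file():
            continue
        dst = out_path / src.relative_to(in_path)
        dst.parent.mkdir(parents=True, exist_ok=True)
        cleaned = strip_tags(src.read_text(encoding="utf-8"))
        dst.write_text(cleaned, encoding="utf-8")
        written.append(dst)
    return written
```

Calling strip_directory("extracted", "cleaned") would then handle every
file in one go; both directory names are placeholders, and the output
directory should sit outside the input directory.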

Yours,
Per Tunedal



On Fri, Sep 13, 2013, at 2:54, Gang Chen wrote:

Hi,
1) Is it possible to make some kind of Wikipedia dump?

This tool works fine for extracting the main text from Wikipedia:
[1]http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor

Best wishes,
Gang
2013/9/13 Per Tunedal <[2]per.tune...@operamail.com>

  Hi,
  I'm planning to try to train the tagger for the pair sv-da, starting
  with Swedish.
  What's an appropriate corpus? Europarl is available but doesn't
  provide much of the everyday language.
  1) Is it possible to make some kind of Wikipedia dump?
  2) Lars, maybe you could suggest some free books from the Runeberg
  project that have a suitable language. I've noticed that some old
  books have old word forms or very odd spelling (e.g. August
  Strindberg has a very peculiar spelling).
  Yours,
  Per Tunedal
  ------------------------------------------------------------------------------
  How ServiceNow helps IT people transform IT departments:
  1. Consolidate legacy IT systems to a single system of record for IT
  2. Standardize and globalize service processes across IT
  3. Implement zero-touch automation to replace manual, redundant tasks
  [3]http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
  _______________________________________________
  Apertium-stuff mailing list
  [4]Apertium-stuff@lists.sourceforge.net
  [5]https://lists.sourceforge.net/lists/listinfo/apertium-stuff

References

1. http://wiki.apertium.org/wiki/User:Gang_Chen/Wikipedia_Extractor
2. mailto:per.tune...@operamail.com
3. http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
4. mailto:Apertium-stuff@lists.sourceforge.net
5. https://lists.sourceforge.net/lists/listinfo/apertium-stuff
6. http://pubads.g.doubleclick.net/gampad/clk?id=51271111&iu=/4140/ostg.clktrk
7. mailto:Apertium-stuff@lists.sourceforge.net
8. https://lists.sourceforge.net/lists/listinfo/apertium-stuff