Yes, a good tool indeed, although maybe a bit slow. I've put a link at
http://wiki.apertium.org/wiki/Wikipedia_dumps
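
For the disk-space problem discussed further down the thread, one option is to never write the decompressed dump to disk at all. The following is only a minimal sketch, assuming the dump is a standard MediaWiki XML export compressed with bzip2; the function names `local` and `sample_pages`, the `fraction` and `seed` parameters, and the idea of sampling pages at random are illustrative assumptions, not something wikiextractor or the wiki page above prescribes.

```python
import bz2
import random
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without its XML namespace prefix."""
    return tag.rsplit("}", 1)[-1]


def sample_pages(stream, fraction=0.1, seed=0):
    """Yield (title, wikitext) for a random fraction of the pages in a dump.

    `stream` is any binary file-like object holding MediaWiki export XML,
    for example the object returned by bz2.open(path, "rb").
    """
    rng = random.Random(seed)
    # iterparse reads the XML incrementally, so the whole dump is never
    # held in memory or written out uncompressed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        if rng.random() < fraction:
            title, text = None, ""
            for child in elem.iter():
                name = local(child.tag)
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text or ""
            yield title, text
        # Release the content of the page we have finished with.
        elem.clear()
```

It could be used roughly as `with bz2.open(path, "rb") as dump: for title, text in sample_pages(dump): ...`, where `path` points to whichever .bz2 dump was downloaded. The text that comes out is still raw wiki markup, so a tool such as wikiextractor would still be needed to clean it.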

2018-05-12 22:40 GMT+03:00 mansur <[email protected]>:

> Forgot to say, that wikiextractor [1] is very convenient to work with
> those dumps.
>
> 1. https://github.com/attardi/wikiextractor
>
> 2018-05-12 21:29 GMT+03:00 Hèctor Alòs i Font <[email protected]>:
>
>> Ha, ha! I'm supposed to be the chair there :) Great! I didn't realise it
>> was you.
>>
>> 2018-05-12 20:14 GMT+03:00 mansur <[email protected]>:
>>
>>> Great :)
>>>
>>> By the way, Hèctor, I saw your name in the symposium's program [1]. Are
>>> you going there? If so, we are gonna be in the same section.
>>>
>>> 1. http://richfizh.chuvsu.ru/images/dokuments/Programma%20XI.pdf
>>>
>>> 2018-05-12 19:58 GMT+03:00 Hèctor Alòs i Font <[email protected]>:
>>>
>>>> Räxmät, Mansur. You are right. It seems I got the wrong dump, with too
>>>> much information that is irrelevant to me. The problem was not the 15 GB,
>>>> but the space needed to decompress them. I think it's OK now.
>>>>
>>>> 2018-05-12 18:40 GMT+03:00 mansur <[email protected]>:
>>>>
>>>>> Hello!
>>>>>
>>>>> If you mean this corpus [1], it is not so big: 5.4 GB. Or am I wrong?
>>>>> I can download it and give you some part of it, if you want.
>>>>>
>>>>> There are also smaller dumps [2], for example [3]
>>>>>
>>>>> 1. http://dumps.wikimedia.your.org/frwiki/latest/frwiki-latest-pages-meta-current.xml.bz2
>>>>> 2. http://dumps.wikimedia.your.org/frwiki/latest/
>>>>> 3. http://dumps.wikimedia.your.org/frwiki/latest/frwiki-latest-pages-meta-current1.xml-p3p412301.bz2
>>>>>
>>>>> With best wishes,
>>>>> Mansur
>>>>>
>>>>> 2018-05-12 16:22 GMT+03:00 Hèctor Alòs i Font <[email protected]>:
>>>>>
>>>>>> 2018-05-12 14:40 GMT+03:00 Kartik Mistry <[email protected]>:
>>>>>>
>>>>>>> On Sat, May 12, 2018 at 2:51 PM, Hèctor Alòs i Font
>>>>>>> <[email protected]> wrote:
>>>>>>> > I'd like to create a French Wikipedia corpus, but I wouldn't like to
>>>>>>> > download the whole Wikipedia dump. I'm not sure I have enough disk space for
>>>>>>> > decompressing it. Is there somewhere maybe a 10% dump?
>>>>>>>
>>>>>>> This can be useful too: https://dumps.wikimedia.org/other/contenttranslation/
>>>>>>
>>>>>>
>>>>>> Thanks, Kartik. It is too small and not random enough for what I'm
>>>>>> looking for, but it is indeed important information for improving the
>>>>>> translators. A GSoC Apertium project is working on it :)
>>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> ------------------
>>>>>> Check out the vibrant tech community on one of the world's most
>>>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>>>> _______________________________________________
>>>>>> Apertium-stuff mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>>>
>>>>>>
