I forgot to say that wikiextractor [1] is very convenient for working with
those dumps.
1. https://github.com/attardi/wikiextractor
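
For the disk-space problem discussed below, one option is to read the .bz2
dump as a stream instead of decompressing it to disk first. The following is
only a minimal sketch using the Python standard library. It is not
wikiextractor and does not clean wiki markup. The names `sample_pages` and
`local` are made up for illustration, and the default fraction of 0.1 is just
the "10%" asked for in the thread.

```python
import bz2
import io
import random
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def sample_pages(stream, fraction=0.1, seed=0):
    """Yield (title, text) for a random fraction of <page> elements.

    `stream` is any binary file-like object containing MediaWiki-style XML,
    for example the object returned by bz2.open(path, "rb").
    """
    rng = random.Random(seed)
    context = iter(ET.iterparse(stream, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "page":
            continue
        if rng.random() < fraction:
            title = text = ""
            for child in elem.iter():
                name = local(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
        # Drop pages already processed so memory use stays roughly constant.
        root.clear()


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real dump (assumption: same page layout).
    xml = (
        b"<mediawiki>"
        b"<page><title>A</title><revision><text>alpha</text></revision></page>"
        b"<page><title>B</title><revision><text>be</text></revision></page>"
        b"</mediawiki>"
    )
    with bz2.open(io.BytesIO(bz2.compress(xml)), "rb") as fh:
        for title, text in sample_pages(fh, fraction=1.0):
            print(title, len(text))
```

With a real file, `bz2.open("frwiki-latest-pages-meta-current.xml.bz2", "rb")`
would take the place of the in-memory stream. The whole compressed file still
has to be downloaded, but only the sampled pages need to be written out.
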
2018-05-12 21:29 GMT+03:00 Hèctor Alòs i Font <[email protected]>:
> Ha, ha! I'm supposed to be the chair there :) Great! I didn't realise it
> was you.
>
> 2018-05-12 20:14 GMT+03:00 mansur <[email protected]>:
>
>> Great :)
>>
>> By the way, Hèctor, I saw your name in the symposium's program [1]. Are
>> you going there? If so, we are gonna be in the same section.
>>
>> 1. http://richfizh.chuvsu.ru/images/dokuments/Programma%20XI.pdf
>>
>> 2018-05-12 19:58 GMT+03:00 Hèctor Alòs i Font <[email protected]>:
>>
>>> Räxmät (thank you), Mansur. You are right. It seems I got the wrong dump,
>>> with too much information that is irrelevant to me. The problem was not
>>> the 15 GB, but the space needed to decompress it. I think it's OK now.
>>>
>>> 2018-05-12 18:40 GMT+03:00 mansur <[email protected]>:
>>>
>>>> Hello!
>>>>
>>>> If you mean this corpus [1], it is not that big: 5.4 GB. Or am I wrong? I
>>>> can download it and give you part of it, if you want.
>>>>
>>>> There are also smaller dumps [2], for example [3].
>>>>
>>>> 1. http://dumps.wikimedia.your.org/frwiki/latest/frwiki-latest-pages-meta-current.xml.bz2
>>>> 2. http://dumps.wikimedia.your.org/frwiki/latest/
>>>> 3. http://dumps.wikimedia.your.org/frwiki/latest/frwiki-latest-pages-meta-current1.xml-p3p412301.bz2
>>>>
>>>> With best wishes,
>>>> Mansur
>>>>
>>>> 2018-05-12 16:22 GMT+03:00 Hèctor Alòs i Font <[email protected]>:
>>>>
>>>>> 2018-05-12 14:40 GMT+03:00 Kartik Mistry <[email protected]>:
>>>>>
>>>>>> On Sat, May 12, 2018 at 2:51 PM, Hèctor Alòs i Font
>>>>>> <[email protected]> wrote:
>>>>>> > I'd like to create a French Wikipedia corpus, but I wouldn't like to
>>>>>> > download the whole Wikipedia dump. I'm not sure I have enough disk
>>>>>> > space for decompressing it. Is there maybe a 10% dump somewhere?
>>>>>>
>>>>>> This can be useful too:
>>>>>> https://dumps.wikimedia.org/other/contenttranslation/
>>>>>
>>>>>
>>>>> Thanks, Kartik. It is too small and not random enough for what I'm
>>>>> looking for, but it is indeed important information for improving the
>>>>> translators. A GSoC Apertium project is working on it :)
>>>>>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff