[Xmldatadumps-l] (no subject)

2021-01-04 Thread Tereza Belzová


Re: [Xmldatadumps-l] (no subject)

2020-08-21 Thread Yuki Kumagai
Thank you very much, John!

Yes, that was the case! The number does match now.

Thanks!
Yuki
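
For anyone hitting the same mismatch: the article count on that page
includes only namespace-0, non-redirect pages, while the dump also
carries templates, categories, file description pages, and so on. A
minimal sketch of the namespace-0 count in Python, assuming a
pages-articles dump and the 0.10 export schema (the file name and
schema URI below are placeholders -- check the <mediawiki> root
element of your dump):

    import bz2
    import xml.etree.ElementTree as ET

    # Export schema URI; 0.10 is an assumption, adjust to your dump.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    count = 0
    with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:
        # iterparse streams the XML, so the multi-GB file never has
        # to fit in memory.
        context = ET.iterparse(f, events=("start", "end"))
        _, root = next(context)  # the <mediawiki> root element
        for event, elem in context:
            if event == "end" and elem.tag == NS + "page":
                in_main = elem.findtext(NS + "ns") == "0"
                is_redirect = elem.find(NS + "redirect") is not None
                if in_main and not is_redirect:
                    count += 1
                root.clear()  # drop finished pages, keep memory flat
    print(count)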


On 20 Aug 2020, at 16:42, John  wrote:

Are you limiting your count to namespace 0?

On Thu, Aug 20, 2020 at 10:45 AM Yuki Kumagai 
wrote:

> Hiya,
>
> I have a question about the Wikipedia XML database dump. Apologies if this
> isn't an appropriate place to ask.
> On a Wikipedia page, it's mentioned that the current number of articles in
> English is 6,144,248:
> https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
>
> However, when I count the number of page elements in a recent dump
> (excluding redirects), I get roughly 10 million. I was wondering what the
> reason for this might be.
>
> Thank you in advance.


Re: [Xmldatadumps-l] (no subject)

2020-08-20 Thread John
Are you limiting your count to namespace 0?

On Thu, Aug 20, 2020 at 10:45 AM Yuki Kumagai 
wrote:

> Hiya,
>
> I have a question about the Wikipedia XML database dump. Apologies if this
> isn't an appropriate place to ask.
> On a Wikipedia page, it's mentioned that the current number of articles in
> English is 6,144,248:
> https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
>
> However, when I count the number of page elements in a recent dump
> (excluding redirects), I get roughly 10 million. I was wondering what the
> reason for this might be.
>
> Thank you in advance.


[Xmldatadumps-l] (no subject)

2020-08-20 Thread Yuki Kumagai
Hiya,

I have a question about the Wikipedia XML database dump. Apologies if this
isn't an appropriate place to ask.
On a Wikipedia page, it's mentioned that the current number of articles in
English is 6,144,248:
https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

However, when I count the number of page elements in a recent dump
(excluding redirects), I get roughly 10 million. I was wondering what the
reason for this might be.

Thank you in advance.

-- 
*Yuki Kumagai*
Senior Engineer
CognitionX 


Driving the acceleration and responsible deployment of AI
Stay up-to-date with our daily All Things AI
 newsletter


[Xmldatadumps-l] (no subject)

2019-07-16 Thread محمد باباخان
@@@$


[Xmldatadumps-l] (no subject)

2015-07-22 Thread Samina Akter Jhara
n



[Xmldatadumps-l] (no subject)

2015-02-19 Thread Roger Dresen
Hu

badassbarn


[Xmldatadumps-l] (no subject)

2014-03-08 Thread عواد الحويطي



Re: [Xmldatadumps-l] (no subject)

2014-01-21 Thread Randall Farmer
> For external uses like XML dumps, integrating the compression
> strategy into LZMA would, however, be very attractive. This would also
> benefit other users of LZMA compression, like HBase.

For dumps or other uses, 7za -mx=3 / xz -3 is your best bet.

That uses a 4 MB buffer, gets compression ratios within 15-25% of
current 7zip (or histzip), and runs at 30 MB/s on my box, which is
still 8x faster than the status quo (going by a 1 GB benchmark).

Trying to get quick-and-dirty long-range matching into LZMA isn't
feasible for me personally and there may be inherent technical
difficulties. Still, I left a note on the 7-Zip boards as folks
suggested; feel free to add anything there:
https://sourceforge.net/p/sevenzip/discussion/45797/thread/73ed3ad7/

Thanks for the reply,
Randall
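
For anyone replicating the xz -3 step in code: a minimal sketch using
Python's stdlib lzma module, which wraps liblzma, so preset=3 should
correspond to xz -3 and its 4 MiB dictionary. The file names are
placeholders, and the bz2-to-xz pairing is just an example:

    import bz2
    import lzma

    src_path = "dump.xml.bz2"   # placeholder input
    dst_path = "dump.xml.xz"    # placeholder output

    # Stream-recompress: read 1 MiB chunks from the bz2 stream and
    # feed them to an xz writer at preset 3, so memory use stays flat.
    with bz2.open(src_path, "rb") as src, \
            lzma.open(dst_path, "wb", preset=3) as dst:
        while True:
            chunk = src.read(1 << 20)
            if not chunk:
                break
            dst.write(chunk)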



On Tue, Jan 21, 2014 at 2:19 PM, Randall Farmer rand...@wawd.com wrote:

>> For external uses like XML dumps, integrating the compression
>> strategy into LZMA would, however, be very attractive. This would also
>> benefit other users of LZMA compression, like HBase.
>
> For dumps or other uses, 7za -mx=3 / xz -3 is your best bet.
>
> That uses a 4 MB buffer, gets compression ratios within 15-25% of
> current 7zip (or histzip), and runs at 30 MB/s on my box, which is
> still 8x faster than the status quo (going by a 1 GB benchmark).
>
> Re: trying to get long-range matching into LZMA: first, I
> couldn't confidently hack on liblzma. Second, Igor might
> not want to do anything as niche-specific as this (but who
> knows!). Third, even with a faster matching strategy, the
> LZMA *format* seems to require some intricate stuff (range
> coding) that may be a blocker to getting the ideal speeds
> (honestly not sure).
>
> In any case, I left a note on the 7-Zip boards as folks have
> suggested:
> https://sourceforge.net/p/sevenzip/discussion/45797/thread/73ed3ad7/
>
> Thanks for the reply,
> Randall




[Xmldatadumps-l] (no subject)

2013-05-03 Thread yossi galanty
-- 
יוסי גלנטי
0502441015
galan...@gmail.com
___
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l