Brion Vibber wrote:
>>> Decompression takes as long as compression with bzip2
>> I think decompression is *faster* than compression
>> http://tukaani.org/lzma/benchmarks
>
> LZMA is nice and fast to decompress... but *insanely* slower to
> compress, and doesn't seem as parallelizable. :(
>
> --
On Thu, Mar 26, 2009 at 8:51 PM, ERSEK Laszlo wrote:
> On 03/27/09 01:14, Brion Vibber wrote:
>
> > LZMA is nice and fast to decompress... but *insanely* slower to
> > compress, and doesn't seem as parallelizable. :(
>
> The xz file format should allow for "easy" parallelization, both when
> compressing and decompressing; see
On 03/27/09 01:14, Brion Vibber wrote:
> LZMA is nice and fast to decompress... but *insanely* slower to
> compress, and doesn't seem as parallelizable. :(
The xz file format should allow for "easy" parallelization, both when
compressing and decompressing; see
http://tukaani.org/xz/xz-file-for
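To make the block-level idea concrete: complete .xz streams can simply be
concatenated and still decompress back to the original input, so chunks can
be compressed independently and in parallel. A rough Python sketch of that
idea (not the xz implementation itself; chunk size, preset and worker count
are arbitrary assumptions):

  import lzma
  from concurrent.futures import ProcessPoolExecutor

  CHUNK = 64 << 20  # 64 MiB per block; arbitrary

  def compress_chunk(chunk: bytes) -> bytes:
      # Each chunk becomes a complete, self-contained .xz stream.
      return lzma.compress(chunk, format=lzma.FORMAT_XZ, preset=6)

  def parallel_xz(src_path: str, dst_path: str, workers: int = 4) -> None:
      with open(src_path, "rb") as src, open(dst_path, "wb") as dst, \
              ProcessPoolExecutor(max_workers=workers) as pool:
          chunks = iter(lambda: src.read(CHUNK), b"")
          # map() preserves input order, so concatenating the results
          # reproduces the original data when decompressed.  (Executor.map
          # consumes the iterator eagerly; a real tool would submit chunks
          # in bounded batches to cap memory use.)
          for compressed in pool.map(compress_chunk, chunks):
              dst.write(compressed)

The output should be readable by plain xz -d or Python's lzma.open(), which
treat concatenated streams as one file; the cost is a slightly worse ratio,
since matches can't cross chunk boundaries.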
On 3/26/09 3:25 PM, Keisial wrote:
> Quite interesting. Can the images at office.wikimedia.org be moved to
> somewhere public?
I've copied those two to the public wiki. :)
>> Decompression takes as long as compression with bzip2
> I think decompression is *faster* than compression
> http://tukaani.org/lzma/benchmarks
Tomasz Finc wrote:
> I've started drafting some new ideas at
> http://wikitech.wikimedia.org/view/Data_dump_redesign
>
> of the various problems that we're facing and what kind of job management
> we can put around it. We're taking this on as a full "should have been
> done 2 years ago" project a
On 3/25/09 10:08 AM, Christian Storm wrote:
> Thanks to everyone who got the enwiki dumps going again! Should we expect
> more regular dumps now? What was the final solution of fixing this?
>
>
Lots of love and upkeep by everyone :)
But really it needs to be more automated and made parallelisable.
Toolserver users don't have access to text.
On Wed, Mar 25, 2009 at 7:05 PM, Brian wrote:
> Perhaps the toolserver can make you a current dump of current en?
>
> On Wed, Mar 25, 2009 at 11:08 AM, Christian Storm wrote:
>
> > Thanks to everyone who got the enwiki dumps going again! Should we
> > expect more regular dumps now?
Perhaps the toolserver can make you a current dump of current en?
On Wed, Mar 25, 2009 at 11:08 AM, Christian Storm wrote:
> Thanks to everyone who got the enwiki dumps going again! Should we expect
> more regular dumps now? What was the final solution of fixing this?
>
>
>
> >
> > We are having to resort to crawling en.wikipedia.org while we wait
> > for regular dumps.
Thanks to everyone who got the enwiki dumps going again! Should we expect
more regular dumps now? What was the final solution of fixing this?
>
> We are having to resort to crawling en.wikipedia.org while we wait
> for regular dumps.
> What is the minimum crawling delay we can get away with?
Brion,
We are having to resort to crawling en.wikipedia.org while we wait
for regular dumps.
What is the minimum crawling delay we can get away with? I figure if we
have a 1-second delay then we'd be able to crawl the 2+ million articles
in a month.
I know crawling is discouraged but it seems
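The arithmetic roughly checks out: a month is about 2.6 million seconds, so
one request per second covers 2+ million pages in a bit under a month. A
tiny sketch (the article count, delay and fetch() are placeholders, not a
sanctioned crawl rate):

  import time

  DELAY = 1.0                      # seconds between requests (placeholder)
  ARTICLES = 2_500_000             # rough stand-in for "2+ million"
  SECONDS_PER_MONTH = 30 * 86_400  # 2,592,000

  print(ARTICLES * DELAY / SECONDS_PER_MONTH)  # ~0.96 months of wall time

  def crawl(titles, fetch):
      # fetch() is whatever HTTP client you use; the only point here is
      # the fixed per-request delay between article requests.
      for title in titles:
          yield fetch(title)
          time.sleep(DELAY)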
On Thu, Jan 29, 2009 at 11:20 AM, Brion Vibber wrote:
> On 1/28/09 8:32 AM, Brion Vibber wrote:
>> Probably wise to poke in a hack to skip the history first. :)
>
> Done in r46545.
>
> Updated dump scripts and canceled the old enwiki dump.
>
> New dumps also will be attempting to generate log output as XML which
> correctly handles the deletion/oversighting option
On 1/28/09 8:32 AM, Brion Vibber wrote:
> Probably wise to poke in a hack to skip the history first. :)
Done in r46545.
Updated dump scripts and canceled the old enwiki dump.
New dumps also will be attempting to generate log output as XML which
correctly handles the deletion/oversighting option
On Thu, Jan 29, 2009 at 1:52 AM, Gerard Meijssen wrote:
> Hoi,
> Two things:
>
> - if we abort the backup now, we do not know if we WILL have something at
> the time it would have ended
> - if the toolserver data can provide a service as a stop-gap measure, why
> not provide that in the meantime?
Hoi,
Two things:
- if we abort the backup now, we do not know if we WILL have something at
the time it would have ended
- if the toolserver data can provide a service as a stop-gap measure, why
not provide that in the meantime?
Thanks,
GerardM
2009/1/29 Alai
> Russell Blau ho
Russell Blau hotmail.com> writes:
> FWIW, I'll add my vote for aborting the current dump *now* if we don't
> expect it ever to actually be finished, so we can at least get a fresh dump
> of the current pages.
I'd like to third/fourth/(other ordinal) this idea too. I've been using the
(in compa
That would be great. I second this notion wholeheartedly.
On Jan 28, 2009, at 7:34 AM, Russell Blau wrote:
> "Brion Vibber" wrote in message
> news:497f9c35.9050...@wikimedia.org...
>> On 1/27/09 2:55 PM, Robert Rohde wrote:
>>> On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber
>>> wrote:
On
Probably wise to poke in a hack to skip the history first. :)
-- brion vibber (brion @ wikimedia.org)
On Jan 28, 2009, at 7:34, "Russell Blau" wrote:
> "Brion Vibber" wrote in message
> news:497f9c35.9050...@wikimedia.org...
>> On 1/27/09 2:55 PM, Robert Rohde wrote:
>>> On Tue, Jan 27, 2009 a
"Brion Vibber" wrote in message
news:497f9c35.9050...@wikimedia.org...
> On 1/27/09 2:55 PM, Robert Rohde wrote:
>> On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber
>> wrote:
>>> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>>>> The way I see it, what we need is to get a really powerful server
>>> Nope
On 1/27/09 2:55 PM, Robert Rohde wrote:
> On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote:
>> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>>> The way I see it, what we need is to get a really powerful server
>> Nope, it's a software architecture issue. We'll restart it with the new
>> arch when it's ready to go.
On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote:
> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>> The way I see it, what we need is to get a really powerful server
>
> Nope, it's a software architecture issue. We'll restart it with the new
> arch when it's ready to go.
I don't know what your tim
On 1/27/09 2:35 PM, Thomas Dalton wrote:
> The way I see it, what we need is to get a really powerful server
Nope, it's a software architecture issue. We'll restart it with the new
arch when it's ready to go.
-- brion
> Whether we want to let the current process continue to try and finish
> or not, I would seriously suggest someone look into redumping the rest
> of the enwiki files (i.e. logs, current pages, etc.). I am also among
> the people that care about having reasonably fresh dumps and it really
> is a p
The problem, as I understand it (and Brion may come by to correct me)
is essentially that the current dump process is designed in a way that
can't be sustained given the size of enwiki. It really needs to be
re-engineered, which means that developer time is needed to create a
new approach to dumping.
I have a decent server that is dedicated for a Wikipedia project that
depends on the fresh dumps. Can this be used in any way to speed up the process
of generating the dumps?
bilal
On Tue, Jan 27, 2009 at 2:24 PM, Christian Storm wrote:
> >> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
> >> The c
>> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>> The current enwiki database dump
>> (http://download.wikimedia.org/enwiki/20081008/) has been crawling along since 10/15/2008.
> The current dump system is not sustainable on very large wikis and
> is being replaced. You'll hear about it
Understood--thank you. Any time-frame for when this might be launched?
On Mon, Jan 5, 2009 at 1:47 PM, Brion Vibber wrote:
> On 1/4/09 6:20 AM, y...@alum.mit.edu wrote:
>> The current enwiki database dump
>> (http://download.wikimedia.org/enwiki/20081008/) has been crawling
>> along since 10/15/2008.
On 1/4/09 6:20 AM, y...@alum.mit.edu wrote:
> The current enwiki database dump
> (http://download.wikimedia.org/enwiki/20081008/) has been crawling
> along since 10/15/2008.
The current dump system is not sustainable on very large wikis and is
being replaced. You'll hear about it when we have the
I realize that. I'm looking forward to the next dump :)
I had been used to a dump of that part about every 2 months; it's
been about 3 now, and the way it is headed it will be 12 before I see
another!
On Mon, Jan 5, 2009 at 9:58 AM, Russell Blau wrote:
> wrote in message
> news:1c624fe4
wrote in message
news:1c624fe40901040620g1c69d070q9f830da33e84f...@mail.gmail.com...
> The current enwiki database dump
> (http://download.wikimedia.org/enwiki/20081008/) has been crawling
> along since 10/15/2008.
...
> Is this purposeful? And is there anything I (or other community
> members)
The current enwiki database dump
(http://download.wikimedia.org/enwiki/20081008/) has been crawling
along since 10/15/2008.
I realize that dumps can appear stalled in their normal processing
(http://meta.wikimedia.org/wiki/Data_dumps#Schedule), but in the
recent past (as far as I know) they have n