I'd like to continue exploring this, just not quite sure of the appropriate way forward. I gather doing work like this on the Toolforge bastion hosts is frowned upon, so I guess what I should be doing is spinning up a VPS instance on https://horizon.wikimedia.org/? I've been reading through https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances, from which I gather I need to be a project admin. Is there some existing project I can join to do this exploratory work, or should I create a new project?
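In the meantime, it looks like the OOM itself can be avoided: by default the mysql CLI buffers the entire result set in client memory before printing anything, and its standard --quick option switches it to fetching rows one at a time. A sketch of the same dump with streaming enabled (I haven't re-run this on the bastion yet, so treat it as untested):

```shell
# Same query as before, but with --quick so the client retrieves rows
# one at a time (mysql_use_result) instead of buffering the whole
# result set in memory (mysql_store_result) before writing it out.
time mysql --defaults-file=$HOME/replica.my.cnf --quick \
    -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
    -N -e 'select img_name from image limit 100000000' > /dev/null
```

With --quick, client memory use should stay flat regardless of the LIMIT, at the cost of holding the server-side result open for the duration of the transfer.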
> On Mar 31, 2021, at 10:35 PM, Roy Smith <r...@panix.com> wrote:
>
> Thanks for looking into this. I tried this again a little later, and it ran
> fine. Odd that the amount of memory used depends on the number of rows. I
> would expect it to stream results to stdout as they came in, but apparently
> not.
>
> Even weirder that the 100M example runs OOM in 10s, while the 10M example
> runs to completion in 36s. Could it be pre-allocating buffer space for the
> number of rows it expects to ultimately get? Ugh, that would be a crazy
> design, but it does seem like that's what's happening.
>
>> On Mar 31, 2021, at 9:47 PM, Brooke Storm <bst...@wikimedia.org> wrote:
>>
>>> On Mar 31, 2021, at 5:18 PM, Roy Smith <r...@panix.com> wrote:
>>>
>>> I'm just playing around on tools-sgebastion-08. I can dump the first 10
>>> million image names in about half a minute:
>>>
>>>> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit 10000000' > /dev/null
>>>>
>>>> real    0m36.586s
>>>> user    0m9.678s
>>>> sys     0m1.324s
>>>
>>> but if I try 100 million, it fails:
>>>
>>>> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit 100000000' > /dev/null
>>>> Killed
>>>>
>>>> real    0m9.875s
>>>> user    0m1.417s
>>>> sys     0m1.561s
>>>
>>> Is there some maximum result size configured by default? The full image
>>> table on commons is about 70M rows, so extrapolating from the first
>>> example, something like four minutes to move all that data.
>>
>> That could be RAM limits on the bastion.
>> Actually, scratch that, I've confirmed you were killed by the OOM killer on
>> that bastion:
>>
>> Mar 31 23:29:17 tools-sgebastion-08 kernel: [2860588.199138] mysql invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
>>
>> -Brooke
>>
>> _______________________________________________
>> Wikimedia Cloud Services mailing list
>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>> https://lists.wikimedia.org/mailman/listinfo/cloud