Hello Roy, depending on what you want to experiment with, https://paws.wmcloud.org/ might be a good choice. You are correct about Cloud VPS instances. If your work becomes its own project, you are more than welcome to request a new project on Cloud VPS.
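For example, a PAWS terminal can reach the same wiki replicas as Toolforge. Here is a minimal sketch, assuming the PAWS image ships the mysql client and auto-provisions your replica credentials in ~/.my.cnf (both are assumptions; see Help:PAWS on wikitech.wikimedia.org if your setup differs):

    # From a PAWS terminal: dump a small sample of image names from the
    # commonswiki replica. The .analytics service name is the variant
    # intended for longer-running queries.
    mysql --defaults-file=$HOME/.my.cnf \
          -h commonswiki.analytics.db.svc.wikimedia.cloud commonswiki_p \
          -N -e 'select img_name from image limit 10'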
On Thu, Apr 1, 2021 at 10:26 AM Roy Smith <r...@panix.com> wrote:

> I'd like to continue exploring this, I'm just not quite sure of the
> appropriate way forward. I gather doing work like this on the Toolforge
> bastion hosts is frowned upon, so I guess what I should be doing is
> spinning up a VPS instance on https://horizon.wikimedia.org/? I've been
> reading through
> https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances, from which
> I gather I need to be a project admin. Is there some existing project I
> can join to do this exploratory work, or should I create a new project?
>
> On Mar 31, 2021, at 10:35 PM, Roy Smith <r...@panix.com> wrote:
>
> Thanks for looking into this. I tried this again a little later, and it
> ran fine. It's odd that the amount of memory used depends on the number
> of rows; I would have expected it to stream results to stdout as they
> came in, but apparently not.
>
> Even weirder, the 100M example runs out of memory in 10s, while the 10M
> example runs to completion in 36s. Could it be pre-allocating buffer
> space for the number of rows it expects to ultimately get? Ugh, that
> would be a crazy design, but it does seem like that's what's happening.
>
> On Mar 31, 2021, at 9:47 PM, Brooke Storm <bst...@wikimedia.org> wrote:
>
> On Mar 31, 2021, at 5:18 PM, Roy Smith <r...@panix.com> wrote:
>
> I'm just playing around on tools-sgebastion-08. I can dump the first 10
> million image names in about half a minute:
>
> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit 10000000' > /dev/null
>
> real 0m36.586s
> user 0m9.678s
> sys 0m1.324s
>
> but if I try 100 million, it fails:
>
> tools.spi-tools-dev:xw-join$ time mysql --defaults-file=$HOME/replica.my.cnf -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select img_name from image limit 100000000' > /dev/null
> Killed
>
> real 0m9.875s
> user 0m1.417s
> sys 0m1.561s
>
> Is there some maximum result size configured by default? The full image
> table on Commons is about 70M rows, so extrapolating from the first
> example, it should only take a few minutes to move all that data.
>
> That could be RAM limits on the bastion.
> Actually, scratch that, I've confirmed you were killed by the OOM killer
> on that bastion:
>
> Mar 31 23:29:17 tools-sgebastion-08 kernel: [2860588.199138] mysql
> invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null),
> order=0, oom_score_adj=0
>
> -Brooke

--
Nicholas Skaggs
Engineering Manager, Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
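A note on the streaming question raised above: by default the mysql command-line client retrieves and buffers the entire result set in client memory before printing it, which is why memory use grows with the row count and why the 100M-row query was OOM-killed. The client's --quick (-q) option switches it to printing each row as it is received, keeping memory flat regardless of result size (mysqldump enables --quick by default for the same reason). A sketch of the failing query rerun with streaming enabled, untested here but using the same host and credentials as in the thread:

    # --quick makes the client print each row as it is received instead
    # of caching the whole result set; the trade-off is that a slow
    # consumer can hold server-side resources open for longer.
    time mysql --defaults-file=$HOME/replica.my.cnf \
         -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
         --quick -N -e 'select img_name from image limit 100000000' > /dev/null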
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud