Hello Roy, depending on what you want to experiment with,
https://paws.wmcloud.org/ might be a good choice. You are correct about
Cloud VPS instances. If your work becomes its own project, you are more
than welcome to request a new project on Cloud VPS.

On Thu, Apr 1, 2021 at 10:26 AM Roy Smith <r...@panix.com> wrote:

> I'd like to continue exploring this, but I'm not quite sure of the
> appropriate way forward.  I gather doing work like this on the Toolforge
> bastion hosts is frowned upon, so I guess what I should be doing is
> spinning up a VPS instance on https://horizon.wikimedia.org/?  I've been
> reading through
> https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances, from which
> I gather I need to be a project admin.  Is there some existing project I
> can join to do this exploratory work, or should I create a new project?
>
> On Mar 31, 2021, at 10:35 PM, Roy Smith <r...@panix.com> wrote:
>
> Thanks for looking into this.  I tried this again a little later, and it
> ran fine.  Odd that the amount of memory used depends on the number of
> rows.  I would expect it to stream results to stdout as they came in,
> but apparently not.
>
> Even weirder that the 100M example runs OOM in 10s, while the 10M example
> runs to completion in 36s.  Could it be pre-allocating buffer space for the
> number of rows it expects to ultimately get?  Ugh, that would be a crazy
> design, but it does seem like that's what's happening.
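>
> If the client really is buffering the whole result set before printing,
> one cheap thing to try (an untested sketch) is the mysql CLI's --quick
> flag, which prints each row as it is received instead of caching the
> result:
>
> # --quick streams rows as they arrive, so client-side memory should
> # stay flat regardless of row count
> time mysql --quick --defaults-file=$HOME/replica.my.cnf \
>     -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
>     -N -e 'select img_name from image limit 100000000' > /dev/null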
>
> On Mar 31, 2021, at 9:47 PM, Brooke Storm <bst...@wikimedia.org> wrote:
>
>
>
> On Mar 31, 2021, at 5:18 PM, Roy Smith <r...@panix.com> wrote:
>
> I'm just playing around on tools-sgebastion-08.  I can dump the first 10
> million image names in about half a minute:
>
> tools.spi-tools-dev:xw-join$ time mysql \
>     --defaults-file=$HOME/replica.my.cnf \
>     -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
>     -N -e 'select img_name from image limit 10000000' > /dev/null
>
> real    0m36.586s
> user    0m9.678s
> sys     0m1.324s
>
>
> but if I try 100 million, it fails:
>
> tools.spi-tools-dev:xw-join$ time mysql \
>     --defaults-file=$HOME/replica.my.cnf \
>     -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
>     -N -e 'select img_name from image limit 100000000' > /dev/null
> Killed
>
> real    0m9.875s
> user    0m1.417s
> sys     0m1.561s
>
>
> Is there some maximum result size configured by default?  The full image
> table on commons is about 70M rows, so extrapolating from the first
> example (10M rows in ~37s), something like four or five minutes to move
> all that data.
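>
> If it does turn out to be a memory cap, one workaround I could try (a
> rough sketch; the batch size and output file are arbitrary, and it
> assumes img_name values contain no quotes or shell metacharacters) is to
> walk the table in keyset-paginated chunks so no single result set is
> ever large:
>
> # Hypothetical batching loop: fetch img_name in 1M-row chunks, keyed
> # on img_name (the table's primary key) rather than OFFSET.  A real
> # run would escape the names or use a client library instead.
> : > img_names.txt
> last=''
> while :; do
>   before=$(wc -l < img_names.txt)
>   mysql --defaults-file=$HOME/replica.my.cnf \
>       -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N \
>       -e "select img_name from image where img_name > '$last'
>           order by img_name limit 1000000" >> img_names.txt
>   after=$(wc -l < img_names.txt)
>   [ "$after" -eq "$before" ] && break
>   last=$(tail -n 1 img_names.txt)
> done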
>
>
> That could be RAM limits on the bastion.
> Actually, scratch that: I've confirmed your mysql process was killed by
> the OOM killer on that bastion:
> Mar 31 23:29:17 tools-sgebastion-08 kernel: [2860588.199138] mysql invoked
> oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0,
> oom_score_adj=0
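>
> (If you want to check for these yourself next time: OOM kills land in
> the kernel log, so something like the line below will surface them,
> though reading the kernel log may require elevated privileges depending
> on the host's settings.)
>
> # recent kernel messages from the OOM killer
> dmesg -T | grep -i -e oom -e 'killed process'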
>
> -Brooke


-- 
*Nicholas Skaggs*
Engineering Manager, Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
