I'd like to continue exploring this, but I'm not quite sure of the appropriate 
way forward.  I gather doing work like this on the Toolforge bastion hosts is 
frowned upon, so I guess what I should be doing is spinning up a VPS instance 
on https://horizon.wikimedia.org/?  I've been reading through 
https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances, from which I 
gather I need to be a project admin.  Is there some existing project I can join 
to do this exploratory work, or should I create a new project?

> On Mar 31, 2021, at 10:35 PM, Roy Smith <r...@panix.com> wrote:
> 
> Thanks for looking into this.  I tried this again a little later, and it ran 
> fine.  Odd that the amount of memory used depends on the number of rows.  I 
> would expect it to stream results to stdout as they came in, but 
> apparently not.
> 
> Even weirder that the 100M example runs OOM in 10s, while the 10M example 
> runs to completion in 36s.  Could it be pre-allocating buffer space for the 
> number of rows it expects to ultimately get?  Ugh, that would be a crazy 
> design, but it does seem like that's what's happening.
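
For what it's worth, the mysql command-line client buffers the entire result
set in client memory by default, which would explain the memory use scaling
with the row count; the documented --quick (-q) option makes it print each row
as it is received instead of caching the result.  A rough, untested sketch of
the same query with streaming enabled, reusing the host and paths from the
examples quoted below:

    time mysql --defaults-file=$HOME/replica.my.cnf --quick -N \
        -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
        -e 'select img_name from image limit 100000000' > /dev/null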
> 
>> On Mar 31, 2021, at 9:47 PM, Brooke Storm <bst...@wikimedia.org> wrote:
>> 
>> 
>> 
>>> On Mar 31, 2021, at 5:18 PM, Roy Smith <r...@panix.com> wrote:
>>> 
>>> I'm just playing around on tools-sgebastion-08.  I can dump the first 10 
>>> million image names in about half a minute:
>>> 
>>>> tools.spi-tools-dev:xw-join$ time mysql  
>>>> --defaults-file=$HOME/replica.my.cnf -h 
>>>> commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select 
>>>> img_name from image limit 10000000 ' > /dev/null
>>>> 
>>>> real    0m36.586s
>>>> user    0m9.678s
>>>> sys     0m1.324s
>>> 
>>> but if I try 100 million, it fails:
>>> 
>>>> tools.spi-tools-dev:xw-join$ time mysql  
>>>> --defaults-file=$HOME/replica.my.cnf -h 
>>>> commonswiki.web.db.svc.wikimedia.cloud commonswiki_p -N -e 'select 
>>>> img_name from image limit 100000000 ' > /dev/null
>>>> Killed
>>>> 
>>>> real    0m9.875s
>>>> user    0m1.417s
>>>> sys     0m1.561s
>>> 
>>> Is there some maximum query size configured by default?  The full image 
>>> table on commons is about 70M rows, so extrapolating from the first 
>>> example, something like 4-5 minutes to move all that data.
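
If a single 70M-row query still hits memory or time limits, another rough,
untested sketch would be to pull the table in batches with keyset pagination
on img_name (which should be the table's primary key).  The batch size, the
output file name, and the assumption that img_name values never contain single
quotes are all placeholders for illustration; a real run would need proper
escaping:

    last=''
    : > image_names.txt
    while :; do
        # each batch (~1M names) is held briefly in a shell variable,
        # roughly a few tens of MB, before being appended to the file
        batch=$(mysql --defaults-file=$HOME/replica.my.cnf --quick -N \
            -h commonswiki.web.db.svc.wikimedia.cloud commonswiki_p \
            -e "select img_name from image where img_name > '$last'
                order by img_name limit 1000000")
        [ -z "$batch" ] && break                      # no more rows
        printf '%s\n' "$batch" >> image_names.txt
        last=$(printf '%s\n' "$batch" | tail -n 1)    # resume point
    done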
>>> 
>> 
>> That could be RAM limits on the bastion. 
>> Actually, scratch that, I’ve confirmed you were killed by the OOM killer on 
>> that bastion:
>> Mar 31 23:29:17 tools-sgebastion-08 kernel: [2860588.199138] mysql invoked 
>> oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, 
>> oom_score_adj=0
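
In case it's useful next time, one way to check for this sort of kill yourself
(assuming the kernel log is readable by unprivileged users on the bastion,
which it may not be) is:

    dmesg -T | grep -i oom
    # or, on systemd hosts:
    journalctl -k --since '1 hour ago' | grep -i oom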
>> 
>> -Brooke

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
