[Cloud] Re: [Cloud-announce] No Friday plans? Spend your weekend replacing Buster VMs!

2024-06-14 Thread AntiCompositeNumber
The steps to migrate a server are very project-dependent. If the
project has good documentation about how to install their software,
where their data is stored, where the configuration is and what it
does, etc., then someone with Linux experience can usually reinstall the
project on a new server fairly quickly. If that information doesn't
exist then migration by anyone other than the original maintainer will
be very difficult.

I will say that an OS change is a good time to onboard new maintainers.

AntiCompositeNumber
(they/him)

On Fri, Jun 14, 2024 at 2:50 PM mdsiam O  wrote:
>
> I will fuck your mother
>
> On Sat, Jun 15, 2024, 12:27 AM Andrew Bogott  wrote:
>>
>> Cloud-vps users:
>>
>> There are now a mere two weeks remaining before Debian Buster ends its 
>> period of long term support. After June 30th, security upgrades will no 
>> longer be available for this release and VMs running Buster will become ever 
>> more risky and difficult to maintain.
>>
>> As of today there are still 143 Buster servers running in our cloud[0] -- 
>> some of them are probably yours! Please take some time to delete VMs that 
>> are no longer needed, and rebuild those that are still needed with a more 
>> modern release, ideally Debian Bookworm.
>>
>> There is a task for your project on phabricator[1] where you can update your 
>> progress. If you have vital VMs that you absolutely cannot rebuild by July 
>> 15th, please update the associated task with your plan and anticipated 
>> timeline. WMCS staff will start shutting down unacknowledged VMs in mid July 
>> in order to attract the attention of users who do not read email or follow 
>> phabricator.
>>
>> Buster's end of life has been a long time coming, and frequently announced. 
>> If you've been waiting for the right time to think about this, the time is 
>> now.
>>
>> Thank you!
>>
>> -Andrew + WMCS staff
>>
>>
>> [0] https://os-deprecation.toolforge.org/
>>
>> [1] https://phabricator.wikimedia.org/project/view/6373/
>>
>> ___
>> Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
>> List information: 
>> https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
>
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: Changes in schema of pagelinks tables

2024-01-17 Thread AntiCompositeNumber
Changing queries to support a new database format is one thing.
Writing migration code to deal with a situation that should not exist
(columns being dropped before the migration is completed) is another.
I suppose I am lucky in that the only tool I maintain that queries the
pagelinks table is single-wiki.
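
For anyone updating queries, the schema change itself is small: the link
target moves out of pagelinks into the linktarget table. A sketch (assuming
the standard pl_target_id/linktarget columns from the migration):

    -- old schema
    SELECT pl_from FROM pagelinks
    WHERE pl_namespace = 0 AND pl_title = 'Example';

    -- new schema
    SELECT pl_from FROM pagelinks
    JOIN linktarget ON pl_target_id = lt_id
    WHERE lt_namespace = 0 AND lt_title = 'Example';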

AntiCompositeNumber
(he/him)

On Wed, Jan 17, 2024 at 3:05 PM Amir Sarabadani
 wrote:
>
> Hi!
>
> On Wed, Jan 17, 2024 at 19:37, Ben Kurtovic wrote:
>>
>> Hi Amir & others,
>>
>> I’m glad we are making changes to improve DB storage/query efficiency. I 
>> wanted to express my agreement with Tacsipacsi that dropping the data before 
>> the migration has completed is a really bad outcome. Now tool maintainers 
>> need to deal with multiple migrations depending on the wikis they query or 
>> add more code complexity. And there is little time to make the changes for 
>> those of us who had planned to wait until the new data was available.
>
>
> I totally understand the frustration. In my volunteer capacity, I also 
> maintain numerous tools and they break every now and then because of changes.
>>
>>
>> > Commons has grown to 1.8TB already
>>
>> That’s a big number yes, but it doesn’t really answer the question — is the 
>> database actually about to fill up?
>
>
> It's a bit more nuanced. We are not hitting limits on storage. But the 
> memory for the data cache on each replica is about 350GB, and we need to 
> serve almost everything from memory since disk is 1000 times slower than 
> memory. If we read too much from disk, reads start to pile up, leading to 
> appserver requests piling up and a general outage (which has happened 
> before with Wikidata's database). You can have a 3TB database with only 
> 100GB of "hot" data and be fine, but Commons is both big and very heavily 
> read across its tables and rows. Ratio-wise, the Commons database is 
> already reading twice as much from disk as English Wikipedia.
>
>> How much time do you have until that happens and how much time until s1/s8 
>> finish their migration?
>
> The database is already in the "fragile" and "high risk" state. I can't give 
> you an exact date when it'll go down, but for the reasons mentioned above I 
> can tell you that even now, any noticeable increase in its traffic or a 
> sudden shift in its read patterns would take it down and bring all wikis 
> down with it. There are already user-facing parts of Commons that shouldn't 
> be slow but are, due to excessive reads from disk.
>
> Also, for the case of Wikidata, it might take a long time, possibly three 
> more months, to finish due to its unique pagelinks usage pattern because of 
> scholarly articles.
>
>> Is there a reason you can share why this work wasn’t started earlier if the 
>> timing is so close?
>
> We have been constantly working to reduce its size for the past several 
> years; the templatelinks migration, the externallinks redesign, and so on 
> have been done back to back (starting in 2021; we even bumped the priority 
> of the externallinks migration because of Commons alone). But at the same 
> time, the wiki has been growing far too fast. (Emphasizing that the growth 
> doesn't have much to do with images being uploaded; the image table is only 
> 100GB. The problem is the overly large links tables: templatelinks at 
> 270GB, categorylinks at 200GB, pagelinks at 190GB, and so on.) This has put 
> us in a Red Queen situation with no easy way out.
>>
>>
>> > so you need to use the old way for the list of thirty-ish wikis (s1, s2, 
>> > s6, s7, s8) and for any wiki not a member of that, you can just switch to 
>> > the new system
>>
>> IMO, a likely outcome is that some tools/bots will simply be broken on a 
>> subset of wikis until the migration is completed across all DBs.
>
>
> The most urgent one is Commons. What about only dropping it from Commons to 
> reduce the risk of an outage and leaving the rest until they are all 
> finished (or all except Wikidata)? You'd have to write something for the 
> new schema regardless.
>>
>>
>> Thanks for all your work on this task so far.
>
>
> Thank you and sorry for the inconvenience.
>>
>>
>> Ben / Earwig
>
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: VideoWiki direct access questions

2024-01-12 Thread AntiCompositeNumber
Here is the Phab task for the floating IP request:
https://phabricator.wikimedia.org/T300750.
Declining to grant a floating IP in this case is in line with how WMCS
generally handles floating IPs in my experience, and the reasons given
are valid.

AntiCompositeNumber
(he/him)

On Fri, Jan 12, 2024 at 1:04 PM Tim Moody  wrote:
>
> VideoWiki is a project of WikiProjectMed aimed at disseminating Wikipedia 
> content visually, rather than textually, by generating a video from the text 
> and images of a wiki page. It is currently deployed at 
> https://videowiki.wmcloud.org. (It is not fully operational yet.)
>
> We asked for a static IP address to access it from videowiki.org, but were 
> denied. So I have two questions:
>
> 1) Is there community support for granting this project a static IP to 
> facilitate this access?
>
> 2) If not, is there a recommended way to use dynamic dns, such as noip, to 
> cause videowiki.org to resolve as videowiki.wmcloud.org? I think most dyndns 
> clients assume the IP of the host is the target rather than a proxy, though I 
> suppose the proxy could be pinged and that address supplied periodically to 
> the dyndns service.
>
> Any help would be appreciated,
>
> Tim
>
>
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: [Cloud-announce] Updated WMCS terms of use

2023-06-05 Thread AntiCompositeNumber
In case it was unclear: I do not intentionally store usernames. The
Toolforge infrastructure, in its default configuration, stores
usernames. Toolforge also does not provide tools to easily clean up
logs.

This is not an individual developer problem.

AntiCompositeNumber
(he/him)

On Mon, Jun 5, 2023 at 6:04 PM Maciej Jaros  wrote:
>
> AntiCompositeNumber (2023-05-27 05:29):
> > I am disappointed that these Terms went into effect immediately,
> > without any chance for review or comment by the community. This is
> > counter to how Wikimedia processes should run, and flies in the face
> > of the values of the Wikimedia movement.
> >
> > I am concerned about some of the provisions of these Terms. For
> > example, 7.3 bullet 3 states
> >> Not collect any other Personal Information and Wikimedia Usernames from 
> >> End Users, other than any user agent information forwarded by the 
> >> anonymizing reverse proxy or OAuth provided usernames and email addresses.
> > One of my tools, signatures.toolforge.org, provides data on a user's
> > signature from their username. The queried username is included in the
> > path, and is logged by the default uwsgi logging configuration. It is
> > likely that at least some End Users will check their own usernames, so
> > therefore the tool is collecting Wikimedia Usernames from End Users.
> >
> > This *shouldn't* be a violation of the Terms, but by a plain reading
> > of them, it is.
>
> I am not a lawyer, but as a developer with some GDPR experience (EU Law)
> -- storing usernames, especially indefinitely, in logs doesn't seem
> acceptable to me. You shouldn't store user data you don't need. And if
> you store user data, you should allow the user to object and provide a
> procedure to delete this data. In general, you should probably remove
> that data as soon as you no longer need it.
>
> An additional complication could be if you process user data in
> conjunction with IP and create some statistics, because then it may fall
> under profiling.
>
> Kind regards,
> Nux.
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: [Cloud-announce] Updated WMCS terms of use

2023-06-02 Thread AntiCompositeNumber
I'm sorry, but that response is unacceptable.

The resources of volunteer developers are also limited. Enacting an
unclear policy without recent consultation with the groups affected
places an unreasonable burden on the volunteers WMCS exists to serve.

AntiCompositeNumber
(he/him)

On Fri, Jun 2, 2023 at 11:45 AM Andrew Bogott  wrote:
>
> Xover, ACN, and others,
>
> I’m sorry for my slow response – much of the WMCS staff was off for some of 
> this week.
>
> In 2016 there was an initial public consultation about new terms of use[0]. 
> The fact that we only now have a document to show for it should give you an 
> idea of how difficult this process is.
>
> I would definitely prefer to have further community review and discussion, 
> but we simply aren't resourced for that. To commit to another round of 
> community review would mean setting this long-needed update aside for 
> additional years. As I hate to let the perfect be the enemy of the good, I've 
> elected to do what's possible rather than hold out for a process that I do 
> not believe to be possible with current resourcing.
>
> We may be able to make minor adjustments in wording to the document, but I 
> remain convinced that this document is an improvement over the previous 
> version (among other things, because it has buy-in from legal which the 
> former document lacked).
>
> That said, I do not mean to entirely stifle discussion on this topic. We need 
> to keep our expectations as low as possible, but I nevertheless encourage 
> those with concerns to comment on the respective talk pages so that we have a 
> record for if/when we have the resources to revise the document.
>
> -Andrew
>
>
> [0] https://meta.wikimedia.org/wiki/Labs_TOU_Consultation_Round_1_(2016)
>
>
> On 5/27/23 4:39 AM, xo...@pobox.com wrote:
>
> I agree with ACN's points, especially the lack of community review and 
> discussion.
>
> For example, in section 4.1 on prohibited uses it refers to "Do not break the 
> law", but fails to specify which law, so as written it applies to all laws 
> anywhere enacted by a competent legislative body. Congratulations: it's now a 
> breach of the WMCS ToS to criticise repressive regimes, compare heads of 
> state to literary figures, and do any number of things whose suppression 
> is in contravention of the movement's values.
>
> Section 6.2 says "if WMCS administrators fail to reach you within six (6) 
> weeks". That's a pretty onerous time limit for volunteers. Being busy IRL, or 
> in hospital, or… for six weeks is not uncommon and in no way indicates a 
> stable tool is abandoned. Needing to take emergency measures more quickly 
> (like shutting down the VMs) for security issues or the like is an orthogonal 
> concern.
>
> 7.2 says Toolforge projects (but not other projects for some reason) must 
> "Use any user agent information … only for the maintenance of your Toolforge 
> Project". Maintenance of the project does not include content negotiation, 
> progressive enhancement, and other functional aspects. Depending on what 
> definition you apply to "user agent information" (since "user agent" is not 
> defined anywhere) this could include authentication headers for e.g. Basic 
> auth, or just the HTTP User-Agent header field, or any information about the 
> user agent (like screen resolution, technical capabilities, supported content 
> types or javascript features).
>
> Section 7.3.1 requires all projects that collect personal information to post 
> a privacy policy (and other things). Since section 2 (definitions) defines 
> "user agent" to be personal information equivalent to your password, social 
> security number, real name, and bank account number, and since information 
> about the user agent is provided to all projects by the anonymising proxy, 
> all projects 
> are by definition collecting personal information. All projects with a web 
> interface are thus required to post a full privacy policy. The definition of 
> "End User" does not exclude the developer / project admin, so all projects 
> without a web interface are also required to post a full privacy policy. If 
> all projects are actually required to post a privacy policy it would be much 
> much simpler to have the policy just say "All projects must post a privacy 
> policy".
>
> There is no definition of "collecting" so what technical operations actually 
> constitute "collecting personal information" is unclear.
>
> There is no definition of "user agent" so it is unclear whether it is 
> intended to encompass all information provided by the user's User Agent (i.e. 
> web browser), all information _about_ 

[Cloud] Re: [Cloud-announce] Updated WMCS terms of use

2023-05-26 Thread AntiCompositeNumber
I am disappointed that these Terms went into effect immediately,
without any chance for review or comment by the community. This is
counter to how Wikimedia processes should run, and flies in the face
of the values of the Wikimedia movement.

I am concerned about some of the provisions of these Terms. For
example, 7.3 bullet 3 states
> Not collect any other Personal Information and Wikimedia Usernames from End 
> Users, other than any user agent information forwarded by the anonymizing 
> reverse proxy or OAuth provided usernames and email addresses.

One of my tools, signatures.toolforge.org, provides data on a user's
signature from their username. The queried username is included in the
path, and is logged by the default uwsgi logging configuration. It is
likely that at least some End Users will check their own usernames, so
therefore the tool is collecting Wikimedia Usernames from End Users.

This *shouldn't* be a violation of the Terms, but by a plain reading
of them, it is.

I am also concerned that
https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#5._Limited_use_of_third-party_resources_allowed
makes reference to a non-existent policy and refers to itself with a
different title.

I am also disappointed that the revised Terms still require tools to
be under an OSI-compliant license, without permitting the use of CC-0
or public domain grants. The requirement to request and be granted an
exemption to run one-off scripts without releasing them also seems too
arduous to be useful. Either free licenses should be required for
everything, or the approval requirement should be dropped.

The warning at the top should also make clear that developer email
addresses are public to the Internet, not merely to other WMCS users
(for example, at <https://ldap.toolforge.org/user/anticomposite>).

The overall layout of the Terms is also confusing, with very short
sections referring to other very short sections on the other side of
the document.

AntiCompositeNumber
(he/him)

On Fri, May 26, 2023 at 9:46 AM Andrew Bogott  wrote:
>
> After nearly a decade of mishap and delay, we have updated the WMCS
> terms of use. The updated document for toolforge and cloud-vps admins
> can be found here:
>
> https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use
>
>
> and the terms of use for visitors to WMCS sites can be found here:
>
> https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_End_User_Terms_of_use
>
>
> There is one significant change in these terms: Cloud-vps projects which
> collect personal data will need to include an explicit privacy policy
> for their projects. This is section 7.3. For other WMCS users and admins
> these documents do not represent any significant change in policy, but
> do clarify and finalize many things that were poorly-worded in the
> previous TOU, as well as policies that we have enforced informally without
> officially stating them.
>
> Please feel free to reach out to WMCS staff if you find any part of
> these documents concerning or disruptive to your work on our platforms.
>
>
> -Andrew
>
>
>
>
>
>
> ___
> Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: Proposing a Tool sweep

2022-12-29 Thread AntiCompositeNumber
It's a recurring problem. Tool Y becomes popular, then becomes
unmaintained. Other developers/WMCS/Toolforge roots/Standards Committee
investigate, and find that everything is All Rights Reserved. Then the bot
task/tool is dead until someone rewrites it, often years later.

Happened most recently with the SanFranBan of جار الله / JarBot, but that's
definitely not an isolated incident.

ACN

On Thu, Dec 29, 2022 at 11:41 Maarten Dammers  wrote:

> Hi Kunal,
>
> Can you explain here and/or on
> https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Tool_sweep why you
> want to do this?
>
> Maarten
>
> On 29-12-2022 10:02, Kunal Mehta wrote:
> > Hi everyone,
> >
> > tl;dr: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Tool_sweep
> >
> > It has been nearly a decade since Toolforge came online. Since then,
> > there have been a lot of improvements to tool infrastructure, but
> > many tools have not yet caught up. For example, new tools are required
> > to have metadata in toolsadmin, and unused tools can now be
> > archived/deleted.
> >
> > I am proposing that we "sweep" through all tools, checking each one for:
> > * Indicating an OSI-approved license in the source code/metadata
> > * Having source code published somewhere
> > * Not loading external resources (for web applications)
> > * Having tool information and metadata in Toolsadmin or Toolhub
> >
> > This is explained further on the wiki page, with proposed remediation
> > steps.
> >
> > There are roughly 3,200 tools; if we split them up into batches by
> > month, that's about 250 tools per month. Depending on how many people
> > are interested in sweeping, it could be doable :) ...or it might take
> > multiple years to clear through.
> >
> > Ultimately the goal is to support tool maintainers with bringing their
> > tools up to standard rather than criticizing them for not doing so.
> >
> > If you're interested in participating, please add your name to the
> > wiki page :) I would like to kick this off in the first week of January.
> >
> > Please let me know if you have any concerns, questions or suggestions.
> >
> > Thanks,
> > -- Kunal / Legoktm
> > ___
> > Cloud mailing list -- cloud@lists.wikimedia.org
> > List information:
> > https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information:
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
>
-- 
AntiCompositeNumber
(he/him)
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: Kubernetes-based jobs engine -- how to use Python virtual environments?

2022-04-06 Thread AntiCompositeNumber
In general, if you are using a Python virtual environment on k8s, you
should only modify it from a shell in a matching k8s container. The
easiest way to do this is with `webservice --backend=kubernetes
python3.9 shell` (or whatever version you are using). If the Python
version you're using matches what's in use on the Buster bastion, it
should work, but using a k8s shell will be the most reliable.
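
For example, a minimal sketch of rebuilding a venv from inside a matching
container (assuming your dependencies are in a requirements.txt and the venv
lives at $HOME/pyenv, the path David mentions below):

    $ webservice --backend=kubernetes python3.9 shell
    $ python3 -m venv $HOME/pyenv
    $ source $HOME/pyenv/bin/activate
    $ pip install --upgrade pip
    $ pip install -r requirements.txt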

On Tue, Apr 5, 2022 at 5:16 AM David Caro  wrote:
>
>
> You can use the one used for k8s on the bastion too, the thing is that the 
> k8s environment expects it to be in a
> specific path, that is `$HOME/pyenv/`.
>
> Now, that might not work if the python version in the bastion is too 
> different from the one the k8s job has, so you
> might have to have two different ones in that case yes (bastions are 
> restricted to the python version of the OS, the k8s
> jobs can use different ones depending on the image).
>
>
> On 04/05 11:08, Martin Urbanec wrote:
> > Hello everyone,
> >
> > Thanks for the suggestion. Does this mean that I have to maintain 2 virtual
> > environments (one for bastion and the other one for k8s pod my code will
> > actually run in)?
> >
> > Martin
> >
> > On Mon, Apr 4, 2022 at 10:06, Arturo Borrero Gonzalez <
> > aborr...@wikimedia.org> wrote:
> >
> > >
> > >
> > > On 4/2/22 14:53, Martin Urbanec wrote:
> > > > Hello,
> > > >
> > > > I just received a dozen emails about grid engine migration. I tried to
> > > > migrate my personal tool (tool.martin-urbanec) first. This tool
> > > > currently generates a Jupyter-notebook based report daily.
> > > >
> > > > I do that by calling jupyter nbconvert --to html --execute
> > > > community_configuration_usage.ipynb from a virtual environment where
> > > > Jupyter is installed, together with a couple of other Python modules.
> > > >
> > > > I managed to create a new virtual environment that works from the new
> > > > Buster bastion, and it works when executed directly from the bastion,
> > > > but I can't get it to execute via the k8s-based engine:
> > >
> > > Your problem may be related to bootstrapping the venv. See if this
> > > information can help you:
> > >
> > >
> > > https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Kubernetes_python_jobs
> > >
> > > This is very similar to what JMC89 replied in the other email.
> > >
> > > --
> > > Arturo Borrero Gonzalez
> > > Site Reliability Engineer
> > > Wikimedia Cloud Services
> > > Wikimedia Foundation
> > >
>
> > ___
> > Cloud mailing list -- cloud@lists.wikimedia.org
> > List information: 
> > https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
>
>
> --
> David Caro
> SRE - Cloud Services
> Wikimedia Foundation 
> PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3
>
> "Imagine a world in which every single human being can freely share in the
> sum of all knowledge. That's our commitment."
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: [Cloud-announce] Toolforge: need to stop webservice and start it again following changes to labels

2021-10-14 Thread AntiCompositeNumber
Webservice will try to pull settings from the service.manifest file,
but the best practice is to use a webservice template file.
This is something all tool operators running a webservice should do,
as it prevents you from forgetting parameters. With a service.template
file, all you have to do is `webservice start`.
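
As a rough sketch, a service.template can be as short as the following
(assuming a Python 3.9 Kubernetes webservice; the keys mirror the webservice
command-line arguments):

    backend: kubernetes
    type: python3.9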

ACN

On Thu, Oct 14, 2021 at 9:39 AM Arthur Smith  wrote:
>
> Hi - I don't think `webservice start` is necessarily the right command to 
> start up services - doesn't that just default to a php server? If you're 
> running python or ruby or something else you need to run whatever the 
> webservice startup command is you initially ran. The service.manifest file 
> should have details if you still have that from the previous service startup.
>
>Arthur
>
> On Wed, Oct 13, 2021 at 5:49 PM Brooke Storm  wrote:
>>
>> If you were running a Toolforge web tool in Kubernetes before the 
>> toollabs-webservice label changes were deployed on 2021-09-29 
>> (https://sal.toolforge.org/tools?d=2021-09-29), you may need to run 
>> `webservice stop && webservice start` in order to ensure your replica sets 
>> have correct label expectations on them going forward. Otherwise, confusing 
>> states may happen when running `webservice restart` and similar commands.
>>
>> When I backfilled the new labels, I missed that you cannot change the label 
>> matching rules in a deployment retroactively. I apologize for any 
>> inconvenience.
>>
>> In summary: If you haven’t run a webservice stop since 2021-09-29 on your 
>> Kubernetes web service, it would be a good idea to stop and start your 
>> webservice now to prevent any confusing behavior from webservice in the 
>> future.
>>
>> --
>> Brooke Storm
>> Staff SRE
>> Wikimedia Cloud Services
>> bst...@wikimedia.org
>>
>>
>>
>> ___
>> Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org
>> List information: 
>> https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
>> ___
>> Cloud mailing list -- cloud@lists.wikimedia.org
>> List information: 
>> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
>
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: Python upgrade on the Toolforge

2021-09-26 Thread AntiCompositeNumber
The grid and bastions only support Python 3.5; that won't change until
they get an OS upgrade (https://phabricator.wikimedia.org/T275864). As
you noted, Python 3.7 and 3.9 are available on Kubernetes, but writing
YAML configuration can be complex. There is a project to build a tool
similar to jsub for submitting jobs to the Kubernetes cluster. A beta
version of that tool is available for testing on the dev-buster
bastion; see the jobs framework documentation on Wikitech for details.
In this case, beta means "might not have all the features" and
"implementation details might change". Because it's just another way
to submit jobs to the existing k8s cluster, it will still be at least
as reliable as the grid.
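
As a sketch of what submitting a job looks like with the beta tool (the
command name, flags, and image label below are from the beta and may have
changed; `toolforge-jobs images` lists the currently available images):

    $ toolforge-jobs run daily-report \
        --image tf-python39 \
        --command "pyenv/bin/python3 report.py"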

On Sun, Sep 26, 2021 at 9:44 PM Huji Lee  wrote:
>
> Hi all,
>
> When I run python3 --version on Toolforge I see version 3.5.3 is installed. 
> Because python 3.5 reached the end of its life in September 2020, pip is 
> really unhappy about that.
>
> Is there a way to use a later version of python3 on Toolforge? If not, are 
> there plans to upgrade the OS and upgrade python with it?
>
> PS: I know from Help:Toolforge/Python that 3.7.3 is available on Kubernetes, 
> but I am dealing with scripts that are submitted via jsub and not k8s, and 
> the overhead of converting them is prohibitive.
>
> Thanks!
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


[Cloud] Re: Validating multiple usernames?

2021-09-09 Thread AntiCompositeNumber
Yes, the canonical form of usernames is with spaces, but the canonical
form of page titles is with underscores.

No, this hasn't ever confused anyone or caused me any problems, why do you ask?
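
For reference, a minimal Python sketch of the normalization steps Bryan
describes below (assuming "capitalize" means uppercasing only the first
character, as on wikis with the default $wgCapitalLinks):

    import re

    def db_key(name):
        """Normalize a name to its canonical DB-key form (underscores)."""
        name = re.sub(r"\s", "_", name)  # whitespace -> underscores
        name = re.sub(r"_+", "_", name)  # collapse runs of underscores
        name = name.strip("_")           # trim leading/trailing underscores
        return name[:1].upper() + name[1:]

    def display_form(name):
        """The canonical display form of a username uses spaces instead."""
        return db_key(name).replace("_", " ")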

ACN

On Thu, Sep 9, 2021 at 10:56 AM Roy Smith  wrote:
>
> That can't be right.  I think you meant, "Reduce any runs of multiple 
> underscores to a single SPACE" and then "Trim any leading or trailing spaces"
>
> On Sep 6, 2021, at 12:15 AM, Bryan Davis  wrote:
>
> * Replace all whitespace characters with underscores (`_`)
> * Reduce any runs of multiple underscores to a single underscore
> * Trim any leading or trailing underscores from the string
> * Capitalize the string
>
>
> ___
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: 
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
___
Cloud mailing list -- cloud@lists.wikimedia.org
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


Re: [Cloud] [Cloud-announce] Upcoming cloud-vps project deletions

2021-01-26 Thread AntiCompositeNumber
Fastcci is used by a default gadget on Commons, and is being
maintained by the recently-returned User:Dschwen.

On Tue, Jan 26, 2021 at 2:17 PM Andrew Bogott  wrote:
>
> Hello!
>
> The 2020 project opt-in process wrapped up at the end of the year, and
> we've identified the following projects as abandoned:
>
>
> - asyncwiki
> - blog
> - commons-corruption-checker
> - fastcci
> - finding-glams
> - ign2commons
> - lizenzhinweisgenerator
> - lta-tracker
> - meza
> - ogvjs-integration
> - puppet
> - snuggle
> - wikidata-federation
> - wikidata-primary-sources-tool
> - wikidata-realtime-dumps
> - wikimania-scholarships
>
>
> At the end of this month (2021-01-31) those projects will be deleted 
> along with all related data and VMs. If you know of anyone associated
> with those projects who is not on this list, please bring this to their
> attention. And, if you think any of this is in error, please notify me
> immediately.
>
> Thank you!
>
> -Andrew + the WMCS team
>
>
>
> ___
> Wikimedia Cloud Services announce mailing list
> cloud-annou...@lists.wikimedia.org (formerly 
> labs-annou...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud-announce
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud

___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud


Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-17 Thread AntiCompositeNumber
I took a look at converting the query used for GreenC Bot's Job 10,
which tracks enwiki files that "shadow" a different file on Commons.
It is currently run daily, and the query executes in about 60-90
seconds. I tried three methods to recreate that query without a SQL
cross-database join. The naive method of "just give me all the files"
didn't work because it timed out somewhere. The paginated version of
that query was on track to take over 5 hours to complete. A similar
method that emulates a subquery instead of a join was projected to
take about 6 hours. Both stopped early because I got bored of watching
them and PAWS doesn't work unattended. I also wasn't able to properly
test them because people kept fixing the shadowed files before the
script got to them. The code is at
.

ACN

On Tue, Nov 17, 2020 at 1:02 PM Maarten Dammers  wrote:
>
> Hi Joaquin,
>
> On 16-11-2020 21:42, Joaquin Oltra Hernandez wrote:
>
> Hi Maarten,
>
> I believe this work started many years ago, and it was paused, and recently 
> restarted because of the stability and performance problems in the last years.
>
> You do realize the current setup was announced as new 3 years ago? See 
> https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_servers_ready_for_use/
>  .
>
> I'm sorry about the extra work this will cause, I hope the improved stability 
> and performance will make it worth it for you, and that you will reconsider 
> and migrate your code to work on the new architecture (or reach out for 
> specific help if you need it).
>
> No, saying sorry won't make it right and no, it won't make it worth it for 
> me. If I want very stable access to a single wiki, I'll use the API of that 
> wiki.
>
> --
> Joaquin Oltra Hernandez
> Developer Advocate - Wikimedia Foundation
>
> It currently doesn't really feel to me that you're advocating for the 
> developers, it feels more like you're the unlucky person having to sell the 
> bad WMF management decisions to the angry developers.
>
> Maarten
>
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud

___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud


Re: [Cloud] [Cloud-announce] Cloud VPS users, please claim your projects

2020-11-16 Thread AntiCompositeNumber
Kelson already did it a few days ago.

On Mon, Nov 16, 2020 at 3:54 PM Travis Briggs  wrote:
>
> Can someone mark mwoffliner as "in use", please?
>
> I lost my phone and now I can't sign in to mediawiki tech because of 2FA.
>
> Thanks,
> -Travis
>
> On Mon, Nov 2, 2020 at 9:09 AM Maximilian Doerr  
> wrote:
>>
>> Cyberbot will never be unclaimed. :-
>>
>> Cyberpower678
>> English Wikipedia Administrator
>> English Wikipedia Interface Administrator
>> Global User Renamer
>>
>> On Nov 2, 2020, at 10:23, Andrew Bogott  wrote:
>>
>> 
>> Yes, marking it as 'in use' is how I can tell whether or not it's claimed :) 
>>  Thank you!
>>
>> -Andrew
>>
>> On 11/2/20 9:15 AM, Zoran Dori wrote:
>>
>> Hello,
>> I've already claimed the srwiki-dev project, so I'm not sure why it is 
>> listed here as unclaimed.
>>
>> But I've added "In use" and in the title, so I think that everything is okay 
>> now, right?
>>
>> Best regards,
>> Zoran
>>
>> ___
>> Wikimedia Cloud Services mailing list
>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>>
>> ___
>> Wikimedia Cloud Services mailing list
>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>
>> ___
>> Wikimedia Cloud Services mailing list
>> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
>> https://lists.wikimedia.org/mailman/listinfo/cloud
>
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud

___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud


Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread AntiCompositeNumber
Most cross-db JOINs can be recreated using two queries and an external
tool to filter the results. However, there are some queries that would
be simply impractical due to the large amount of data involved, and
the query for overlapping local and Commons images is one of them.
There are basically two ways to recreate the query: re-implement the
inner join or re-implement a semi-join subquery.

Recreating a JOIN is conceptually very simple: get two lists and
compare them. However, there are 67,034 files on fawiki, 891,286 files
on enwiki, and 65,559,375 files on Commons. Simply joining by name
would be impossible -- MariaDB would time out a few hundred times
before returning all that data, and even if it did, storing those
lists even as efficiently as possible would be quite the memory hog.
So the query would have to be paginated. The only common identifier we
have is the file name, and since the letters in the names aren't
evenly distributed, paginating wouldn't exactly be fun.
The other option is implementing the Commons lookup like a semi-join
subquery. Iterate over the local data, paginating any way you want.
Then, for every item, query the Commons database for that title. Of
course, we're now making a million requests to the database, which
isn't going to be very fast simply due to network delays. We could be
a little nicer and group a bunch of titles together in the query,
which will probably get us down from a million queries to fifty
thousand or so. Of course, this all gets more complicated if you want
a query more complex than SELECT e.img_name FROM
enwiki_p.image e JOIN commonswiki_p.image c ON e.img_name =
c.img_name;
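
As a rough sketch of that batched semi-join approach (assuming the toolforge
Python library for replica connections; batch size, error handling, and
retries are left out):

    import toolforge

    def also_on_commons(titles, batch_size=500):
        """Return the subset of titles that also exist as files on
        Commons, querying in batches instead of once per title."""
        conn = toolforge.connect("commonswiki")
        found = set()
        with conn.cursor() as cur:
            for i in range(0, len(titles), batch_size):
                chunk = titles[i:i + batch_size]
                placeholders = ",".join(["%s"] * len(chunk))
                cur.execute(
                    "SELECT img_name FROM image"
                    " WHERE img_name IN (%s)" % placeholders,
                    chunk,
                )
                found.update(row[0] for row in cur.fetchall())
        return found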

I understand the system engineering reasons for this change, but I
think it's worth underscoring exactly how disruptive it will be for
the queries that depended on this functionality. I'm certainly no
expert, but I'm willing to help wrap queries in Python until they
start working again.

ACN

On Tue, Nov 10, 2020 at 8:48 PM Huji Lee  wrote:
>
> Cross-wiki JOINS are used by some of the queries we run regularly for fawiki. 
> One of those queries looks for articles that don't have an image in their 
> infobox in fawiki, but do have one on enwiki, so that we can use/import that 
> image. Another one JOINs fawiki data with commons data to look for redundant 
> images. Yet another one, looks for articles that all use an image that 
> doesn't exist (for cleanup purposes) but needs to join with commons db 
> because the referenced file might exist there. Lastly, we have a report that 
> looks for fair use images on fawiki that had the same name as an image on 
> enwiki where the enwiki copy was deleted; this usually indicates an improper 
> application of fair use, and enwiki -- due to its larger community -- finds 
> and deletes these faster than we could on fawiki.
>
> There may be other cases I am unaware of. The point is, losing the cross-wiki 
> JOIN capability can make some of the above tasks really difficult or 
> completely impossible.
>
> On Tue, Nov 10, 2020 at 3:27 PM Joaquin Oltra Hernandez 
>  wrote:
>>
>> TLDR: Wiki Replicas' architecture is being redesigned for stability and 
>> performance. Cross database JOINs will not be available and a host 
>> connection will only allow querying its associated DB. See [1] for more 
>> details.
>>
>> Hi!
>>
>> In the interest of making and keeping Wiki Replicas a stable and performant 
>> service, a new backend architecture is needed. There is some impact in the 
>> features and usage patterns.
>>
>> What should I do? To avoid breaking changes, you can start making the 
>> following changes *now*:
>> - Update existing tools to ensure queries are executed against the proper 
>> database connection
>>   - Eg: If you want to query the `eswiki_p` DB, you must connect to the 
>> `eswiki.analytics.db.svc.eqiad.wmflabs` host and `eswiki_p` DB, and not to 
>> enwiki or other hosts
>> - Check your existing tools and services queries for cross database JOINs, 
>> rewrite the joins in application code
>>   - Eg: If you are doing a join across databases, for example joining 
>> `enwiki_p` and `eswiki_p`, you will need to query them separately, and 
>> filter the results of the separate queries in the code
>>
>> Timeline:
>> - November - December: Early adopter testing
>> - January 2021: Existing and new systems online, transition period starts
>> - February 2021: Old hardware is decommissioned
>>
>> We need your help
>> - If you would like to beta test the new architecture, please let us know 
>> and we will reach out to you soon
>> - Sharing examples / descriptions of how a tool or service was updated, 
>> writing a common solution or some example code others can utilize and 
>> reference, helping others on IRC and the mailing lists
>>
>> If you have questions or need help adapting your code or queries, please 
>> contact us [2], or write on the talk page [3].
>>
>> We will be sending reminders, and more specific examples 

Re: [Cloud] Sync 100k most popular articles to local machine?

2020-08-09 Thread AntiCompositeNumber
This is not the best list for this.
<https://www.mediawiki.org/wiki/Mailing_lists/Wikitech-l> would be
better.

If you're wanting to download articles to be able to read them, my
current recommendation is to use <https://www.kiwix.org/en/>. If
you're looking to do something else, a data dump from
<https://dumps.wikimedia.org/> may be a better idea.

If neither of those options are what you are looking for, you can find
the list of pages by number of views at
<https://pageviews.toolforge.org/topviews/?project=en.wikipedia.org&platform=all-access&date=last-year&excludes=>.
You should limit your requests to a reasonable rate, typically no more
than 60-75 requests/minute (but a slower rate if possible is
appreciated). Make sure to also comply with
<https://meta.wikimedia.org/wiki/User-Agent_policy> so we can yell at
you specifically if there's a problem.
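
If you do end up fetching pages yourself, a minimal sketch (the script name
and User-Agent contact details are placeholders; replace them with your own):

    import time
    import requests

    session = requests.Session()
    session.headers["User-Agent"] = (
        "article-sync/0.1 (https://example.org/; you@example.org)"
    )

    def fetch_html(titles):
        """Yield (title, html) pairs at roughly one request per second."""
        for title in titles:
            r = session.get(
                "https://en.wikipedia.org/api/rest_v1/page/html/" + title
            )
            r.raise_for_status()
            yield title, r.text
            time.sleep(1)  # ~60 requests/minute; slower is even better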

AntiCompositeNumber

On Sun, Aug 9, 2020 at 3:52 PM Thomas Güttler Lists
 wrote:
>
> Hi,
>
> I would like to sync the 100k most popular articles to my local machine.
>
> Maybe I was blind, but I could not find suitable documentation about this.
>
> Maybe the list with the 100k most popular articles could be the start.
>
> If I have this list, I could download the articles with 100k http calls.
>
> But before I do this, I would kindly ask you about the correct way to do
> this.
>
> I don't want that you think I am doing a DoS attack :-)
>
> If there is a better place for this question, please tell me!
>
> Regards,
>
>Thomas Güttler
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
> I am looking for feedback: https://github.com/guettli/programming-guidelines
>
>
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud

___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud

Re: [Cloud] Logging query with actors and log type (user stats)

2020-04-15 Thread AntiCompositeNumber
Alright, now that I've unearthed the page I was looking for, a fuller
answer.
The wiki replicas have to do a lot of database magic to remove
private/deleted data, which means that the normal tables can be slow.

For your logging query, logging_logindex makes things faster because
log_action is nulled for deleted log entries in logging and
logging_userindex. In logging_logindex, rows with a deleted action are
removed entirely: you lose the data for deleted log entries in exchange for
usable indexes on log_action. That's not a concern with this query because
you're filtering on log action anyway.

The actor view on the replicas has to do even more database magic and
perform subqueries on 8 other tables, so it's almost always going to slow
things down. If you use actor_logging instead of actor, you skip 7 of those
subqueries.
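
Concretely, the rewrite amounts to swapping the optimized views into the
queries quoted below, e.g. (a sketch; join actor_logging rather than actor
if you also need user names):

    SELECT count(*) AS review_count, log_actor
    FROM logging_logindex
    WHERE log_type = 'review' AND log_action = 'approve'
    GROUP BY log_actor;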

More information:
https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Alternative_Views
https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#The_actor_table_seems_really_slow--so_does_comment
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/profile/templates/labs/db/views/maintain-views.yaml
(configuration for the views on the replicas)

As for why these queries were faster a few months ago, that's probably
related to database server issues (https://phabricator.wikimedia.org/T246970
and https://phabricator.wikimedia.org/T247978).

Using the optimized views brings the all-time query down to about 15
minutes (https://quarry.wmflabs.org/query/43984) and the 2019 query down to
about 106 seconds (https://quarry.wmflabs.org/query/43985).

AntiCompositeNumber

On Wed, Apr 15, 2020 at 18:13 Maciej Jaros  wrote:

> Also note that I have a similar query that kind of works but takes much
> longer to complete then it used to.
> https://quarry.wmflabs.org/query/41680
>
> Executed in 46.22 seconds as of Wed Jan 29 2020.
> Resultset (1480 rows)
>
> Executed in 188.32 seconds as of Wed Apr 15 2020.
> Resultset (1480 rows)
>
> So that's about 4 times longer than before on the same data. Well, the actor
> table is probably larger, but I assume this will use indexes: `LEFT JOIN
> actor a ON actor_id = log_actor`.
>
> Cheers,
> Nux.
>
> Maciej Jaros (2020-04-15 23:56):
>
> Not sure if this is an appropriate place to discuss this but I'm not able
> to run this query:
> https://quarry.wmflabs.org/query/24267
>
> Even a minor part of this query like below needs a significant time to
> complete.
>SELECT count(*) as review_count, log_actor
> FROM logging
> WHERE log_type = 'review' AND log_action = 'approve'
> GROUP BY log_actor
>
> I tried with logging_userindex but that doesn't seem to help. Any chance of
> adding extra indexes to make such queries work? I think querying log_type
> and log_actor should be quite common for various user stats.
>
> Even this takes almost 200 seconds to complete:
> SELECT count(*) as review_count, log_actor
> FROM logging_userindex
> WHERE log_type = 'review' AND log_action = 'approve'
> AND log_timestamp >= 2019010100
> AND log_timestamp <= 20190101235959
> GROUP BY log_actor
>
> Also note that the query 24267 used to work in January 2020. I searched
> through the list, but haven't found any changes related to the logging
> table since January.
>
> Cheers,
> Nux.
>
> ___
> Wikimedia Cloud Services mailing list
> cl...@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
>
>
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud

Re: [Cloud] Logging query with actors and log type (user stats)

2020-04-15 Thread AntiCompositeNumber
You should use the un(der)-documented `logging_logindex` view instead of
`logging`.

On Wed, Apr 15, 2020 at 17:56 Maciej Jaros  wrote:

> Not sure if this is an appropriate place to discuss this but I'm not able
> to run this query:
> https://quarry.wmflabs.org/query/24267
>
> Even a minor part of this query like below needs a significant time to
> complete.
>SELECT count(*) as review_count, log_actor
> FROM logging
> WHERE log_type = 'review' AND log_action = 'approve'
> GROUP BY log_actor
>
> I tried with logging_userindex but that doesn't seem to help. Any chance of
> adding extra indexes to make such queries work? I think querying log_type
> and log_actor should be quite common for various user stats.
>
> Even this takes almost 200 seconds to complete:
> SELECT count(*) as review_count, log_actor
> FROM logging_userindex
> WHERE log_type = 'review' AND log_action = 'approve'
> AND log_timestamp >= 2019010100
> AND log_timestamp <= 20190101235959
> GROUP BY log_actor
>
> Also note that the query 24267 used to work in January 2020. I searched
> through the list, but haven't found any changes related to the logging
> table since January.
>
> Cheers,
> Nux.
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud

Re: [Cloud] [Cloud-announce] Toolforge: new domain toolforge.org

2020-04-13 Thread AntiCompositeNumber
Continued support for the toolforge: interwiki prefix is being tracked at
https://phabricator.wikimedia.org/T247432.

On Mon, Apr 13, 2020 at 09:04 Antonin Delpeuch (lists) <
li...@antonin.delpeuch.eu> wrote:

> Hi Arturo,
>
> This is great news! The tool urls will be definitely prettier this way.
>
> Currently, it is possible to link to a Toolforge-hosted service from a
> WMF wiki with the following wikicode:
>
> [[:toollabs:editgroups/b/OR/f813407aad/|my link]]
>
> which links to
>
> https://tools.wmflabs.org/editgroups/b/OR/f813407aad/
>
> Do you have any plans to migrate this to any other syntax? I assume the
> new scheme would make it hard, since the tool name and the URL path are
> separated.
>
> Serving permanent redirects to the new URL scheme is a good idea. It is
> very important for this editgroups tool as there are millions of links
> to it in edit summaries (which are not editable).
>
> Cheers,
> Antonin
>
>
> On 13/04/2020 13:25, Arturo Borrero Gonzalez wrote:
> > Hi!
> >
> > We are happy to announce the new domain 'toolforge.org' is now ready to
> be
> > adopted by our Toolforge community.
> >
> > There is a lot of information related to this change in a wikitech page
> we have
> > for this:
> >
> > https://wikitech.wikimedia.org/wiki/News/Toolforge.org
> >
> > The most important change you will see happening is a new domain/scheme
> for
> > Toolforge-hosted webservices:
> >
> > * from https://tools.wmflabs.org//
> > * to   https://.toolforge.org/
> >
> > A live example of this change can be found in our internal
> openstack-browser
> > webservice tool:
> >
> > * legacy URL: https://tools.wmflabs.org/openstack-browser/
> > * new URL:https://openstack-browser.toolforge.org
> >
> > This domain change is something we have been working on for months
> previous to
> > this announcement. Part of our work has been to ensure we have a smooth
> > transition from the old domain (and URL scheme) to the new canonical one.
> > However, we acknowledge the ride might be bumpy for some folks, due to
> technical
> > challenges or cases we didn't consider when planning this migration.
> Please
> > reach out intermediately if you find any limitation or failure anywhere
> related
> > to this change. The wikitech page also contains a section with
> information for
> > common problems.
> >
> > You can check now if your webservice needs any specific change by
> creating a
> > temporal redirection to the new canonical URL:
> >
> > $ webservice --canonical --backend=kubernetes start [..]
> > $ webservice --canonical --backend=gridengine start [..]
> >
> > The --canonical switch will create a temporal redirect that you can turn
> on/off.
> > Please use this to check how your webservice behaves with the new
> domain/URL
> > scheme. If you start the webservice without --canonical, the temporal
> redirect
> > will be removed.
> >
> > We aim to introduce permanent redirects for the legacy URLs on
> 2020-06-15. We
> > expect to keep serving legacy URLs forever, by means of redirections to
> the new
> > URLs. More information on the redirections can also be found in the
> wikitech page.
> >
> > The toolforge.org domain is finally here! <3
> >
>
>
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud

Re: [Cloud] Can this query be optimized further?

2020-04-12 Thread AntiCompositeNumber
Sometimes a subquery will be faster than the LEFT JOIN:

...
WHERE page_id NOT IN (SELECT c2.cl_from FROM categorylinks c2 WHERE
c2.cl_to = 'صفحه‌های_گسترده_در_دست_ساخت')
...

If that doesn't help, you'll probably have to paginate the query (split it
into smaller chunks, then recombine them).
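
Spelled out against the full query below, the rewrite would look roughly
like this (note the semantics shift slightly: under-construction pages are
excluded entirely rather than flagged):

    SELECT page_title, pl_title
    FROM page
    JOIN pagelinks ON page_id = pl_from
    WHERE page_namespace = 0
      AND pl_namespace IN (2, 3)
      AND page_id NOT IN (
        SELECT c2.cl_from
        FROM categorylinks c2
        WHERE c2.cl_to = 'صفحه‌های_گسترده_در_دست_ساخت'
      );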

On Sun, Apr 12, 2020 at 08:52 Huji Lee  wrote:

> I have a query that tries to find pages in article namespace that contain
> a link to some page in the user/user talk namespace. Many of these result
> from users signing in the articles (which is obviously not appropriate).
> The query also tries to exclude pages that have a legitimate link to user
> pages; e.g. our {{under construction}} template contains a link to the user
> page of the person who affixed the template, and we can easily find these
> pages based on a category they get added to by the same template.
>
> The query is very brief, and I have pasted it below. To the best of my
> understanding, it uses indexes and does not cause any USING WHERE or USING
> FILESORT steps. Nevertheless, the query takes more than 30 minutes on Labs
> and is killed as a result.
>
> Is there a way to optimize this query further? If not, that is okay; I
> will implement a two-step query. But I just want to make sure I'm not
> missing something obvious.
>
> Thanks,
> Huji
>
>
>
>
> SELECT
>   page_title,
>   pl_title,
>   CASE
> WHEN c2.cl_to IS NULL THEN NULL
> ELSE '{{yes}}'
>   END AS under_construction
> FROM page
> JOIN pagelinks
>   ON page_id = pl_from
> LEFT JOIN categorylinks c2
>   ON page_id = c2.cl_from
>   AND c2.cl_to = 'صفحه‌های_گسترده_در_دست_ساخت'
> WHERE
>   page_namespace = 0
>   AND pl_namespace IN (2, 3)
> ___
> Wikimedia Cloud Services mailing list
> Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/cloud
___
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud