[Wikidata-bugs] [Maniphest] T347493: Serve Wikidata traffic via Kubernetes

2023-09-28 Thread Joe
Joe added a comment. Well, it turns out the issue was simpler: we even had a TODO in the code: `# TODO: add mw-on-k8s once we think of moving wikidata or partial traffic`. Sigh. Thanks @Lucas_Werkmeister_WMDE for noticing; this will be fixed as soon as I get a review. TASK DETAIL

[Wikidata-bugs] [Maniphest] T347493: Serve Wikidata traffic via Kubernetes

2023-09-28 Thread Joe
Joe added a comment. I tried restarting ATS on a backend, cp1081, then made requests for wikidata's special:random to trafficserver directly: still all going to appservers on bare metal. So the problem isn't in `mw-on-k8s.lua`, apparently... TASK DETAIL https

[Wikidata-bugs] [Maniphest] T347493: Serve Wikidata traffic via Kubernetes

2023-09-28 Thread Joe
Joe added a comment. Interestingly, I do get correct results for m.wikidata.org, but somehow not for www.wikidata.org (also, please grep for `mw-web` as we've repooled eqiad in the meantime). This makes the whole thing even more puzzling tbh. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T347493: Serve Wikidata traffic via Kubernetes

2023-09-28 Thread Joe
Joe added a comment. @Jdforrester-WMF no, this task is actually about that patch not having the effect we expected. TASK DETAIL https://phabricator.wikimedia.org/T347493 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: Joe, Jdforrester

[Wikidata-bugs] [Maniphest] T344904: Termbox SSR broken since k8s migration

2023-08-24 Thread Joe
Joe added a comment. Yes, localhost:6008 is pointing to `termbox.discovery.wmnet:4004` in production. The problem doesn't seem to be in termbox, as we could both fetch the data from the service without issues. So the issue doesn't seem to be related to the switch to use mediawiki

[Wikidata-bugs] [Maniphest] T334064: Migrate termbox to mw-api-int

2023-08-24 Thread Joe
Joe closed this task as "Resolved". Joe added a comment. Termbox has been migrated. TASK DETAIL https://phabricator.wikimedia.org/T334064 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: WMDE-leszek, Lucas_Werkmeister_WMDE, Joe,

[Wikidata-bugs] [Maniphest] T334064: Migrate termbox to mw-api-int

2023-08-23 Thread Joe
Joe added a comment. Just deployed the change to termbox-test, and I still see my test url `http://termbox-test.staging.svc.eqiad.wmnet:3031/termbox?entity=Q229877=630197=en=%2Fwiki%2FSpecial%3ASetLabelDescriptionAliases%2FQ229877=en` return the same content after the redeployment

[Wikidata-bugs] [Maniphest] T214402: populateCognatePages.php query keeps timing out while waiting for replication

2023-03-21 Thread Joe
Joe changed the status of subtask T172497: Fix mediawiki heartbeat model, change pt-heartbeat model to not use super-user, avoid SPOF and switch automatically to the real master without puppet dependency from Open to Stalled. TASK DETAIL https://phabricator.wikimedia.org/T214402 EMAIL

[Wikidata-bugs] [Maniphest] T331405: Query service maxlag calculation should exclude datacenters that don't receive traffic and where the updater is turned off

2023-03-09 Thread Joe
Joe added a comment. Re-thinking this: what we're really interested in is the max lag of the servers that are receiving user traffic. So I crafted the following metric in Prometheus: `max(time() - blazegraph_lastupdated and rate(blazegraph_queries_done_total{}[5m

[Wikidata-bugs] [Maniphest] T331405: Query service maxlag calculation should exclude datacenters that don't receive traffic and where the updater is turned off

2023-03-09 Thread Joe
Joe renamed this task from "Depooled servers may still be taken into account for query service maxlag" to "Query service maxlag calculation should exclude datacenters that don't receive traffic and where the updater is turned off". Joe updated the task description.

[Wikidata-bugs] [Maniphest] T331405: Depooled servers may still be taken into account for query service maxlag

2023-03-09 Thread Joe
Joe edited projects, added serviceops; removed Sustainability (Incident Followup), SRE. Joe triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T331405 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: J

[Wikidata-bugs] [Maniphest] T331405: Depooled servers may still be taken into account for query service maxlag

2023-03-09 Thread Joe
Joe added a comment. In T331405#8672360 <https://phabricator.wikimedia.org/T331405#8672360>, @dcausse wrote: > In T331405#8672341 <https://phabricator.wikimedia.org/T331405#8672341>, @Joe wrote: > >> Updates shouldn't depend on where the discovery dns recor

[Wikidata-bugs] [Maniphest] T331405: Depooled servers may still be taken into account for query service maxlag

2023-03-07 Thread Joe
Joe added a comment. To ensure I understood your problem correctly: why were those servers not getting updated anymore? Updates shouldn't depend on where the discovery dns record points to, but rather go to the local datacenter directly. I think the bug here is with wdqs-updater

[Wikidata-bugs] [Maniphest] T305785: Wikibase Snak hashes (and thus "mainsnak", "references" and "qualifiers" hashes) depend on legacy PHP serialization

2022-10-06 Thread Joe
Joe closed subtask T318918: Undeploy patch to use old PHP serialization in PHP 7.4 as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T305785 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, Joe Cc: Michael, ItamarWMDE, Reedy, Aklapper

[Wikidata-bugs] [Maniphest] T316923: Restore skipped test in ReferenceListTest.php

2022-10-06 Thread Joe
Joe closed subtask T318918: Undeploy patch to use old PHP serialization in PHP 7.4 as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T316923 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: Lucas_Werkmeister_WMDE, Ollie.Shotton_WMDE

[Wikidata-bugs] [Maniphest] T316923: Restore skipped test in ReferenceListTest.php

2022-10-06 Thread Joe
Joe closed subtask T318918: Undeploy patch to use old PHP serialization in PHP 7.4 as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T316923 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: Ollie.Shotton_WMDE, WMDE-leszek, Jakob_WMDE

[Wikidata-bugs] [Maniphest] T238751: Only generate maxlag from pooled query service servers.

2022-08-11 Thread Joe
Joe changed the task status from "Open" to "Stalled". Joe removed Joe as the assignee of this task. Joe added a comment. Hi, any news on this front? I'll release this bug as its completion doesn't depend on me right now. When the functionality has been merged, please re

[Wikidata-bugs] [Maniphest] T270614: Automatically depool wdqs servers that are "lagged"

2022-08-11 Thread Joe
Joe changed the status of subtask T238751: Only generate maxlag from pooled query service servers. from Open to Stalled. TASK DETAIL https://phabricator.wikimedia.org/T270614 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: karapayneWMDE

[Wikidata-bugs] [Maniphest] T238751: Only generate maxlag from pooled query service servers.

2022-06-01 Thread Joe
Joe added a comment. Sadly I had to revert, because the `--lb` and `--lb-pool` options are not recognized by the script. mwmaint1002:~$ /usr/local/bin/mwscript extensions/Wikidata.org/maintenance/updateQueryServiceLag.php --wiki wikidatawiki --cluster wdqs --prometheus
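
The revert suggests the options simply weren't registered in the maintenance script that was deployed. As a rough illustration (a hypothetical sketch, not the actual Gerrit patch), a MediaWiki maintenance script only accepts flags that were declared via `addOption()`:

```php
<?php
// Hypothetical sketch only — not the actual updateQueryServiceLag.php patch.
// MediaWiki maintenance scripts reject any flag that was not declared with addOption().
require_once __DIR__ . '/../../../maintenance/Maintenance.php';

class UpdateQueryServiceLagSketch extends Maintenance {
	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Sketch: update the cached query service lag' );
		$this->addOption( 'cluster', 'conftool cluster of the query service hosts', false, true );
		$this->addOption( 'lb', 'use conftool/LVS pool state to pick servers', false, false );
		$this->addOption( 'lb-pool', 'name of the conftool pool to consider', false, true );
	}

	public function execute() {
		// getOption() returns the given default when the flag is absent.
		$useLb = $this->hasOption( 'lb' );
		$pool = $this->getOption( 'lb-pool', 'wdqs' );
		$this->output( 'useLb=' . ( $useLb ? 'yes' : 'no' ) . " pool=$pool\n" );
		// ... query only pooled servers and store the resulting max lag ...
	}
}

$maintClass = UpdateQueryServiceLagSketch::class;
require_once RUN_MAINTENANCE_IF_MAIN;
```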

[Wikidata-bugs] [Maniphest] T238751: Only generate maxlag from pooled query service servers.

2022-05-23 Thread Joe
Joe added a comment. Given the changes we've made to puppet in the meantime, I am now able to feed the right parameters to the script if we want to. The following patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/797077 will cause these corresponding changes in puppet

[Wikidata-bugs] [Maniphest] T301471: New Service Request SchemaTree

2022-02-21 Thread Joe
Joe added a comment. In T301471#7726097 <https://phabricator.wikimedia.org/T301471#7726097>, @Michaelcochez wrote: > @Joe for the base image, would you recommend our current approach of starting from an 'empty' image and downloading the latest go distribution ourselves, or

[Wikidata-bugs] [Maniphest] T301471: New Service Request SchemaTree

2022-02-21 Thread Joe
Joe added a comment. Ok so a few requirements: 1. we need the repository to be on gerrit, and to include a `.pipeline` directory to be built using blubber/the deployment pipeline. 2. you should probably base your image on debian bullseye and not debian stretch, but that can be done

[Wikidata-bugs] [Maniphest] T301471: New Service Request SchemaTree

2022-02-16 Thread Joe
Joe added a comment. Hi, if this service is to be used in the WMF production environment (and given the call graph, it will), it needs to run on Kubernetes, and thus it will need to be built using our deployment pipeline first, and use the deployment-charts repository to define

[Wikidata-bugs] [Maniphest] T285098: Production A/B test deployment - Improved Property Suggester/Recommender

2021-08-05 Thread Joe
Joe added a comment. Given my opposition to the plan as proposed in this task, I've been asked to explain it in more detail here. First of all, I want to say that IMHO things would have gone more smoothly if you had asked SRE for an opinion about the plan before it was put in motion. Keep

[Wikidata-bugs] [Maniphest] T285104: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes

2021-07-27 Thread Joe
Joe added a comment. I think there are two options, depending on the level of security we want to achieve and the urgency of bringing this to production: 1. We just point to the current installation and it should just work(TM). But we'd need to perform a small migration afterwards. 2

[Wikidata-bugs] [Maniphest] T286935: Find a way to make swift Tempauth usable behind envoy

2021-07-20 Thread Joe
Joe added a comment. I would say this needs a more thorough change of how we use envoy - I'm specifically thinking of doing something more transparent, like istio does. But given that I don't think we'll get around to doing that soon enough for your timeline, I'd advise skipping going through envoy

[Wikidata-bugs] [Maniphest] T285634: June 2021: appservers accumulating active php-fpm workers, requiring rolling restarts to avoid user-visible latency impact

2021-06-29 Thread Joe
Joe closed this task as "Resolved". Joe added a comment. Data on the number of apcu gets/s normalized after the release: https://grafana-rw.wikimedia.org/goto/4-r5Lhk7z I'm going to optimistically resolve this bug. TASK DETAIL https://phabricator.wikimedia.org/T285

[Wikidata-bugs] [Maniphest] T285634: June 2021: appservers accumulating active php-fpm workers, requiring rolling restarts to avoid user-visible latency impact

2021-06-29 Thread Joe
Joe added a comment. In T285634#7183188 <https://phabricator.wikimedia.org/T285634#7183188>, @daniel wrote: >> Scavenging the production logs, we found that Special:EntityData requests for rdf documents were possibly the culprit. > > Did the code change, or

[Wikidata-bugs] [Maniphest] T285634: June 2021: appservers accumulating active php-fpm workers, requiring rolling restarts to avoid user-visible latency impact

2021-06-29 Thread Joe
Joe added a comment. Scavenging the production logs, we found that `Special:EntityData` requests for rdf documents were possibly the culprit. This is the result of profiling http://www.wikidata.org/wiki/Special:EntityData/Q146190.rdf : https://performance.wikimedia.org/xhgui

[Wikidata-bugs] [Maniphest] T285634: June 2021: appservers accumulating active php-fpm workers, requiring rolling restarts to avoid user-visible latency impact

2021-06-29 Thread Joe
Joe added a comment. I think what @Addshore just found is a good candidate for being the source of the issue. I'll try and get some more info from apcu on servers, although they've all been recently restarted to ease the pressure. I will take several snapshots of the apcu metadata

[Wikidata-bugs] [Maniphest] T281480: Cannot access the database: Too many connections

2021-04-29 Thread Joe
Joe added a comment. There is definitely something going very wrong with memcached: https://grafana.wikimedia.org/d/00316/memcache?viewPanel=60=1=now-30d=now shows misses increasing across the board TASK DETAIL https://phabricator.wikimedia.org/T281480 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T281480: Cannot access the database: Too many connections

2021-04-29 Thread Joe
Joe added a subscriber: Pchelolo. Joe added a comment. In T281480#7046160 <https://phabricator.wikimedia.org/T281480#7046160>, @Joe wrote: > Given we only make requests to external storage when parsercache has a miss, it seemed sensible to look for corresponding patterns in pa

[Wikidata-bugs] [Maniphest] T281480: Cannot access the database: Too many connections

2021-04-29 Thread Joe
Joe added a comment. Given we only make requests to external storage when parsercache has a miss, it seemed sensible to look for corresponding patterns in parsercache. I see we introduced a new category of misses on the same date "miss_absent_metadata", see https

[Wikidata-bugs] [Maniphest] T264821: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes

2020-10-07 Thread Joe
Joe added a comment. To clarify a bit - restbase has hourly spikes of requests for the `feed` endpoint, which go back to wikifeeds, which calls both restbase and the action api. From the graphs of calls from wikifeeds it's clear we have hourly peaks happening at :00 in the number

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-20 Thread Joe
Joe closed this task as "Resolved". Joe added a comment. Reporting here in brief: - We confirmed the problem had to do with activating firejail for all executions of external programs. That triggered a kernel bug - This kernel bug can be bypassed by disabling kernel memory

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-19 Thread Joe
Joe claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T260281 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: eprodromou, Michael, NullPointer, Platonides, hashar, Addshore, Majavah, Ladsgroup, JMeybohm, ema, Joe, RhinosF1

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-18 Thread Joe
Joe closed subtask Restricted Task as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T260281 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: eprodromou, Michael, NullPointer, Platonides, hashar, Addshore, Majavah, Ladsgroup

[Wikidata-bugs] [Maniphest] T260329: Figure what change caused the ongoing memleak on mw appservers

2020-08-16 Thread Joe
Joe added a comment. To test the hypothesis that this is related to firejail use, we're sending 1 req/s to one appserver to use pygments, to see if that has any particularly ill effect. TASK DETAIL https://phabricator.wikimedia.org/T260329 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T260329: Figure what change caused the ongoing memleak on mw appservers

2020-08-13 Thread Joe
Joe added a comment. In T260329#6382296 <https://phabricator.wikimedia.org/T260329#6382296>, @Ladsgroup wrote: > For the wikibase part, I highly doubt it, the php entry point calls `wfLoadExtension` internally. I strongly doubt it has any effect as well, but I'd prefer to

[Wikidata-bugs] [Maniphest] T260329: Figure what change caused the ongoing memleak on mw appservers

2020-08-13 Thread Joe
Joe updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T260329 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: CDanis, Aklapper, jijiki, ArielGlenn, RhinosF1, Joe, lmata, wkandek, JMeybohm, Akuckartz, darthmon_wmde

[Wikidata-bugs] [Maniphest] T260329: Figure what change caused the ongoing memleak on mw appservers

2020-08-13 Thread Joe
Joe added a comment. The list of software updated that day on the appservers is at P12221 <https://phabricator.wikimedia.org/P12221> TASK DETAIL https://phabricator.wikimedia.org/T260329 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To:

[Wikidata-bugs] [Maniphest] T260329: Figure what change caused the ongoing memleak on mw appservers

2020-08-13 Thread Joe
Joe created this task. Joe added projects: serviceops, Operations, Sustainability (Incident Followup), Platform Engineering, Wikidata. TASK DESCRIPTION Something induced a progressive memory leak on all servers serving MediaWiki starting on August 4th between 08:00 and 13:00 UTC. The problem

[Wikidata-bugs] [Maniphest] T260281: mw* servers memory leaks (12 Aug)

2020-08-12 Thread Joe
Joe triaged this task as "Unbreak Now!" priority. Joe added projects: Platform Engineering, Wikidata. Joe added a comment. I'm not 100% sure that slabs are the problem here, but I'll try to follow up later. In the meantime, the servers we've rebooted yesterday are definite

[Wikidata-bugs] [Maniphest] T258739: wdqs admins should have access to nginx logs, jstack on wdqs machines

2020-07-27 Thread Joe
Joe triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T258739 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: herron, RKemper, CDanis, dcausse, Aklapper, Dzahn, lmata, Alter-paule, Beast1978, CBo

[Wikidata-bugs] [Maniphest] T258739: wdqs admins should have access to nginx logs, jstack on wdqs machines

2020-07-24 Thread Joe
Joe claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T258739 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: RKemper, CDanis, dcausse, Aklapper, Dzahn, lmata, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, Hook696

[Wikidata-bugs] [Maniphest] [Commented On] T252091: RFC: Site-wide edit rate limiting with PoolCounter

2020-05-20 Thread Joe
Joe added a comment. So, while I find the idea of using poolcounter to limit the editing **concurrency** (it's not rate-limiting, which is different) a good proposal, and in general something desirable to have (including the possibility we tune it down to zero if we're in a crisis

[Wikidata-bugs] [Maniphest] [Commented On] T252091: RFC: Site-wide edit rate limiting with PoolCounter

2020-05-20 Thread Joe
Joe added a comment. > - "The above suggests that the current rate limit is too high," this is not correct; the problem is that there is no rate limit for bots at all. The group explicitly doesn't have a rate limit. Adding such a rate limit was tried and caused lots of
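
For readers unfamiliar with how these limits are expressed: MediaWiki keys rate limits by action and then by group, and a group with no entry is simply unlimited. A minimal illustration (the numbers are invented, not a proposed value):

```php
<?php
// Illustration only — the numbers are invented, not a proposed limit.
// $wgRateLimits is keyed by action, then by group; a group without an entry
// (as the 'bot' group historically lacked) is not rate-limited at all.
$wgRateLimits['edit']['user'] = [ 90, 60 ];  // at most 90 edits per 60 seconds
$wgRateLimits['edit']['bot']  = [ 380, 60 ]; // hypothetical cap for the 'bot' group
```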

[Wikidata-bugs] [Maniphest] [Updated] T249595: Purge / Reject client pages that were cached in parser cache during the T249565 (wb_items_per_site) incident

2020-04-07 Thread Joe
Joe added a comment. I think the right way to do this would be to emit an htmlCacheUpdate job for every wikidata edit in the interval. These will: - recursively find all linked pages (not sure this works for wikibase items though - you might know that better) - invalidate
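
A minimal sketch of that idea (assuming the list of edited items is already known from the edit log; it uses the core backlinks-based job, and as noted above it may not cover wikibase client usage tracking):

```php
<?php
// Rough sketch, not an actual remediation script: enqueue an HTMLCacheUpdateJob
// for each item edited during the incident window so pages embedding it get
// re-rendered. $editedItemIds is assumed to come from the edit log for that interval.
foreach ( $editedItemIds as $itemId ) {
	$title = Title::newFromText( $itemId ); // assumes items live in the main namespace
	if ( !$title ) {
		continue;
	}
	// Fans out recursively over the backlinks of the given links table; whether
	// 'pagelinks' is the right table for wikibase items is the open question above.
	$job = HTMLCacheUpdateJob::newForBacklinks( $title, 'pagelinks' );
	JobQueueGroup::singleton()->push( $job );
}
```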

[Wikidata-bugs] [Maniphest] [Commented On] T247058: Deployment strategy and hardware requirement for new Flink based WDQS updater

2020-03-06 Thread Joe
Joe added a comment. I would like to read an assessment of why our current event processing platform, change-propagation, is not suited for this purpose and why we need to introduce new software. I suppose this has been done at some point in another task; if so, a quick link would suffice

[Wikidata-bugs] [Maniphest] [Commented On] T240884: Standalone service to evaluate user-provided regular expressions

2020-01-22 Thread Joe
Joe added a comment. In T240884#5813174 <https://phabricator.wikimedia.org/T240884#5813174>, @Daimona wrote: > In T240884#5810160 <https://phabricator.wikimedia.org/T240884#5810160>, @sbassett wrote: > >> In T240884#5810094 <https://phabricator.wi

[Wikidata-bugs] [Maniphest] [Commented On] T240884: Standalone service to evaluate user-provided regular expressions

2020-01-13 Thread Joe
Joe added a comment. I think the main question to answer is "does it make sense to create a safe regex evaluation service?". I think in a void the answer is "no". It could make sense to create a small C++ program wrapping the main re2 functionality and shel

[Wikidata-bugs] [Maniphest] [Commented On] T240884: Standalone service to evaluate user-provided regular expressions

2020-01-13 Thread Joe
Joe added a comment. In T240884#5789392 <https://phabricator.wikimedia.org/T240884#5789392>, @Ladsgroup wrote: >> Though this is mainly an implementation detail and not significant in terms requirements or pros/cons. > > I disagree for a couple of reasons: gRPC is

[Wikidata-bugs] [Maniphest] [Commented On] T237319: 502 errors on ATS/8.0.5

2019-11-21 Thread Joe
Joe added a comment. In T237319#5681384 <https://phabricator.wikimedia.org/T237319#5681384>, @darthmon_wmde wrote: > Is there anything that we can quickly do on wikibase to fix this? > if so, please advise what concretely. > Thanks! In general, whenever yo

[Wikidata-bugs] [Maniphest] [Commented On] T237319: Bug: 502 error when marking page for translation

2019-11-20 Thread Joe
Joe added a comment. In T237319#5677665 <https://phabricator.wikimedia.org/T237319#5677665>, @Vgutierrez wrote: > I find this pretty worrisome for the following reasons: > > 1. right now we have one remap rule that catches all the requests handled by appservers

[Wikidata-bugs] [Maniphest] [Closed] T236709: Error when executing helmfile commands for the termbox service

2019-10-29 Thread Joe
Joe closed this task as "Resolved". Joe added a comment. I have just tested and I can easily run `helmfile diff` on termbox now, in all environments. Resolving for now TASK DETAIL https://phabricator.wikimedia.org/T236709 EMAIL PREFERENCES https://phabricator.wikimedia.or

[Wikidata-bugs] [Maniphest] [Commented On] T236709: Error when executing helmfile commands for the termbox service

2019-10-29 Thread Joe
Joe added a comment. @Tarrow @Pablo-WMDE can someone try the release to staging? I should have fixed the rbac roles there; that should have fixed your issues. I am proceeding with releasing the change on the main clusters too in the meantime. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T236709: Error when executing helmfile commands for the termbox service

2019-10-29 Thread Joe
Joe added a comment. @Tarrow if it's an urgent bugfix we can just revert the change to let you deploy immediately. Please let's coordinate on IRC, and sorry for the inconvenience :) TASK DETAIL https://phabricator.wikimedia.org/T236709 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Claimed] T236709: Error when executing helmfile commands for the termbox service

2019-10-28 Thread Joe
Joe claimed this task. Joe triaged this task as "High" priority. Joe added a comment. @Jakob_WMDE this is a result of our temporary fix for a CVE affecting kubernetes. We will try to revert the situation tomorrow. Thanks for your patience. TASK DETAIL https://phabricator.wik

[Wikidata-bugs] [Maniphest] [Commented On] T231089: WikibaseClient.php: PHP Notice: Undefined index:

2019-09-12 Thread Joe
Joe added a comment. In T231089#5470160 <https://phabricator.wikimedia.org/T231089#5470160>, @Krinkle wrote: > Smells like T229433 <https://phabricator.wikimedia.org/T229433>. Which is also about `''` array index, and PHP 7.2. It's obviously a bug in PHP 7.2, but I'v

[Wikidata-bugs] [Maniphest] [Commented On] T231089: WikibaseClient.php: PHP Notice: Undefined index:

2019-09-12 Thread Joe
Joe added a comment. In T231089#5445485 <https://phabricator.wikimedia.org/T231089#5445485>, @Ladsgroup wrote: > Probably we can just merge this to T224491: PHP 7 corruption during deployment (was: PHP 7 fatals on mw1262) <https://phabricator.wikimedia.org/T224491>

[Wikidata-bugs] [Maniphest] [Commented On] T232035: 1.34.0-wmf.21 cause termbox to emit: Test get rendered termbox returned the unexpected status 500

2019-09-05 Thread Joe
Joe added a comment. So the real issue was: - termbox **correctly** uses the `api-ro.discovery.wmnet` host - the discovery record was **incorrectly** set to active-active - so requests from termbox would just go to the nearest dc, meaning that in codfw it would face super-cold caches

[Wikidata-bugs] [Maniphest] [Commented On] T232035: 1.34.0-wmf.21 cause termbox to emit: Test get rendered termbox returned the unexpected status 500

2019-09-05 Thread Joe
Joe added a comment. In T232035#5467309 <https://phabricator.wikimedia.org/T232035#5467309>, @Tarrow wrote: > I think this is probably the same as T229313 <https://phabricator.wikimedia.org/T229313>. We suspected it might be related to T231011 <https://phabricator.wik

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T228343: Huge delay for RecentChanges in test.wikidata

2019-07-22 Thread Joe
Joe added subscribers: Pchelolo, Joe. Joe added a comment. One comment I can add is that if recentchanges jobs are being slow/delayed because of queueing, you should see the effect on all wikis and not just one, given how the jobqueue is configured. Pinging @Pchelolo about the status

[Wikidata-bugs] [Maniphest] [Commented On] T223310: Investigate increase in tx bandwidth usage for mc1033

2019-05-21 Thread Joe
Joe added a comment. So, after turning off php7 this morning we saw no modification in the rate of requests to mc1033. It seems extremely probable that the switch of larger wikis to .wmf3 is what caused this regression. TASK DETAIL https://phabricator.wikimedia.org/T223310 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T223310: Investigate increase in tx bandwidth usage for mc1033

2019-05-21 Thread Joe
Joe added a comment. @kostajh for now I'm switching off php7 for other investigations, so we will know immediately if the additional traffic is due to that or not. TASK DETAIL https://phabricator.wikimedia.org/T223310 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Updated] T223310: Investigate increase in tx bandwidth usage for mc1033

2019-05-20 Thread Joe
Joe added a comment. I think I know what happened here - and it's possibly related to T223180 <https://phabricator.wikimedia.org/T223180>. PHP7's APC memory was perfectly ok when I looked into it (and we just had the beta feature enabled), but it's not sufficient by far whe

[Wikidata-bugs] [Maniphest] [Closed] T215339: No jobs running on beta cluster

2019-04-19 Thread Joe
Joe closed this task as "Resolved". Joe added a comment. I fixed the configuration of cpjobqueue in deployment-prep, restarted the service, and verified requests are now getting through to the jobrunner: 2019-04-19T10:59:07 10234170172.16.4.124 proxy:fcgi://127.

[Wikidata-bugs] [Maniphest] [Claimed] T215339: No jobs running on beta cluster

2019-04-19 Thread Joe
Joe claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T215339 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: kostajh, jijiki, Ramsey-WMF, Krenair, Cparle, Joe, Stashbot, Aklapper, MarcoAurelio, Samwilson, Pchelolo

[Wikidata-bugs] [Maniphest] [Commented On] T215339: No jobs running on beta cluster

2019-04-19 Thread Joe
Joe added a comment. FWIW, I don't think we need the TLS configuration in beta. I can try to simplify things. Sorry for not noticing this bug earlier, but adding #operations <https://phabricator.wikimedia.org/tag/operations/> or better #serviceops <https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2019-03-25 Thread Joe
Joe added a comment. In T212189#5053087 <https://phabricator.wikimedia.org/T212189#5053087>, @RazShuty wrote: > Hey @akosiaris, not sure I see it in there, maybe I'm lost a bit... can you point me out to where the SSR is in https://www.mediawiki.org/w/index.

[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-20 Thread Joe
Joe added a comment. In T214362#4967944 <https://phabricator.wikimedia.org/T214362#4967944>, @Addshore wrote: > > We currently still want to be able to compute the check on demand, either because the user wants to purge the current constraint check data, or i

[Wikidata-bugs] [Maniphest] [Commented On] T213318: Wikibase Front-End Architecture

2019-02-14 Thread Joe
Joe added a comment. In T213318#4954262, @dbarratt wrote: In T213318#4953461, @Smalyshev wrote: I frankly have a bit of a hard time imagining an IT person of the kind that commonly installs smaller wikis being able to efficiently maintain a zoo of services that we're now running in WMF. I think

[Wikidata-bugs] [Maniphest] [Commented On] T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage

2019-02-14 Thread Joe
Joe added a comment. In order to better understand your needs, let me ask you a few questions: Do we need/want just the constraint check for the latest version of the item, or one for each revision? How will we access such constraints? Always by key and/or full dump, or other access patterns can

[Wikidata-bugs] [Maniphest] [Commented On] T213318: Wikibase Front-End Architecture

2019-01-20 Thread Joe
Joe added a comment. In T213318#4888367, @Nikerabbit wrote: Do I understand this correctly, that this would add a mandatory Nodejs service to run a Wikibase installation? Is there no client side rendering support planned initially? As an sysadmin for couple of third party wikis (some of which use

[Wikidata-bugs] [Maniphest] [Commented On] T213318: Wikibase Front-End Architecture

2019-01-16 Thread Joe
Joe added a comment. In T213318#4885332, @daniel wrote: So, in conclusion, Wikidata has a lot of edits, but several magnitudes fewer views than a Wikipedia of comparable size. So, while MediaWiki generally optimizes for heavy read loads, the Wikidata UI should be optimized for frequent edits

[Wikidata-bugs] [Maniphest] [Commented On] T213318: Wikibase Front-End Architecture

2019-01-15 Thread Joe
Joe added a comment. Moving (even part of) the presentation layer outside of MediaWiki raises quite a few questions we have to make important decisions about. But in the case of Wikidata, I can see how an exception could be made: It's not part of the core functionality of MediaWiki Its UI

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-21 Thread Joe
Joe added a comment. In T212189#4840039, @WMDE-leszek wrote: To avoid misunderstandings: I was not questioning MediaWiki's action API being performant. By "lightweight" I was referring to "PHP has high startup time" point @daniel made above as one of the reason why no servic

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-21 Thread Joe
Joe added a comment. In T212189#4839848, @WMDE-leszek wrote: The intention of introducing the service is not to have a service that calls Mediawiki. As discussed above, it is needed for the service to ask for some data, and this data shall be provided by some API. Currently, the only API

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-21 Thread Joe
Joe added a comment. Also, if we're going to build microservices, I'd like to not see applications that "grow", at least in terms of what they can do. A microservice should do one thing and do it well. In this case, it's using data from mediawiki to render an HTML fragment; unless you wa

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-21 Thread Joe
Joe added a comment. In T212189#4838090, @Addshore wrote: The "termbox" is more of an application than a template. Only it knows which data it needs - actively "sending" data to it requires knowledge of which information is needed. While seemingly triv

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-21 Thread Joe
Joe added a comment. Let me state it again: the SSR service should not need to call the mediawiki api. It should accept all the information needed to render the termbox in the call from mediawiki. So we should have something like: Mediawiki makes a POST request to SSR, sending the entity data
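
A sketch of what the MediaWiki side of that proposed flow could look like (the service URL, port and payload fields are illustrative assumptions, not the actual termbox contract):

```php
<?php
// Sketch of the flow proposed above, from the MediaWiki side. The service URL,
// port and payload fields are illustrative assumptions, not the termbox API.
$payload = [
	'entity'   => [ 'id' => 'Q42', 'labels' => [ /* ... */ ] ], // data MediaWiki already holds
	'language' => 'en',                                         // language to render in
	'editLink' => '/wiki/Special:SetLabelDescriptionAliases/Q42',
];
$req = MWHttpRequest::factory(
	'http://localhost:6008/termbox', // assumed local service-proxy port for the SSR service
	[
		'method'   => 'POST',
		'postData' => json_encode( $payload ),
		'timeout'  => 3,
	],
	__METHOD__
);
$req->setHeader( 'Content-Type', 'application/json' );
$status = $req->execute();
// On failure MediaWiki would fall back to client-side rendering of the termbox.
$html = $status->isOK() ? $req->getContent() : null;
```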

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-19 Thread Joe
Joe added a comment. In T212189#4833482, @daniel wrote: I agree with Joe that it would be better to have the service be internal, and be called from MW. It doesn't have to be that way, but it's preferable because: we would not expose a new endpoint we should in general avoid (more) services

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-18 Thread Joe
Joe added a comment. In T212189#4831959, @mobrovac wrote: In T212189#4831314, @daniel wrote: @mobrovac Please note that the term box is shown based on user preferences (languages spoken), the initially served DOM however needs to be the same for all users, so it can be cached. Also note

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-18 Thread Joe
Joe added a comment. Also: it is stated in https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service that "In case of no configured server-side rendering service or a malfunctioning of it, the client-side code will act as a fallback". This is a bit the other way around with respe

[Wikidata-bugs] [Maniphest] [Commented On] T212189: New Service Request: Wikidata Termbox SSR

2018-12-18 Thread Joe
Joe added a comment. Looking at the attached diagrams, it seems that the flow of a request is as follows: the page is requested from MediaWiki; MW sends a request to the rendering service; the rendering service sends request(s) to mediawiki via api.php to fetch the data, and sends back the rendered

[Wikidata-bugs] [Maniphest] [Updated] T125976: Run mediawiki::maintenance scripts in Beta Cluster

2018-12-16 Thread Joe
Joe added a project: serviceops. TASK DETAIL https://phabricator.wikimedia.org/T125976 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: gerritbot, Joe, Jdforrester-WMF, Krinkle, Dzahn, Reedy, MarcoAurelio, dcausse, Addshore, thcipriani, hashar, greg

[Wikidata-bugs] [Maniphest] [Commented On] T205865: Investigate decrease in wikidata dispatch times due to eqiad -> codfw DC switch

2018-10-01 Thread Joe
Joe added a comment. In T205865#4630573, @Joe wrote: Have you checked if the latest changes didn't just switch execution from HHVM to PHP7? That could explain a better performance. I can answer this: no, they did not. We launch mwscript for the dispatcher with PHP='hhvm -vEval.Jit=1' Also, can

[Wikidata-bugs] [Maniphest] [Commented On] T205865: Investigate decrease in wikidata dispatch times due to eqiad -> codfw DC switch

2018-10-01 Thread Joe
Joe added a comment. Have you checked if the latest changes didn't just switch execution from HHVM to PHP7? That could explain the better performance. Also, can I ask which redis servers are interacted with? I guess the ones for the locking system, right? TASK DETAIL https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T188045: wdqs1004 broken

2018-02-28 Thread Joe
Joe added a comment. In T188045#4007098, @Smalyshev wrote: I wonder if it's possible to use one of the new servers we're getting in T187766 to restore full capacity if debugging what is going on with 1004 takes time. Would it be a good thing to do? If losing one server out of 4 is an issue

[Wikidata-bugs] [Maniphest] [Updated] T178810: Wikibase: Increase batch size for HTMLCacheUpdateJobs triggered by repo changes.

2017-10-23 Thread Joe
Joe added a project: User-Joe. TASK DETAIL https://phabricator.wikimedia.org/T178810 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Joe Cc: Aklapper, aude, thiemowmde, hoo, Ladsgroup, Krinkle, Joe, daniel, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, Zppix

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-29 Thread Joe
Joe added a comment. oblivian@terbium:~$ /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/group1.dblist showJobs.php --group | awk '{if ($3 > 1) print $_}' cawiki: refreshLinks: 104355 queued; 3 claimed (3 active, 0 abandoned); 0 delayed commonswiki: refreshLinks: 2073193 queued;

[Wikidata-bugs] [Maniphest] [Commented On] T176312: Don’t check format constraint via SPARQL (safely evaluating user-provided regular expressions)

2017-09-25 Thread Joe
Joe added a comment. I think re2 seems like an interesting candidate. I would argue we still want to have a separate microservice running on a separate cluster from MediaWiki, for security reasons, and I would think it could be used to run the regular expression validations as well. AIUI

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-19 Thread Joe
Joe added a comment. FWIW we're seeing another almost-uncontrollable growth of jobs on commons and probably other wikis. I might decide to raise the concurrency of those jobs. TASK DETAIL https://phabricator.wikimedia.org/T173710 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-07 Thread Joe
Joe added a comment. I did some more number crunching on the instances of runJob.php I'm running on terbium, and found the following: Wikibase refreshlinks jobs might benefit from being in smaller batches, as many of those are taking a long time to execute. Out of 33.4k wikibase jobs, we had

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread Joe
Joe added a comment. In T173710#3584505, @Krinkle wrote: In T173710#3583445, @Joe wrote: As a side comment: this is one of the cases where I would've loved to have an elastic environment to run MediaWiki-related applications: I could've spun up 10 instances of jobrunner dedicated to refreshlinks

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread Joe
Joe added a comment. In T173710#3581849, @aaron wrote: Those refreshLInks jobs (from wikibase) are the only ones that use multiple titles per job, so they will be a lot slower (seems to be 50 pages/job) than the regular ones from MediaWiki core. That is a bit on the slow side for a run time

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-05 Thread Joe
Joe added a comment. We still have around 1.4 million items in queue for commons, evenly divided between htmlCacheUpdate jobs and refreshLinks jobs. I've started a few runs of the refreshLinks job and since yesterday most jobs are just processing the same root job from August 26th. Those jobs

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T173710: Job queue is increasing non-stop

2017-08-31 Thread Joe
Joe added a subscriber: ema. Joe added a comment. Correcting myself after a discussion with @ema: since we have at most 4 cache layers, we should process any job with a root timestamp newer than 4 times the cache TTL cap. So anything older than 4 days should be safely discardable
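
In other words, with a 1-day TTL cap per layer (which is what the 4-day figure implies), the discard rule is just an age comparison; a tiny sketch:

```php
<?php
// Small sketch of the discard rule described above: with at most 4 cache layers,
// each assumed capped at a 1-day TTL, a purge job whose root job is older than
// 4 * TTL can no longer affect any cached copy and can be dropped.
const CACHE_TTL_CAP = 86400;   // seconds; assumed 1-day cap per layer
const MAX_CACHE_LAYERS = 4;

function isDiscardable( int $rootJobTimestamp, int $now ): bool {
	return ( $now - $rootJobTimestamp ) > MAX_CACHE_LAYERS * CACHE_TTL_CAP;
}

// e.g. a root job from 5 days ago can be dropped safely:
var_dump( isDiscardable( time() - 5 * 86400, time() ) ); // bool(true)
```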

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread Joe
Joe added a comment. @aaron so you're saying that when we have someone editing a lot of pages with a lot of backlinks we will see the jobqueue growing basically for quite a long time, as the divided jobs will be executed at a later time, and as long as the queue is long enough, we'll see jobs
