demon added a comment.
Is there anything left here, now that everything in the summary is done?
TASK DETAIL: https://phabricator.wikimedia.org/T179156
EMAIL PREFERENCES: https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: demon
Cc: Zoranzoki21, daniel, Peachey88, ema, Gehel, Smalyshev, T
demon added a comment.
In T179156#3782516, @awight wrote:
@BBlack Thanks for the detailed notes! All I was going to add was my understanding of how Ext:ORES has the potential for exacerbating any issues with the API layer, simply by consuming it with every new edit.
The extension has potential for
Zoranzoki21 added a comment.
Does it cause problems with high sleep times in pywiki?
awight added a comment.
@BBlack Thanks for the detailed notes! All I was going to add was my understanding of how Ext:ORES has the potential for exacerbating any issues with the API layer, simply by consuming it with every new edit.
BBlack added a comment.
No, we never made an incident report on this one, and I don't think it would be fair at this time to implicate ORES as a cause. We can't really say that ORES was directly involved at all (or any of the other services investigated here). Because the cause was so unknown at th
awight added a comment.
@hoo Wondering if you wrote an incident report that I can add to, with an explanation of ORES's involvement?
gerritbot added a comment.
Change 387236 merged by Ema:
[operations/debs/varnish4@debian-wmf] Add local patch for transaction_timeout
https://gerrit.wikimedia.org/r/387236
gerritbot added a comment.
Change 387228 merged by BBlack:
[operations/puppet@production] cache_text: reduce inter-cache backend timeouts as well
https://gerrit.wikimedia.org/r/387228
gerritbot added a comment.
Change 387225 merged by BBlack:
[operations/puppet@production] cache_text: reduce applayer timeouts to reasonable values
https://gerrit.wikimedia.org/r/387225
BBlack added a comment.
In T179156#3720392, @BBlack wrote:
In T179156#3719995, @BBlack wrote:
We have an obvious case of normal slow chunked uploads of large files to commons to look at for examples to observe, though.
Rewinding a little: this is false, I was just getting confused by terminology.
BBlack added a comment.
In T179156#3719995, @BBlack wrote:
We have an obvious case of normal slow chunked uploads of large files to commons to look at for examples to observe, though.
Rewinding a little: this is false, I was just getting confused by terminology. Commons "chunked" uploads throug
daniel added a comment.
Because they're POST they'd be handled as an immediate pass through the varnish layers, so I don't think this would cause what we're looking at now.
"pass" means stream, right? wouldn't that also grab a backend connection from the pool, and hog it if throughput is slow?
We
BBlack added a comment.
In T179156#3719928, @daniel wrote:
In any case, this would consume front-edge client connections, but wouldn't trigger anything deeper into the stack
That's assuming varnish always caches the entire request, and never "streams" to the backend, even for file uploads. When d
daniel added a comment.
In any case, this would consume front-edge client connections, but wouldn't trigger anything deeper into the stack
That's assuming varnish always caches the entire request, and never "streams" to the backend, even for file uploads. When discussing this with @hoo he told me
BBlack added a comment.
Trickled-in POST on the client side would be something else. Varnish's timeout_idle, which is set to 5s on our frontends, acts as the limit for receiving all client request headers, but I'm not sure that it has such a limitation that applies to client-sent bodies. In any c
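The timeout BBlack refers to is an ordinary Varnish runtime parameter, so it can be inspected directly on a cache host. A minimal guarded sketch (assuming `varnishadm` access; on hosts without Varnish it just prints a message):

```shell
# Check the client-side idle timeout on a cache host (sketch; requires
# varnishadm access, so it is guarded for hosts without Varnish).
if command -v varnishadm >/dev/null 2>&1; then
    # timeout_idle bounds how long Varnish waits for client request headers
    varnishadm param.show timeout_idle
else
    echo "varnishadm not available on this host"
fi
```

As noted above, this limit applies to receiving request headers; whether anything similar bounds a slowly trickled request body is exactly the open question.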
daniel added a comment.
@BBlack wrote:
something that's doing a legitimate request->response cycle, but trickling out the bytes of it over a very long period.
That's a well-known attack method. Could this be coming from the outside, trickling the bytes of a POST? Are we sure we are safe against
gerritbot added a comment.
Change 387236 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/debs/varnish4@debian-wmf] [WIP] backend transaction_timeout
https://gerrit.wikimedia.org/r/387236
gerritbot added a comment.
Change 387228 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cache_text: reduce inter-cache backend timeouts as well
https://gerrit.wikimedia.org/r/387228
gerritbot added a comment.
Change 387225 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] cache_text: reduce applayer timeouts to reasonable values
https://gerrit.wikimedia.org/r/387225
Lucas_Werkmeister_WMDE added a comment.
In T179156#3719057, @BBlack wrote:
could other services on text-lb be making these kinds of queries to WDQS on behalf of the client and basically proxying the same behavior through?
WikibaseQualityConstraints runs a limited set of queries, but none that co
gerritbot added a comment.
Change 386824 merged by BBlack:
[operations/puppet@production] Revert "cache_text: raise MW connection limits to 10K"
https://gerrit.wikimedia.org/r/386824
BBlack added a comment.
In T179156#3718772, @ema wrote:
There's a timeout limiting the total amount of time varnish is allowed to spend on a single request, send_timeout, defaulting to 10 minutes. Unfortunately there's no counter tracking when the timer kicks in, although a debug line is logged t
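The parameter ema mentions can be inspected the same way as any other Varnish runtime parameter; a guarded sketch (assuming `varnishadm` access on the cache host):

```shell
# Show send_timeout, the cap on the total time Varnish may spend delivering
# a single response (defaulting to 10 minutes, per the comment above).
if command -v varnishadm >/dev/null 2>&1; then
    varnishadm param.show send_timeout
else
    echo "varnishadm not available on this host"
fi
```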
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-30T11:44:38Z] Synchronized wmf-config/Wikibase.php: Re-add property for RDF mapping of external identifiers for Wikidata (T179156, T178180) (duration: 00m 49s)
gerritbot added a comment.
Change 387190 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Revert "Add property for RDF mapping of external identifiers for Wikidata""
https://gerrit.wikimedia.org/r/387190
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-30T11:33:14Z] Synchronized wmf-config/Wikibase-production.php: Re-enable constraints check with SPARQL (T179156) (duration: 00m 50s)
gerritbot added a comment.
Change 387189 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Disable constraints check with SPARQL for now"
https://gerrit.wikimedia.org/r/387189
Lucas_Werkmeister_WMDE added a comment.
The only live polling feature I can think of that was recently introduced is for the live updates to Special:RecentChanges.
As far as I know, that feature just reloads the recent changes every few seconds with a new request.
Another thing that might be simi
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-30T11:15:55Z] rebuilt wikiversions.php and synchronized wikiversions files: Wikidatawiki back to wmf.5 (T179156)
gerritbot added a comment.
Change 387188 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Wikidatawiki to wmf.4"
https://gerrit.wikimedia.org/r/387188
ema added a comment.
In T179156#3717895, @BBlack wrote:
My best hypothesis for the "unreasonable" behavior that would break under do_stream=false is that we have some URI which is abusing HTTP chunked responses to stream an indefinite response. Sort of like websockets, but using the normal HTTP p
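One hedged way to test the hypothesis above is to fetch a suspect URI with a hard time cap and see how much it delivers before the cap; the `file://` target below is only a stand-in for the sketch:

```shell
# Probe how many bytes a URI delivers within a fixed time window; a
# response abusing chunked encoding to stream indefinitely would keep
# sending until --max-time cuts it off. The file:// URL is a placeholder.
probe_stream() {
    curl -s --max-time 5 -o /dev/null -w '%{size_download}\n' "$1"
}
probe_stream "file:///etc/hostname"
```

Run against a real suspect URI, a byte count that keeps growing with a larger `--max-time` would be consistent with an endless stream.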
gerritbot added a comment.
Change 387189 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/mediawiki-config@master] Revert "Disable constraints check with SPARQL for now"
https://gerrit.wikimedia.org/r/387189
gerritbot added a comment.
Change 387190 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/mediawiki-config@master] Revert "Revert "Add property for RDF mapping of external identifiers for Wikidata""
https://gerrit.wikimedia.org/r/387190
gerritbot added a comment.
Change 387188 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/mediawiki-config@master] Revert "Wikidatawiki to wmf.4"
https://gerrit.wikimedia.org/r/387188
Legoktm added a comment.
In T179156#3718221, @BBlack wrote:
Does Echo have any kind of push notification going on, even in light testing yet?
Nothing that's deployed AFAIK. The only live polling feature I can think of that was recently introduced is for the live updates to Special:RecentChanges.
BBlack added a comment.
Does Echo have any kind of push notification going on, even in light testing yet?
ema added a comment.
In T179156#3717847, @BBlack wrote:
For future reference by another opsen who might be looking at this: one of the key metrics that identifies what we've been calling the "target cache" in eqiad, the one that will (eventually) have issues due to whatever bad traffic is currentl
BBlack added a comment.
A while after the above, @hoo started focusing on a different aspect of this we've been somewhat ignoring as more of a side-symptom: that there tend to be a lot of sockets in a strange state on the "target" varnish, to various MW nodes. They look strange on both sides, in t
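The strange half-closed sockets described above can be watched directly with a socket state filter, rather than grepping a full dump; a sketch (the narrowing filter is illustrative):

```shell
# List TCP sockets stuck in FIN-WAIT-2, with timer details (-o); on the
# affected MW hosts this would show the varnish peers holding half-closed
# connections. A filter such as  dst <cache-host-ip>  can narrow it down.
ss -t -o state fin-wait-2
```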
BBlack added a comment.
Updates from the Varnish side of things today (since I've been bad about getting commits/logs tagged onto this ticket):
18:15 - I took over looking at today's outburst on the Varnish side
The current target at the time was cp1053 (after elukey's earlier restart of cp1055 v
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-28T19:39:06Z] Synchronized wmf-config/CommonSettings.php: Half the Flow -> Parsoid timeout (100s -> 50s) (T179156) (duration: 00m 51s)
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-28T16:51:54Z] restart varnish backend on cp1055 - mailbox lag + T179156
hoo added a comment.
Also on mw1180:
$ sudo -u www-data ss --tcp -r -p > ss
$ cat ss | grep -c FIN-WAIT-2
16
$ cat ss | grep -c cp1055
18
$ cat ss | grep -v cp1055 | grep -c FIN-WAIT-2
0
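The ad-hoc greps above generalize to a single tabulation over a saved `ss --tcp -r` dump; the sample data below is invented to mirror the output being discussed:

```shell
# Hypothetical sample of a saved `ss --tcp -r` dump (hostnames made up
# to mirror the diagnostics above): state, Recv-Q, Send-Q, local, peer.
cat > /tmp/ss.sample <<'EOF'
FIN-WAIT-2 0 0 mw1180:https cp1055:41000
ESTAB      0 0 mw1180:https cp1053:41001
FIN-WAIT-2 0 0 mw1180:https cp1055:41002
ESTAB      0 0 mw1180:https cp1066:41003
EOF
# Count FIN-WAIT-2 sockets per peer cache host in one pass
awk '$1 == "FIN-WAIT-2" { split($5, a, ":"); n[a[1]]++ }
     END { for (h in n) print n[h], h }' /tmp/ss.sample
# → 2 cp1055
```

On a real MW host this would point at the one cache backend hoarding half-closed connections, as in the output quoted above.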
hoo added a comment.
Happening again, this time on cp1055.
Example from mw1180:
$ ss --tcp -r | grep -oP 'cp\d+' | sort | uniq -c
2 cp1053
20 cp1055
2 cp1066
1 cp1068
Also:
$ cat /tmp/apache_status.mw1180.1509206746.txt | grep 10.64.32.107 | wc -l
31
$ cat /tmp/apache_sta
gerritbot added a comment.
Change 386939 merged by BBlack:
[operations/puppet@production] Varnish: puppetize per-backend between_bytes_timeout
https://gerrit.wikimedia.org/r/386939
gerritbot added a comment.
Change 386939 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Varnish: puppetize per-backend between_bytes_timeout
https://gerrit.wikimedia.org/r/386939
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-27T17:54:14Z] Taking mwdebug1001 to do tests regarding T179156
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-27T15:50:58Z] Synchronized wmf-config/Wikibase-production.php: Disable constraints check with SPARQL for now (T179156) (duration: 00m 50s)
gerritbot added a comment.
Change 386833 merged by jenkins-bot:
[operations/mediawiki-config@master] Disable constraints check with SPARQL for now
https://gerrit.wikimedia.org/r/386833
Lucas_Werkmeister_WMDE added a comment.
(Permalink: https://grafana.wikimedia.org/dashboard/db/wikidata-quality?panelId=10&fullscreen&orgId=1&from=now-2d&to=now)
Slightly more permanent link, I think: https://grafana.wikimedia.org/dashboard/db/wikidata-quality?panelId=10&fullscreen&orgId=1&from=15
gerritbot added a comment.
Change 386833 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/mediawiki-config@master] Disable constraints check with SPARQL for now
https://gerrit.wikimedia.org/r/386833
hoo added a comment.
In T179156#3715446, @BBlack wrote:
In T179156#3715432, @hoo wrote:
I think I found the root cause now; it seems it's actually related to the WikibaseQualityConstraints extension:
Isn't that the same extension referenced in the suspect commits mentioned above?
18:51 ladsgroup
BBlack added a comment.
In T179156#3715432, @hoo wrote:
I think I found the root cause now; it seems it's actually related to the WikibaseQualityConstraints extension:
Isn't that the same extension referenced in the suspect commits mentioned above?
18:51 ladsgroup@tin: Synchronized php-1.31.0-w
gerritbot added a comment.
Change 386824 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] Revert "cache_text: raise MW connection limits to 10K"
https://gerrit.wikimedia.org/r/386824
BBlack added a comment.
Unless anyone objects, I'd like to start with reverting our emergency varnish max_connections changes from https://gerrit.wikimedia.org/r/#/c/386756 . Since the end of the log above, connection counts have returned to normal (~100), about 1/10th of the normal 1K limit
BBlack added a comment.
My gut instinct remains what it was at the end of the log above. I think something in the revert of wikidatawiki to wmf.4 fixed this. And I think the timing alignment of the "Fix sorting of NullResults" changes plus the initial ORES->wikidata fatals makes those in particu
BBlack added a comment.
Copying this in from etherpad (this is less awful than 6 hours of raw IRC+SAL logs, but still pretty verbose):
# cache servers work ongoing here, ethtool changes that require short depooled downtimes around short ethernet port outages:
17:49 bblack: ulsfo cp servers: rollin
Marostegui added a comment.
From those two masters' (s4 and s5) graphs, we can see that whatever happened happened at exactly the same time on both servers, so it is unlikely that the databases are the cause; we are just seeing the consequences.
hoo added a comment.
This has some potentially interesting patterns:
watchlist, recentchanges, contributions, logpager replicas at that time:
s4: https://grafana.wikimedia.org/dashboard/db/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1053&var-port=9104&from=1509043675617&to=150906167