[Toolserver-l] Less time this week and slower sql-s5-user

2013-02-26 Thread DaB.
Hello all,

As you may have noticed, I was not online yesterday or today. The reason is
that I have much more to do in real life at the moment, and the flu is visiting
my family. For these reasons I will not be online as much as usual this week
(maybe it will get better at the weekend). If something VERY urgent happens,
please send me a mail and I will look at it when I find time.
As you may also have noticed, sql-s5-user is slower than normal. The reason is
simple: I am importing commons in parallel threads to have it available as soon
as possible. If you need a fast copy of s5 that is not far behind for READING,
use sql-s5-rr (you should ALWAYS use that or dewiki-p.rrdb.toolserver.org for
reading).

Hope to see you soon.

Sincerely,
DaB.
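
A minimal sketch of the read/write split recommended in the message above, in
Python. The host names sql-s5-rr and sql-s5-user come from the message; the
function name, and the assumption that anything other than reading still goes
to sql-s5-user, are illustrative only.

```python
# Host names are taken from the message above; everything else is hypothetical.
S5_READ_HOST = "sql-s5-rr"    # replica recommended for reading
S5_USER_HOST = "sql-s5-user"  # currently slower while commons is imported


def pick_s5_host(read_only: bool) -> str:
    """Return the s5 host to connect to for the given kind of work."""
    return S5_READ_HOST if read_only else S5_USER_HOST


if __name__ == "__main__":
    print(pick_s5_host(read_only=True))
```

A tool that only runs SELECT queries would pass read_only=True and so stay off
the host that is busy with the import.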


-- 
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885


___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: [Toolserver-l] Web services and SSH down?

2013-02-26 Thread Johannes Kroll
On Tue, 26 Feb 2013 12:54:27 +0100
Johannes Kroll  wrote:

> On Tue, 26 Feb 2013 05:19:32 -0600
> legoktm  wrote:
> 
> > On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <
> > marlen.caemme...@wikimedia.de> wrote:
> > 
> > > On Tue, 26 Feb 2013, Johannes Kroll wrote:
> > >
> > >
> > >> While trying to load http://toolserver.org/~render/stools/tlg, we got
> > >> 500 errors first and then "connection reset". SSH to nightshade took 2
> > >> minutes or so to connect. Now web & ssh seem to be working again.
> > >>
> > > Around what time did you try?
> > 
> > On IRC myself and jem- reported having issues at around 10:10am UTC and it
> > recovered around 10:14am UTC.
> > As of 11:08am UTC I cannot ssh in, and phe was getting 404s.
> > tsbot and tsnag also left the channel at 11:05am UTC after timing out.
> 
> Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so
> processes. 
> 
> Some lines from strace when it was hanging:
> 
> connect(6, {sa_family=AF_INET, sin_port=htons(53), 
> sin_addr=inet_addr("10.24.1.18")}, 16) = 0
> poll([{fd=6, events=POLLOUT}], 1, 0)= 1 ([{fd=6, revents=POLLOUT}])
> sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, 
> MSG_NOSIGNAL, NULL, 0) = 41
> poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
> sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, 
> MSG_NOSIGNAL, NULL, 0) = 41
> poll([{fd=6, events=POLLIN}], 1, 4999)  = 0 (Timeout)
> 
> Port 53 is DNS? So it looks like some DNS query timed out?

If DNS drops out from time to time, could that explain the problems we
see? Even rsync failed for me at one point, in addition to the web
and ssh stuff. 

Which machine has address 10.24.1.18? Why would it be down or
unreachable?
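
One way to check whether a particular resolver is dropping queries is to send
it a single UDP query with a timeout and see whether anything comes back. The
following is a rough sketch using only the Python standard library. The
function names are hypothetical, the 5 second default mirrors the poll()
timeout in the strace output above, and the name to look up is left as a
parameter because the trace truncates it.

```python
import socket
import struct
import sys
import time
from typing import Optional


def build_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS query (A record, recursion desired)."""
    header = struct.pack("!6H", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN


def probe_dns(server: str, name: str, timeout: float = 5.0) -> Optional[float]:
    """Return the reply latency in seconds, or None if the query times out."""
    query = build_query(name, txid=0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server, 53))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
        # Only count a reply that echoes our transaction ID.
        if reply[:2] != query[:2]:
            return None
        return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    latency = probe_dns(sys.argv[1], sys.argv[2])
    print("timeout" if latency is None else f"reply after {latency:.3f}s")
```

Run from a login server with the resolver address and a host name as the two
arguments, and repeated every few seconds, this would show whether the
timeouts line up with the periods when web and ssh misbehave.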




Re: [Toolserver-l] Web services and SSH down?

2013-02-26 Thread Johannes Kroll
On Tue, 26 Feb 2013 05:19:32 -0600
legoktm  wrote:

> On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <
> marlen.caemme...@wikimedia.de> wrote:
> 
> > On Tue, 26 Feb 2013, Johannes Kroll wrote:
> >
> >
> >> While trying to load http://toolserver.org/~render/stools/tlg, we got
> >> 500 errors first and then "connection reset". SSH to nightshade took 2
> >> minutes or so to connect. Now web & ssh seem to be working again.
> >>
> > Around what time did you try?
> 
> On IRC myself and jem- reported having issues at around 10:10am UTC and it
> recovered around 10:14am UTC.
> As of 11:08am UTC I cannot ssh in, and phe was getting 404s.
> tsbot and tsnag also left the channel at 11:05am UTC after timing out.

Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so
processes. 

Some lines from strace when it was hanging:

connect(6, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("10.24.1.18")}, 16) = 0
poll([{fd=6, events=POLLOUT}], 1, 0)= 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, 
MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, 
MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN}], 1, 4999)  = 0 (Timeout)

Port 53 is DNS? So it looks like some DNS query timed out?

Now it seems to be working again. I didn't log the whole strace run, but
I saved the lines that I still had in the terminal buffer... I can send
it if anybody needs it.
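
To check the guess that this is DNS traffic, the visible part of the first
sendto() buffer can be decoded as a DNS message. A short Python sketch follows.
The octal escapes from strace were converted to bytes by hand, the function
name is hypothetical, and because strace truncates the buffer (the "...") only
the labels that are visible can be recovered.

```python
import struct

# Visible prefix of the first sendto() payload, with strace's octal escapes
# converted to bytes (\335 -> 0xdd, \n -> 0x0a, and so on).
PAYLOAD_PREFIX = (
    b"\xdd3\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
    b"\x04ldap\x03esi\ntoolserver"
)


def parse_dns_prefix(data: bytes) -> dict:
    """Parse the 12-byte DNS header and whatever question labels are present."""
    txid, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", data[:12])
    labels = []
    pos = 12
    while pos < len(data):
        length = data[pos]
        if length == 0:  # root label: the name is complete
            break
        labels.append(data[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return {
        "txid": txid,
        "is_query": (flags & 0x8000) == 0,  # QR bit clear means query
        "recursion_desired": bool(flags & 0x0100),
        "qdcount": qdcount,
        "labels": labels,
    }


if __name__ == "__main__":
    print(parse_dns_prefix(PAYLOAD_PREFIX))
```

If the conversion is right, the header has the QR bit clear and carries one
question, and the visible labels are ldap, esi and toolserver, which fits a
resolver query that got no answer before the 4999 ms poll() timeout.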




Re: [Toolserver-l] Web services and SSH down?

2013-02-26 Thread legoktm
On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <
marlen.caemme...@wikimedia.de> wrote:

> On Tue, 26 Feb 2013, Johannes Kroll wrote:
>
>
>> While trying to load http://toolserver.org/~render/stools/tlg, we got
>> 500 errors first and then "connection reset". SSH to nightshade took 2
>> minutes or so to connect. Now web & ssh seem to be working again.
>>
> Around what time did you try?

On IRC myself and jem- reported having issues at around 10:10am UTC and it
recovered around 10:14am UTC.
As of 11:08am UTC I cannot ssh in, and phe was getting 404s.
tsbot and tsnag also left the channel at 11:05am UTC after timing out.

> Cheers
> nosy

--Legoktm

Re: [Toolserver-l] Web services and SSH down?

2013-02-26 Thread Marlen Caemmerer

On Tue, 26 Feb 2013, Johannes Kroll wrote:

> While trying to load http://toolserver.org/~render/stools/tlg, we got
> 500 errors first and then "connection reset". SSH to nightshade took 2
> minutes or so to connect. Now web & ssh seem to be working again.

Around what time did you try?

> Yesterday evening up till early in the morning today, SQL queries were
> very slow. I didn't take measurements but simple page queries that would
> normally execute instantly would take minutes.


Did you try the whole night? Or at what time? And which databases seemed to
answer more slowly?
The problem is that the head nodes are doing SQL forwarding too.
So if the active one is fishy you might not even have SQL connections.
But the phenomenon should have occurred between about 0:30 and 1:30 am UTC
(1:30 and 2:30 CET).
If you tried outside of this timeframe it would be good to know if you had any
other errors and what they looked like.

Cheers
nosy


Re: [Toolserver-l] Web services and SSH down?

2013-02-26 Thread Johannes Kroll
On Mon, 25 Feb 2013 20:58:19 -0500
MZMcBride  wrote:

> DeltaQuad wrote:
> > They *just* came back up. Sorry for the spam all.
> 
> It's about 9 p.m. on Monday evening right now for me.
> https://toolserver.org/~mzmcbride/watcher/ and other similar URLs were
> 404ing for me yesterday (Sunday) evening. And then they suddenly started
> working again without explanation. It seems to be an intermittent issue.
> 
> Maybe it's related to the start of a new UTC day and load? Or maybe it's
> just an intermittent issue. Probably needs to be investigated if it
> continues to happen, though.

While trying to load http://toolserver.org/~render/stools/tlg, we got
500 errors first and then "connection reset". SSH to nightshade took 2
minutes or so to connect. Now web & ssh seem to be working again.

Yesterday evening up till early in the morning today, SQL queries were
very slow. I didn't take measurements but simple page queries that would
normally execute instantly would take minutes.

Don't know if the two things are related.

