Re: [Toolserver-l] Status of the toolserver

2013-05-14 Thread Alex Brollo
Just a quick bit of feedback: some hours ago I could log in again, but qstat
gave an error message while crontab was running normally; now qstat works
again.

At present only an IRC script is running under the Alebot account, and it can
be considered a test routine; should I stop it to make the server update
easier?

Alex


2013/5/13 DaB. w...@daniel.baur4.info

 Hello all,

 as you have surely noticed, the toolserver is even more unstable and
 unreliable than normal at the moment. The reason is that our ha-nodes
 are no longer working as intended and neither Nosy nor I am able to fix
 this.

 A quick word on what ha-nodes are: "ha" stands for "high availability",
 and we have 2 servers for that. Some services at the toolserver are so
 important that downtime is unacceptable (like /home, LDAP or the DNS),
 and for this reason these services live on the ha-nodes. If one server
 goes down or crashes, the other can continue to operate all services
 with little or no interruption and without any work by a root. That
 worked great as long as River was here and not so well in the last
 months, but now it is totally broken.
 The problem is that both ha-nodes run Solaris and none of the roots is
 a Solaris expert, which makes it hard, or in this case impossible, for
 us to find errors. We have set up a very ugly workaround, but it is not
 stable, so downtime of important services causes downtime for the whole
 toolserver, and more work for the roots.

 We can only think of one solution: replacing Solaris on the ha-nodes
 with Linux. But this cannot start before Friday, and it will take some
 time until everything is moved over. It will also cause some hours of
 complete downtime while /home is copied (we will announce this
 separately). In the best case everything will be working again when
 Whitsun is over; in the worst case it will take 2 weeks (I will be away
 between the 21st and the 26th for the general meeting of WMDE).
 Repairing the ha-nodes has top priority, so everything else will be
 delayed (linux-update, database-reimports, account-creation (for VERY
 important ones send me a mail), etc.).

 If you have questions, please send them to the ML.

 Sincerely,
 DaB.

 --
 Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885

 ___
 Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
 https://lists.wikimedia.org/mailman/listinfo/toolserver-l
 Posting guidelines for this list:
 https://wiki.toolserver.org/view/Mailing_list_etiquette


[Toolserver-l] weird qcronsub errors (was: Output from cron command)

2013-05-14 Thread Peter Körner

Hi

For a few days now I have been getting weird errors when submitting tasks.

My cron job calls
/home/mazder/public_html/replicate-sequences/update-submit.sh, which
contains the following command:


qcronsub -l h_rt=0:05:00 -l virtual_free=100M -l arch=* -l sql-user-m=1 
-N mazder-replicate-sequences -m as -o 
'/home/mazder/public_html/replicate-sequences/sge' 
'/home/mazder/public_html/replicate-sequences/update-run.sh'


Most of these calls produce the error below, which does not seem to be an
error in my code, as I use neither XML nor Python.


Do you have any idea what's going wrong?
Peter


-------- Original Message --------
Subject: Output from cron command
Date: Tue, 14 May 2013 08:40:00 + (UTC)
From: maz...@toolserver.org (mazder)
To: maz...@toolserver.org

Your cron job on clematis
/home/mazder/public_html/replicate-sequences/update-submit.sh

produced the following output:

error: JSV stderr: Traceback (most recent call last):
error: JSV stderr:   File "/sge/GE/bin/sol-amd64/qjobtest", line 108, in <module>
error: JSV stderr:     dom = minidom.parse(child_stdout)
error: JSV stderr:   File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
error: JSV stderr:     return expatbuilder.parse(file)
error: JSV stderr:   File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
error: JSV stderr:     result = builder.parseFile(file)
error: JSV stderr:   File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
error: JSV stderr:     parser.Parse(buffer, 0)
error: JSV stderr: xml.parsers.expat.ExpatError: syntax error: line 1, column 0

Unable to run job: JSV stderr: Traceback (most recent call last):
JSV stderr:   File "/sge/GE/bin/sol-amd64/qjobtest", line 108, in <module>
JSV stderr:     dom = minidom.parse(child_stdout)
JSV stderr:   File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
JSV stderr:     return expatbuilder.parse(file)
JSV stderr:   File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
JSV stderr:     result = builder.parseFile(file)
JSV stderr:   File "/opt/ts/python/2.7/lib/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
JSV stderr:     parser.Parse(buffer, 0)
JSV stderr: xml.parsers.expat.ExpatError: syntax error: line 1, column 0
JSV stderr is - xml.parsers.expat.ExpatError: syntax error: line 1, column 0.

Exiting.





Re: [Toolserver-l] weird qcronsub errors

2013-05-14 Thread Tim Landscheidt
Peter Körner osm-li...@mazdermind.de wrote:

 For a few days now I have been getting weird errors when submitting
 tasks.

 My cron job calls
 /home/mazder/public_html/replicate-sequences/update-submit.sh,
 which contains the following command:

 qcronsub -l h_rt=0:05:00 -l virtual_free=100M -l arch=* -l
 sql-user-m=1 -N mazder-replicate-sequences -m as -o
 '/home/mazder/public_html/replicate-sequences/sge'
 '/home/mazder/public_html/replicate-sequences/update-run.sh'

 Most of these calls produce the error below, which does not seem
 to be an error in my code, as I use neither XML nor Python.

 Do you have any idea what's going wrong?

 [...]

An educated guess: The Python errors come from the script
/sge/GE/bin/sol-amd64/qjobtest, which is called as part of
qcronsub to test whether a job with that name is already
running.  qjobtest parses the output of "qstat -xml ...",
which in normal operation returns a valid XML document.  My
assumption is that when SGE is down, qstat returns its error
messages ("error: commlib error: can't connect to service
(Connection refused)", etc.) as plain text, which can't be
parsed as XML, which in turn causes qjobtest to barf.

In short: This is another artefact of SGE being down at that
moment; you can't do anything about it, so just ignore it.
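
The failure mode guessed at above can be reproduced in a few lines.  This
is only a sketch and not the actual qjobtest source, which is not shown in
this thread: the function name count_jobs_named is hypothetical, and the
JB_name element is an assumption about what "qstat -xml" emits.  It shows
that feeding minidom a plain-text error message raises the same ExpatError
as in the traceback, and how a caller could catch it instead of crashing.

```python
import xml.dom.minidom as minidom
from xml.parsers.expat import ExpatError


def count_jobs_named(qstat_output, job_name):
    """Count jobs called job_name in XML text such as "qstat -xml" prints.

    Returns None when the text is not well-formed XML, for example when
    qstat printed a plain-text error because the scheduler was unreachable.
    The JB_name element name is an assumption, not taken from the thread.
    """
    try:
        dom = minidom.parseString(qstat_output)
    except ExpatError:
        # Plain text such as "error: commlib error: ..." ends up here
        # instead of aborting the caller with a traceback.
        return None

    count = 0
    for node in dom.getElementsByTagName("JB_name"):
        # Skip empty <JB_name/> elements, which have no text child.
        if node.firstChild is not None and node.firstChild.data.strip() == job_name:
            count += 1
    return count
```

A caller could treat a None result as "scheduler state unknown" and either
skip the submission or retry later, rather than printing a traceback into
the cron mail.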

Tim



Re: [Toolserver-l] Status of the toolserver

2013-05-14 Thread Russell Blau
On Mon, May 13, 2013, at 05:01 PM, DaB. wrote:
 The repairing of the ha-nodes has top priority, so everything else will
 be delayed (linux-update, database-reimports, account-creation (for VERY 
 important ones send me a mail), etc.).
 
 If you have questions, please send them to the ML.

Is the current outage of replication on sql-s1-user (now approaching 48
hours) related to this ha-node problem?  At least some other dbs seem to
still have replication working.

-- 
  Russell Blau
  russb...@imapmail.org


Re: [Toolserver-l] Status of the toolserver

2013-05-14 Thread Patricia Pintilie
Linux is your best bet. Also, errors 404 and 401 are non-responsive. I can
connect to all servers, but on 2 of them msg/nickserver/password gives the
401 and 404 error stub. See if this information helps you; if not, write me
back.
Best regards, [MILASTARX]:[TS]
On May 13, 2013 6:02 PM, DaB. w...@daniel.baur4.info wrote:

 [...]
