Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Platonides
On 24/11/12 21:38, Dr. Trigon wrote:
>> @All: If you are working on big files, please copy them to local
>> temp first (on SGE, $TMP contains an individual temp dir for the
>> job). E.g. piping big files to other slow programs causes heavy
>> NFS load because the data must be read in small chunks, which puts
>> high load on the servers. That's why SGE has been unable to
>> schedule new jobs on nightshade for days.
> 
> What is a big file? Is it ok if the file is in user-home?
> 
> Thanks and greetings DrTrigon

/home is also mounted over NFS.

It's strange, though, that reading big files would overload the
servers: stdio, or the equivalent in whatever language the tools are
written in, should already be reading in blocks.
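
As a rough illustration of why the read size matters: the number of
read requests needed for a file is its size divided by the block size,
rounded up. The sketch below is plain shell arithmetic. reads_needed is
a made-up helper name, and it deliberately ignores client-side caching,
read-ahead and the NFS rsize mount option, all of which change what
actually reaches the server.

```shell
#!/bin/sh
# Hypothetical helper: how many read requests does it take to read
# SIZE bytes in blocks of BLOCK bytes? (ceiling division)
reads_needed() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block ))
}

# A 1 GiB file read in 4 KiB pieces versus 1 MiB blocks
reads_needed 1073741824 4096       # prints 262144
reads_needed 1073741824 1048576    # prints 1024
```

So if a tool really did issue tiny reads, the request count would grow
by orders of magnitude, which is the effect buffered I/O is meant to
avoid.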

Looking at the mounts on willow, /shared and /home are mounted with
NFSv3 over UDP. /mnt/user-store and /install don't show those options,
so they are probably using NFSv4 over TCP. Is that intended?
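
For anyone who wants to check this kind of thing themselves, here is a
minimal sketch that summarises NFS version and transport per mount. It
assumes Linux-style /proc/mounts lines (device, mount point, type,
comma-separated options); other systems may format their mount table
differently, so treat it only as an illustration. nfs_mount_summary is
a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read fstab-style lines on stdin
# (device, mount point, type, options, ...) and print the NFS
# version and transport for each NFS mount. Options that are not
# listed explicitly are reported as "unspecified".
nfs_mount_summary() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        vers = "unspecified"; proto = "unspecified"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^(nfs)?vers=/) {
                vers = opts[i]; sub(/^(nfs)?vers=/, "", vers)
            } else if (opts[i] ~ /^proto=/) {
                proto = substr(opts[i], 7)
            }
        }
        print $2, "vers=" vers, "proto=" proto
    }'
}

# On Linux this could be fed from the kernel's mount table:
# nfs_mount_summary < /proc/mounts
```

A mount that prints "unspecified" is exactly the "doesn't show it" case
above: the options were not given explicitly, so whatever the client
and server negotiated is in effect.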



___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette


Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Dr. Trigon

On 24.11.2012 21:15, Merlissimo wrote:
> At 20:32 on Nov 23rd, SGE on turnera stopped and was started on
> damiana. The qmaster thread started successfully, because it
> responds to pings and so on, but the scheduler thread does not seem
> to work. qconf -tsm does not show any status information (which
> would be written to the logs when this command is sent). That's why
> no new jobs are sent to the execution clients.
>
> So the switchover on the HA cluster failed.

...so is it supposed to be working now...?

> @All: If you are working on big files, please copy them to local
> temp first (on SGE, $TMP contains an individual temp dir for the
> job). E.g. piping big files to other slow programs causes heavy NFS
> load because the data must be read in small chunks, which puts high
> load on the servers. That's why SGE has been unable to schedule new
> jobs on nightshade for days.

What is a big file? Is it ok if the file is in user-home?

Thanks and greetings
DrTrigon


Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Merlissimo

On 24.11.2012 20:43, Marlen Caemmerer wrote:
> Hello,
>
> a broken NFS mount was the source of the slow login.
> Don't know if it affected SGE as well, but I tried to mount the
> user-store and I got the error "Out of stream resources".
> There might be something fishy with the local disks too, since cat
> /etc/vfstab took ages twice and ls resulted in "no such file or
> directory" twice too.
> But the IPMI logs and the RAID utility from Solaris showed no faults.
> I rebooted and the system now seems to be running ok.
> Do you still see any issue?
>
> Cheers
> nosy

At 20:32 on Nov 23rd, SGE on turnera stopped and was started on damiana.
The qmaster thread started successfully, because it responds to pings
and so on, but the scheduler thread does not seem to work. qconf -tsm
does not show any status information (which would be written to the logs
when this command is sent). That's why no new jobs are sent to the
execution clients.

So the switchover on the HA cluster failed.

Merlissimo

@All: If you are working on big files, please copy them to local temp
first (on SGE, $TMP contains an individual temp dir for the job). E.g.
piping big files to other slow programs causes heavy NFS load because
the data must be read in small chunks, which puts high load on the
servers. That's why SGE has been unable to schedule new jobs on
nightshade for days.
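
A job script following this advice might look roughly like the sketch
below. stage_and_run is a made-up helper, the fallback to /tmp when
$TMP is unset is only there so it can be tried outside SGE, and the
input path in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: copy a big input file from NFS to the job's
# local temp dir, run a command with the local copy on stdin, then
# clean up. Usage: stage_and_run FILE COMMAND [ARGS...]
stage_and_run() {
    src=$1
    shift
    # SGE sets $TMP per job; fall back to /tmp outside of SGE
    workdir=${TMP:-/tmp}
    local_copy="$workdir/$(basename "$src").$$"

    # One sequential copy over NFS instead of a slow, drawn-out read
    cp "$src" "$local_copy" || return 1

    # The (possibly slow) consumer now reads from local disk
    "$@" < "$local_copy"
    status=$?

    rm -f "$local_copy"
    return $status
}

# Example with a placeholder path and a stand-in for a slow program:
# stage_and_run /path/to/big/input.txt wc -l
```

The point is that the NFS server sees a single fast copy, and the slow
program then takes as long as it likes on local disk.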




Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Marlen Caemmerer

Hello,

a broken NFS mount was the source of the slow login.
Don't know if it affected SGE as well, but I tried to mount the user-store and I got the
error "Out of stream resources".
There might be something fishy with the local disks too, since cat /etc/vfstab took ages
twice and ls resulted in "no such file or directory" twice too.
But the IPMI logs and the RAID utility from Solaris showed no faults.
I rebooted and the system now seems to be running ok.
Do you still see any issue?

Cheers
nosy

On Sat, 24 Nov 2012 17:37:31, Wolfgang Faust wrote:

Logging in to submit.toolserver.org takes a really long time recently
(starting a few days ago). Clematis doesn't seem to have any load though,
so I don't know what's going on.


On Sat, Nov 24, 2012 at 10:17 AM, Dr. Trigon wrote:



Hello

I have an issue with my jobs not being executed, or more precisely,
staying queued since midnight:

job-ID  prior    name        user      state  submit/start at      queue                           slots  ja-task-ID
--------------------------------------------------------------------------------------------------------------------
786288  0.50539  ircbot      drtrigon  r      11/17/2012 00:18:04  longrun-sol@willow.toolserver.  1
825178  0.50042  subster_me  drtrigon  qw     11/24/2012 00:06:04                                  1
825207  0.50039  subster_en  drtrigon  qw     11/24/2012 01:06:03                                  1
825212  0.50038  subster_nl  drtrigon  qw     11/24/2012 01:36:03                                  1
825228  0.50037  mainbot     drtrigon  qw     11/24/2012 02:36:03                                  1
825106  0.50035  subster_ar  drtrigon  qw     11/23/2012 21:06:03                                  1
825177  0.5      maintenanc  drtrigon  qw     11/24/2012 00:06:04                                  1
825191  0.0      subster_fr  drtrigon  qw     11/24/2012 00:36:04                                  1

...what could be the issue here?

Thanks and greetings
DrTrigon


Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Platonides
On 24/11/12 17:37, Wolfgang Faust wrote:
> Logging in to submit.toolserver.org takes a really long time recently
> (starting a few days ago). Clematis doesn't seem to have any load though,
> so I don't know what's going on.

Thousands of processes running df -k /mnt/user-store? :)

$ ssh submit "ps -ef | grep -c 'df -k /mnt/user-store'"
15408

Seems we have /mnt/user-store problems again.
clematis ~ $ time ls /mnt/user-store
NFS server thyme not responding still trying
NFS getattr failed for server thyme: error 16 (RPC: Failed (unspecified
error))
ls: cannot access /mnt/user-store: Connection timed out

But those instances are stuck; some of the processes date back to Nov 12.
It seems that nosy has just killed them.





Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Wolfgang Faust
Logging in to submit.toolserver.org takes a really long time recently
(starting a few days ago). Clematis doesn't seem to have any load though,
so I don't know what's going on.


On Sat, Nov 24, 2012 at 10:17 AM, Dr. Trigon wrote:

>
> Hello
>
> I have an issue with my jobs not being executed, or more precisely,
> staying queued since midnight:
>
> job-ID  prior    name        user      state  submit/start at      queue                           slots  ja-task-ID
> --------------------------------------------------------------------------------------------------------------------
> 786288  0.50539  ircbot      drtrigon  r      11/17/2012 00:18:04  longrun-sol@willow.toolserver.  1
> 825178  0.50042  subster_me  drtrigon  qw     11/24/2012 00:06:04                                  1
> 825207  0.50039  subster_en  drtrigon  qw     11/24/2012 01:06:03                                  1
> 825212  0.50038  subster_nl  drtrigon  qw     11/24/2012 01:36:03                                  1
> 825228  0.50037  mainbot     drtrigon  qw     11/24/2012 02:36:03                                  1
> 825106  0.50035  subster_ar  drtrigon  qw     11/23/2012 21:06:03                                  1
> 825177  0.5      maintenanc  drtrigon  qw     11/24/2012 00:06:04                                  1
> 825191  0.0      subster_fr  drtrigon  qw     11/24/2012 00:36:04                                  1
>
> ...what could be the issue here?
>
> Thanks and greetings
> DrTrigon



-- 
This message has been encoded in 128ROT13 for security. If you are unable
to view it, please consult an optometrist.

[Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Dr. Trigon

Hello

I have an issue with my jobs not being executed, or more precisely,
staying queued since midnight:

job-ID  prior    name        user      state  submit/start at      queue                           slots  ja-task-ID
--------------------------------------------------------------------------------------------------------------------
786288  0.50539  ircbot      drtrigon  r      11/17/2012 00:18:04  longrun-sol@willow.toolserver.  1
825178  0.50042  subster_me  drtrigon  qw     11/24/2012 00:06:04                                  1
825207  0.50039  subster_en  drtrigon  qw     11/24/2012 01:06:03                                  1
825212  0.50038  subster_nl  drtrigon  qw     11/24/2012 01:36:03                                  1
825228  0.50037  mainbot     drtrigon  qw     11/24/2012 02:36:03                                  1
825106  0.50035  subster_ar  drtrigon  qw     11/23/2012 21:06:03                                  1
825177  0.5      maintenanc  drtrigon  qw     11/24/2012 00:06:04                                  1
825191  0.0      subster_fr  drtrigon  qw     11/24/2012 00:36:04                                  1

...what could be the issue here?

Thanks and greetings
DrTrigon


Re: [Toolserver-l] JIRA session loss

2012-11-24 Thread Dr. Trigon

+1 (the same for me)

Greetings
DrTrigon


On 23.11.2012 20:46, Tim Landscheidt wrote:
> (anonymous) wrote:
> 
>>> At some point I even logged in, clicked an issue, clicked Edit
>>> (which uses AJAX) and then the Edit screen wouldn't load due to
>>> me not being authenticated (while I still saw my nickname on
>>> the top right).
> 
>> I have the same problem lately when I use JIRA. The session loss 
>> happens intermittently, though, so if I just try the same edit
>> over and over it eventually works.  At least, it has always
>> worked when I tried it.
> 
> It hasn't worked for me since at least early October, and it never
> healed itself - I always had to log in again (after copying the
> comment I was about to enter and trying to remember what else I
> wanted to change :-)).
> 
> Tim
