Hi,
So it is most likely not the Java version or a shared file system. Have you considered a hardware problem? Bad or flaky memory often causes random failures like this. Checking the memory and the hard disk for errors shouldn't be hard. Also, make sure the processors are not overheating. Beyond that, only the logs will pinpoint the exact place of failure, and it may take a Java expert to decode them. To rule out a bug in the JVM, have you tried the 32-bit version of Java 1.6, or Java 1.5, or even 1.4? If you get similar problems across different versions of the JVM, that most likely points toward bad hardware. If not, it could be a problem in the JVM.
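
One more thing on decoding the logs: as far as I know, on HotSpot builds of this vintage the long hex string in "Internal Error (...)" is not random but encodes the VM source file and line number where an internal check failed. Below is a rough sketch in Python under that assumption. The layout I am assuming (all bytes except the last two are the file name, with values below 0x20 shifted up by 0x20 so that 0x0E becomes '.', and the last two bytes are the line number) is not documented anywhere in this thread, and the function name decode_hotspot_error_id is made up for illustration.

```python
from typing import Tuple


def decode_hotspot_error_id(error_id: str) -> Tuple[str, int]:
    """Decode an old-style HotSpot error ID into (file name, line number).

    ASSUMPTION: the ID is hex for <file name bytes> + <2-byte line number>,
    and file name bytes below 0x20 were shifted down by 0x20 (e.g. '.' -> 0x0E).
    """
    raw = bytes.fromhex(error_id)
    if len(raw) < 3:
        raise ValueError("error ID too short to hold a name and a line number")

    name_bytes, line_bytes = raw[:-2], raw[-2:]

    # Undo the assumed shift for punctuation such as '.'.
    name = "".join(chr(b + 0x20) if b < 0x20 else chr(b) for b in name_bytes)

    # The assumed line number is a big-endian 16-bit integer.
    line = int.from_bytes(line_bytes, "big")
    return name, line


if __name__ == "__main__":
    # The ID from the crash message quoted further down in this thread.
    print(decode_hotspot_error_id("4E4D4554484F440E435050071F"))
```

If the assumption holds, the ID from your crash decodes to NMETHOD.CPP, line 1823. nmethod is HotSpot's representation of JIT-compiled methods, so that file name and line number would be worth quoting when you search the Sun bug database or post to a JDK list.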

Ioan

Samatha Kottha wrote:
Hi Ioan,

No, it is an AMD machine (Opteron). Java and GT4 are on local disk. Of
course, I checked that log file. It starts with some of the log lines I
posted in my previous mail, followed by numerical data about heap size
and dynamic libraries, none of which told me anything useful. I do not
think it is a problem with Java either, because we have a twin machine
with the same architecture, OS, GT4, Java version, and every other bit
of software (we made them identical because we want to use one as a
backup of the other for data management). There the container has not
crashed even once. On this machine it also worked fine until the problem
started yesterday morning. That is what is puzzling me.

Cheers,
Samatha

Ioan Raicu wrote:
This is the kind of error I see when I run an IA32 JVM on an IA64
architecture.  I assume you are not running this Java 1.6 on an IA64
machine, and that it is in fact an x64 machine, right?  Also, where are
Java and GT4 installed: on NFS or on local disk?  If NFS is having
corruption or network issues, it might cause weird problems like this.
If you are using NFS, I'd give the local disk a try for both Java and
GT4.  If neither of these is the problem, then look at what the log
"hs_err_pid18976.log" says, and perhaps consult some Sun JDK mailing
lists about the specific errors in the Java dump file.

Ioan

Samatha Kottha wrote:
Hi,

The globus container on our server has been crashing quite a lot in the
last two days. Hundreds of jobs are submitted and executed successfully,
and then the container suddenly gives up. There are no messages in the
container logs except this one. We have an identical machine with the
same runtime environment and the same globus version (4.0.5); there the
container has no problem running for weeks without interruption.

Any clues?

Cheers,
Samatha

2007-11-02 07:12:12,944 INFO  exec.StateMachine
[RunQueueThread_14,logJobSucceeded:3535] Job
eb9c1250-8908-11dc-ad43-fd915b70f6e7 finished successfully
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  Internal Error (4E4D4554484F440E435050071F), pid=18976,
tid=1148651840
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0-b105 mixed mode)
# An error report file with more information is saved as
hs_err_pid18976.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp




--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
      http://dsl.cs.uchicago.edu/
============================================