I've pulled the latest code from CVS HEAD and rebuilt it. Unfortunately,
the MD server still crashes hard, and now the servers also seem to be
eating 100% CPU (though I didn't check this prior to updating, which I
suppose could have been useful).
Here's a log from the IB version. I'm almost positive this is an
IB-specific problem at this point, as nobody else that I know of is
having these problems.
---- client ----
p5l6:~# pvfs2-cp -t /tmp/junkfile /pvfs2/6node/
Wrote 2147483648 bytes in 2.695799 seconds. 759.700592 MB/seconds
p5l6:~# pvfs2-cp -t /pvfs2/6node/junkfile /dev/null
< ctrl-c after a minute >
< all the data servers are now off in runaway land, and the metadata
server is dead, as verified below >
p5l6:~# pvfs2-ls
[E 10:18:53.867913] Warning: ib_tcp_client_connect: connect to server da6:3336: Connection refused.
[E 10:18:53.868039] Receive immediately failed: Connection refused
[E 10:18:53.868098] msgpair failed, will retry: Connection refused
[E 10:18:55.870880] Warning: ib_tcp_client_connect: connect to server da6:3336: Connection refused.
[E 10:18:55.870926] Receive immediately failed: Connection refused
[E 10:18:55.870995] msgpair failed, will retry: Connection refused
[E 10:18:57.873345] Warning: ib_tcp_client_connect: connect to server da6:3336: Connection refused.
[E 10:18:57.873383] Receive immediately failed: Connection refused
[E 10:18:57.873444] msgpair failed, will retry: Connection refused
---- MD server log, times are not at all synced with the client log above ----
[D 10:15:17.161630] PVFS2 Server version 1.5.1pre1-2006-09-07-182738 starting.
[E 10:23:20.739431] Job time out: cancelling flow operation, job_id: 4370.
[E 10:23:20.739511] Flow proto cancel called on 0x63f640
[E 10:23:20.739526] Flow proto error cleanup started on 0x63f640, error_code: -1610612737
[E 10:23:20.739628] Flow proto 0x63f640 canceling a total of 7 BMI or Trove operations
-----
With my current level (lack) of debugging, the data servers don't show
anything useful: they are running away at 100% CPU, but their logs show
nothing other than the startup line.
-----
Pete, which debugging level would be best for getting a useful log:
trove or network?
Thanks,
-- Kyle
--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers