I've pulled the latest code from CVS head and rebuilt it.
Unfortunately the MD server still crashes hard, and now the servers also appear to be eating 100% CPU (though I didn't check this prior to updating, which would have been useful).

Here's a log from the IB version. At this point I'm almost positive this is an IB-specific problem, as nobody else that I know of is seeing it.

---- client ----
p5l6:~# pvfs2-cp -t /tmp/junkfile /pvfs2/6node/
Wrote 2147483648 bytes in 2.695799 seconds. 759.700592 MB/seconds
p5l6:~# pvfs2-cp -t /pvfs2/6node/junkfile /dev/null


< ctrl-c after a minute >
< all the data servers are now off in runaway land, metadata server is dead now, as verified below >

p5l6:~# pvfs2-ls
[E 10:18:53.867913] Warning: ib_tcp_client_connect: connect to server da6:3336: Connection refused.
[E 10:18:53.868039] Receive immediately failed: Connection refused
[E 10:18:53.868098] msgpair failed, will retry: Connection refused
[E 10:18:55.870880] Warning: ib_tcp_client_connect: connect to server da6:3336: Connection refused.
[E 10:18:55.870926] Receive immediately failed: Connection refused
[E 10:18:55.870995] msgpair failed, will retry: Connection refused
[E 10:18:57.873345] Warning: ib_tcp_client_connect: connect to server da6:3336: Connection refused.
[E 10:18:57.873383] Receive immediately failed: Connection refused
[E 10:18:57.873444] msgpair failed, will retry: Connection refused


---- MD server log, times are not at all sync'd with above ----

[D 10:15:17.161630] PVFS2 Server version 1.5.1pre1-2006-09-07-182738 starting.
[E 10:23:20.739431] Job time out: cancelling flow operation, job_id: 4370.
[E 10:23:20.739511] Flow proto cancel called on 0x63f640
[E 10:23:20.739526] Flow proto error cleanup started on 0x63f640, error_code: -1610612737
[E 10:23:20.739628] Flow proto 0x63f640 canceling a total of 7 BMI or Trove operations
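
As an aside, the error_code in the log above (-1610612737 once the wrapped line is joined) may be easier to compare against error constants in the source when viewed as an unsigned hex value. A minimal sketch, assuming the code is a signed 32-bit integer; the helper name to_u32_hex is purely illustrative and not part of PVFS2:

```python
def to_u32_hex(code: int) -> str:
    """Reinterpret a signed error code as an unsigned 32-bit hex string.

    Assumption: the logged value is a 32-bit signed integer.
    """
    # Masking with 0xFFFFFFFF yields the two's-complement bit pattern.
    return hex(code & 0xFFFFFFFF)


# error_code taken from the MD server log above
print(to_u32_hex(-1610612737))  # prints 0x9fffffff
```

This only changes how the number is displayed; it says nothing about what the code means.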

-----
With my current (lack of) debugging enabled, none of the data servers log anything beyond the startup line, but they are all running away at 100% CPU.
-----

Pete, which debugging level would give the most useful log: trove or network?


Thanks,
   -- Kyle



--
Kyle Schochenmaier
[EMAIL PROTECTED]
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
