On 30.05.11 21:42, Mikolaj Golub wrote:
> DK> One strange thing is that there is never an established TCP connection
> DK> between both nodes:
> DK> tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2
> DK> tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457 CLOSE_WAIT
> DK> tcp4 0 0 10.2.101.11.46346 10.2.101.12.8457 FIN_WAIT_2
> DK> tcp4 0 90648 10.2.101.11.13916 10.2.101.12.8457 CLOSE_WAIT
> DK> tcp4 0 0 10.2.101.11.8457 *.* LISTEN
> It is normal. hastd uses each connection in only one direction, so it calls
> shutdown(2) to close the unused direction.
So the TCP connections are all so short-lived that I never see a
single one in the ESTABLISHED state? 10Gbit Ethernet is indeed fast, so this
might well be possible...
> I suppose that when the checksum is enabled the bottleneck is the CPU, the
> traffic rate is lower, and the problem is not triggered.
I was thinking something like this. My later tests seem to suggest that
this gets triggered when the network transfer rate is much higher than
the disk transfer rate.
> The "Hash mismatch" message suggests that you actually were using checksums
> then, weren't you?
Yes, this occurs only when checksums are enabled. It happens with both
crc32 and sha256.
> I would like to look at full logs for a rather long period, with several
> cases, from both primary and secondary (and be sure the time is synchronized).
I have made sure the clocks are synchronized and am currently running on
freshly rebooted nodes (with two additional SATA drives in each node).
So far there are some interesting findings; for example, I get hash
errors and disconnects much more frequently now. I will post the results
when a bonnie++ run on the ZFS filesystem on top of the HAST resources
finishes.
> Also, it might be worth checking that there is no network packet corruption
> (some strange things in netstat -di or netstat -s; maybe copy large files via
> the net and compare checksums).
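For reference, a sketch of such a check (the addresses are the ones from the netstat output above; file sizes, paths, and the use of scp/ssh between the nodes are assumptions, not something from this thread):

```shell
# Per-interface error/drop counters and protocol-level statistics
netstat -di
netstat -s | grep -i -E 'bad|discard|corrupt'

# Copy a large file over the same link and compare digests on both ends
dd if=/dev/random of=/tmp/hast-test bs=1m count=1024
scp /tmp/hast-test 10.2.101.12:/tmp/hast-test
sha256 /tmp/hast-test                      # on the primary
ssh 10.2.101.12 sha256 /tmp/hast-test      # on the secondary
```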
I will post these as well; however, so far there has been no indication
of any network problems, no interface errors, etc. It might also be that
the ix driver simply does not report such errors, of course.
One additional note: while playing with this setup, I tried to simulate
the local disk going away, in the hope that HAST would switch to using
the remote disk. Instead of asking someone at the site to pull out the
drive, I just issued on the primary

hastctl role init data0

which resulted in a kernel panic. Unfortunately, there was not enough
dump space for 48GB. I will re-run this with more drives for the crash
dump. Anything you want me to look for in particular? (The kernels have
no KDB compiled in yet.)
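For capturing the next panic, something along these lines should help (the device name is a placeholder, not from this setup); note that with FreeBSD minidumps only the kernel's pages are written, so the dump is far smaller than a full 48GB memory image:

```shell
# /etc/rc.conf: direct crash dumps to a swap/dump partition
dumpdev="/dev/ada2p2"     # assumed device name; pick a real swap partition
dumpdir="/var/crash"      # where savecore(8) stores the dump on boot

# Or enable immediately without a reboot:
dumpon /dev/ada2p2
sysctl debug.minidump     # 1 = minidumps enabled (kernel pages only)
```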
Daniel
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"