Oh, I see. So the SO_SNDBUF, SO_RCVBUF increase was for the datagram socket inside the method Comm::create_datagram_receive_socket()? If that's the case, then I suspect that Hyperspace is getting bombarded with keepalive packets and dropping them because its socket buffer has filled up.
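(As an aside, one way to sanity-check that theory on the Hyperspace machine would be something like the following; these commands are only an illustrative sketch, and the buffer values are examples rather than recommendations:)

$ sysctl net.core.rmem_max net.core.wmem_max   # kernel ceilings; SO_RCVBUF/SO_SNDBUF requests above these are silently capped
$ sudo sysctl -w net.core.rmem_max=1310720     # example: raise the ceiling so a 20*32768-byte (or larger) buffer actually takes effect
$ sudo sysctl -w net.core.wmem_max=1310720
$ netstat -su                                  # growing "packet receive errors" in the Udp section points at dropped datagrams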
Another thing for you to try is to set the random startup delay factor to something higher than 5 seconds in the Capfile. Try something like this:

[...] `let l=$RANDOM%20 r=$RANDOM%1000; sleep $l.$r` && [...]

This will cause the servers to take up to 20 seconds to come up, but it will stagger the start times so that the keepalives sent to Hyperspace are spaced further apart. It would be interesting to see whether the problem goes away with this modification and SO_SNDBUF, SO_RCVBUF set to 20*32768.

- Doug

On Thu, Aug 28, 2008 at 11:05 PM, Hiroyuki Uchiyama <[EMAIL PROTECTED]> wrote:
>
> Hi Doug,
>
> Okay, I will try.
>
> I think one reason for the Hyperspace session expirations is UDP
> socket buffer overflow. Because the benchmark inserted key-value
> pairs via one client process, it's hard to imagine that the network
> is congested.
>
> I think there is a hint on this page
> (http://www.29west.com/docs/THPM/udp-buffer-sizing.html).
>
> Thanks.
>
> 2008/8/29 Doug Judd <[EMAIL PROTECTED]>:
> > Hi Hiroyuki,
> >
> > This is very valuable information. The only thing I can think of here is
> > that by increasing the socket buffers to ~1 MB it reduces network traffic,
> > since there would be fewer TCP ACK packets sent. Here's an experiment that
> > you could try that might shed some light on the situation:
> >
> > On one of the machines that is running a RangeServer, run tcpdump twice:
> > once with the stock 0.9.0.10 Hypertable software and again with the
> > changes you made to increase the socket buffer. Here's an example of how
> > I run tcpdump on my machine:
> >
> > $ cd /tmp
> > $ sudo /usr/sbin/tcpdump -i eth0 -w tcpdump-stock.out
> >
> > Capture approximately 30 seconds worth of tcpdump output in files called
> > "tcpdump-stock.out" and "tcpdump-bigbuf.out". Once you have these files,
> > you can get a nice summary of the TCP traffic on a per-connection basis
> > with a tool called tcptrace (see http://www.tcptrace.org/). You'll need
> > to install libpcap to build tcptrace. Post the output of tcptrace to the
> > mailing list (or upload the files here). That would help shed light on
> > what the network traffic looks like under both situations.
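(Putting those capture and analysis steps into one place, the workflow might look roughly like this; the capture filenames are the ones named above, while the tcptrace output filenames are just an assumed convention:)

$ cd /tmp
$ sudo /usr/sbin/tcpdump -i eth0 -w tcpdump-stock.out    # stock 0.9.0.10 binaries; stop with Ctrl-C after ~30 seconds
$ sudo /usr/sbin/tcpdump -i eth0 -w tcpdump-bigbuf.out   # repeat after restarting with the enlarged SO_SNDBUF/SO_RCVBUF
$ tcptrace -l tcpdump-stock.out  > tcptrace-stock.txt    # per-connection TCP summaries to compare/post
$ tcptrace -l tcpdump-bigbuf.out > tcptrace-bigbuf.txt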
> > Here's an example of what the tcptrace output looks like:
> >
> > $ tcptrace -l tcpdump.out
> > 1 arg remaining, starting with 'tcpdump.out'
> > Ostermann's tcptrace -- version 6.6.7 -- Thu Nov 4, 2004
> >
> > 112 packets seen, 108 TCP packets traced
> > elapsed wallclock time: 0:00:00.048468, 2310 pkts/sec analyzed
> > trace file elapsed time: 0:00:02.758912
> > TCP connection info:
> > 20 TCP connections traced:
> > TCP connection 1:
> >         host a:        motherlode001.admin.zvents.com:46806
> >         host b:        admin1.admin.zvents.com:111
> >         complete conn: yes
> >         first packet:  Thu Aug 28 10:55:56.155277 2008
> >         last packet:   Thu Aug 28 10:55:56.155322 2008
> >         elapsed time:  0:00:00.000045
> >         total packets: 10
> >         filename:      tcpdump.out
> >    a->b:                                b->a:
> >      total packets:          6            total packets:          4
> >      ack pkts sent:          5            ack pkts sent:          4
> >      pure acks sent:         3            pure acks sent:         1
> >      sack pkts sent:         0            sack pkts sent:         0
> >      dsack pkts sent:        0            dsack pkts sent:        0
> >      max sack blks/ack:      0            max sack blks/ack:      0
> >      unique bytes sent:     60            unique bytes sent:     32
> >      actual data pkts:       1            actual data pkts:       1
> >      actual data bytes:     60            actual data bytes:     32
> >      rexmt data pkts:        0            rexmt data pkts:        0
> >      rexmt data bytes:       0            rexmt data bytes:       0
> >      zwnd probe pkts:        0            zwnd probe pkts:        0
> >      zwnd probe bytes:       0            zwnd probe bytes:       0
> >      outoforder pkts:        0            outoforder pkts:        0
> >      pushed data pkts:       1            pushed data pkts:       1
> >      SYN/FIN pkts sent:    1/1            SYN/FIN pkts sent:    1/1
> >      req 1323 ws/ts:       Y/Y            req 1323 ws/ts:       Y/Y
> >      adv wind scale:         7            adv wind scale:         2
> >      req sack:               Y            req sack:               Y
> >      sacks sent:             0            sacks sent:             0
> >      urgent data pkts:       0 pkts       urgent data pkts:       0 pkts
> >      urgent data bytes:      0 bytes      urgent data bytes:      0 bytes
> >      mss requested:       1460 bytes      mss requested:       1460 bytes
> >      max segm size:         60 bytes      max segm size:         32 bytes
> >      min segm size:         60 bytes      min segm size:         32 bytes
> >      avg segm size:         59 bytes      avg segm size:         31 bytes
> >      max win adv:         5888 bytes      max win adv:         5792 bytes
> >      min win adv:         5888 bytes      min win adv:         5792 bytes
> >      zero win adv:           0 times      zero win adv:           0 times
> >      avg win adv:         5888 bytes      avg win adv:         5792 bytes
> >      initial window:        60 bytes      initial window:        32 bytes
> >      initial window:         1 pkts       initial window:         1 pkts
> >      ttl stream length:     60 bytes      ttl stream length:     32 bytes
> >      missed data:            0 bytes      missed data:            0 bytes
> >      truncated data:        30 bytes      truncated data:         2 bytes
> >      truncated packets:      1 pkts       truncated packets:      1 pkts
> >      data xmit time:     0.000 secs       data xmit time:     0.000 secs
> >      idletime max:         0.0 ms         idletime max:         0.0 ms
> >      throughput:       1333333 Bps        throughput:        711111 Bps
> > ================================
> > TCP connection 2:
> >         host c:        motherlode001.admin.zvents.com:651
> >         host d:        admin1.admin.zvents.com:850
> >         complete conn: yes
> >         first packet:  Thu Aug 28 10:55:56.155315 2008
> >         last packet:   Thu Aug 28 10:55:56.155746 2008
> >         elapsed time:  0:00:00.000431
> >         total packets: 12
> >         filename:      tcpdump.out
> >    c->d:                                d->c:
> >      total packets:          7            total packets:          5
> >      ack pkts sent:          6            ack pkts sent:          5
> >      pure acks sent:         4            pure acks sent:         1
> >      sack pkts sent:         0            sack pkts sent:         0
> >      dsack pkts sent:        0            dsack pkts sent:        0
> >      max sack blks/ack:      0            max sack blks/ack:      0
> >      unique bytes sent:     72            unique bytes sent:   1608
> >      actual data pkts:       1            actual data pkts:       2
> >      actual data bytes:     72            actual data bytes:   1608
> >      rexmt data pkts:        0            rexmt data pkts:        0
> >      rexmt data bytes:       0            rexmt data bytes:       0
> >      zwnd probe pkts:        0            zwnd probe pkts:        0
> >      zwnd probe bytes:       0            zwnd probe bytes:       0
> >      outoforder pkts:        0            outoforder pkts:        0
> >      pushed data pkts:       1            pushed data pkts:       1
> >      SYN/FIN pkts sent:    1/1            SYN/FIN pkts sent:    1/1
> >      req 1323 ws/ts:       Y/Y            req 1323 ws/ts:       Y/Y
> >      adv wind scale:         7            adv wind scale:         2
> >      req sack:               Y            req sack:               Y
> >      sacks sent:             0            sacks sent:             0
> >      urgent data pkts:       0 pkts       urgent data pkts:       0 pkts
> >      urgent data bytes:      0 bytes      urgent data bytes:      0 bytes
> >      mss requested:       1460 bytes      mss requested:       1460 bytes
> >      max segm size:         72 bytes      max segm size:       1448 bytes
> >      min segm size:         72 bytes      min segm size:        160 bytes
> >      avg segm size:         71 bytes      avg segm size:        803 bytes
> >      max win adv:        11648 bytes      max win adv:         5792 bytes
> >      min win adv:         5888 bytes      min win adv:         5792 bytes
> >      zero win adv:           0 times      zero win adv:           0 times
> >      avg win adv:         9258 bytes      avg win adv:         5792 bytes
> >      initial window:        72 bytes      initial window:      1448 bytes
> >      initial window:         1 pkts       initial window:         1 pkts
> >      ttl stream length:     72 bytes      ttl stream length:   1608 bytes
> >      missed data:            0 bytes      missed data:            0 bytes
> >      truncated data:        42 bytes      truncated data:      1548 bytes
> >      truncated packets:      1 pkts       truncated packets:      2 pkts
> >      data xmit time:     0.000 secs       data xmit time:     0.000 secs
> >      idletime max:         0.4 ms         idletime max:         0.4 ms
> >      throughput:        167053 Bps        throughput:       3730858 Bps
> > ================================
> > TCP connection 3:
> >
> > [...]
> >
> > - Doug
> >
> > On Thu, Aug 28, 2008 at 3:15 AM, Hiroyuki Uchiyama
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi Doug:
> >>
> >> I'm reporting the benchmark results for perf_eval3.cc on the 99-node cluster.
> >> This time, the 1000000-record random write/read benchmarks succeeded.
> >>
> >> Setup:
> >> - HDFS 0.18.0 (replication factor 3) + Hypertable 0.9.0.10
> >> - 99 nodes
> >> - Network topology is the same as before
> >> - Process assignments:
> >>     One node:        Hyperspace.Master, Hypertable.Master, DfsBroker
> >>     The other nodes: Hypertable.RangeServer, DfsBroker
> >> - The table was created with COMPRESSOR="none"
> >> - The parameter net.core.rmem_max was set to 1048576 on each node
> >> - SO_SNDBUF, SO_RCVBUF were set to 40*32768
> >>   (when I set them to 20*32768, session expirations occurred in the
> >>   RangeServer, and "not found" errors occurred too)
> >> - "Hyperspace.Lease.Interval=180" and "Hyperspace.KeepAlive.Interval=30"
> >>   were written in hypertable.cfg
> >>
> >> Results:
> >> [EMAIL PROTECTED]:/opt/hypertable/0.9.0.9/bin/bench (1150)->$./perf_eval write
> >> Evaluating random writes performance
> >> 10000 was written. elapsed time was 55.8485[s]
> >> 20000 was written. elapsed time was 79.7066[s]
> >> 30000 was written. elapsed time was 114.838[s]
> >> 40000 was written. elapsed time was 144.024[s]
> >> 50000 was written. elapsed time was 169.273[s]
> >> 60000 was written. elapsed time was 191.562[s]
> >> 70000 was written. elapsed time was 208.223[s]
> >> 80000 was written. elapsed time was 225.255[s]
> >> 90000 was written. elapsed time was 252.684[s]
> >> 100000 was written. elapsed time was 285.045[s]
> >> 110000 was written. elapsed time was 327.634[s]
> >> 120000 was written. elapsed time was 355.299[s]
> >> 130000 was written. elapsed time was 381.215[s]
> >> 140000 was written. elapsed time was 404.425[s]
> >> 150000 was written. elapsed time was 434.377[s]
> >> 160000 was written. elapsed time was 465.228[s]
> >> 170000 was written. elapsed time was 494.867[s]
> >> 180000 was written. elapsed time was 525.488[s]
> >> 190000 was written. elapsed time was 562.381[s]
> >> 200000 was written. elapsed time was 585.378[s]
> >> 210000 was written. elapsed time was 627.808[s]
> >> 220000 was written. elapsed time was 654.666[s]
> >> 230000 was written. elapsed time was 685.571[s]
> >> 240000 was written. elapsed time was 717.625[s]
> >> 250000 was written. elapsed time was 742.403[s]
> >> 260000 was written. elapsed time was 760.711[s]
> >> 270000 was written. elapsed time was 776.017[s]
> >> 280000 was written. elapsed time was 791.929[s]
> >> 290000 was written. elapsed time was 824.23[s]
> >> 300000 was written. elapsed time was 843.364[s]
> >> 310000 was written. elapsed time was 872.655[s]
> >> 320000 was written. elapsed time was 901[s]
> >> 330000 was written. elapsed time was 933.649[s]
> >> 340000 was written. elapsed time was 971.193[s]
> >> 350000 was written. elapsed time was 994.717[s]
> >> 360000 was written. elapsed time was 1019.72[s]
> >> 370000 was written. elapsed time was 1042.03[s]
> >> 380000 was written. elapsed time was 1084.11[s]
> >> 390000 was written. elapsed time was 1114.48[s]
> >> 400000 was written. elapsed time was 1141.3[s]
> >> 410000 was written. elapsed time was 1172.57[s]
> >> 420000 was written. elapsed time was 1199.76[s]
> >> 430000 was written. elapsed time was 1236.37[s]
> >> 440000 was written. elapsed time was 1257.26[s]
> >> 450000 was written. elapsed time was 1288.16[s]
> >> 460000 was written. elapsed time was 1315.06[s]
> >> 470000 was written. elapsed time was 1341.77[s]
> >> 480000 was written. elapsed time was 1367.41[s]
> >> 490000 was written. elapsed time was 1393.11[s]
> >> 500000 was written. elapsed time was 1421.17[s]
> >> 510000 was written. elapsed time was 1441.17[s]
> >> 520000 was written. elapsed time was 1459.94[s]
> >> 530000 was written. elapsed time was 1483.48[s]
> >> 540000 was written. elapsed time was 1508.05[s]
> >> 550000 was written. elapsed time was 1526.1[s]
> >> 560000 was written. elapsed time was 1550.3[s]
> >> 570000 was written. elapsed time was 1576.33[s]
> >> 580000 was written. elapsed time was 1611.22[s]
> >> 590000 was written. elapsed time was 1639.65[s]
> >> 600000 was written. elapsed time was 1673.38[s]
> >> 610000 was written. elapsed time was 1703.75[s]
> >> 620000 was written. elapsed time was 1730.07[s]
> >> 630000 was written. elapsed time was 1760.3[s]
> >> 640000 was written. elapsed time was 1790.03[s]
> >> 650000 was written. elapsed time was 1808.18[s]
> >> 660000 was written. elapsed time was 1850.84[s]
> >> 670000 was written. elapsed time was 1877.71[s]
> >> 680000 was written. elapsed time was 1909.14[s]
> >> 690000 was written. elapsed time was 1933.12[s]
> >> 700000 was written. elapsed time was 1965.78[s]
> >> 710000 was written. elapsed time was 1994.93[s]
> >> 720000 was written. elapsed time was 2016.93[s]
> >> 730000 was written. elapsed time was 2049.4[s]
> >> 740000 was written. elapsed time was 2081.43[s]
> >> 750000 was written. elapsed time was 2109.54[s]
> >> 760000 was written. elapsed time was 2138.25[s]
> >> 770000 was written. elapsed time was 2166.13[s]
> >> 780000 was written. elapsed time was 2196.65[s]
> >> 790000 was written. elapsed time was 2225.33[s]
> >> 800000 was written. elapsed time was 2253.39[s]
> >> 810000 was written. elapsed time was 2278.8[s]
> >> 820000 was written. elapsed time was 2301.08[s]
> >> 830000 was written. elapsed time was 2339.2[s]
> >> 840000 was written. elapsed time was 2377.15[s]
> >> 850000 was written. elapsed time was 2406.77[s]
> >> 860000 was written. elapsed time was 2433.43[s]
> >> 870000 was written. elapsed time was 2467.82[s]
> >> 880000 was written. elapsed time was 2499.9[s]
> >> 890000 was written. elapsed time was 2527.96[s]
> >> 900000 was written. elapsed time was 2547.74[s]
> >> 910000 was written. elapsed time was 2586.66[s]
> >> 920000 was written. elapsed time was 2611.76[s]
> >> 930000 was written. elapsed time was 2643.3[s]
> >> 940000 was written. elapsed time was 2669[s]
> >> 950000 was written. elapsed time was 2693.85[s]
> >> 960000 was written. elapsed time was 2716.45[s]
> >> 970000 was written. elapsed time was 2742.04[s]
> >> 980000 was written. elapsed time was 2774.01[s]
> >> 990000 was written. elapsed time was 2798.63[s]
> >> Random writes: 1000000 99576-byte rows in 2822.951 seconds, 354.2 rows per second
> >> non_exist=0
> >>
> >> [EMAIL PROTECTED]:/opt/hypertable/0.9.0.9/bin/bench (1151)->$./perf_eval read
> >> Evaluating random reads performance
> >> 10000 was read. elapsed time was 501.206[s]
> >> 20000 was read. elapsed time was 944.984[s]
> >> 30000 was read. elapsed time was 1363.09[s]
> >> 40000 was read. elapsed time was 1765.4[s]
> >> 50000 was read. elapsed time was 2146.59[s]
> >> 60000 was read. elapsed time was 2516.65[s]
> >> 70000 was read. elapsed time was 2880.07[s]
> >> 80000 was read. elapsed time was 3231.15[s]
> >> 90000 was read. elapsed time was 3579.07[s]
> >> 100000 was read. elapsed time was 3908.22[s]
> >> 110000 was read. elapsed time was 4238.06[s]
> >> 120000 was read. elapsed time was 4563.35[s]
> >> 130000 was read. elapsed time was 4883.92[s]
> >> 140000 was read. elapsed time was 5197.05[s]
> >> 150000 was read. elapsed time was 5508.85[s]
> >> 160000 was read. elapsed time was 5814.83[s]
> >> 170000 was read. elapsed time was 6121.85[s]
> >> 180000 was read. elapsed time was 6422.65[s]
> >> 190000 was read. elapsed time was 6723.65[s]
> >> 200000 was read. elapsed time was 7026.65[s]
> >> 210000 was read. elapsed time was 7323.4[s]
> >> 220000 was read. elapsed time was 7619.73[s]
> >> 230000 was read. elapsed time was 7913.79[s]
> >> 240000 was read. elapsed time was 8207.23[s]
> >> 250000 was read. elapsed time was 8498.48[s]
> >> 260000 was read. elapsed time was 8787.09[s]
> >> 270000 was read. elapsed time was 9079.14[s]
> >> 280000 was read. elapsed time was 9366.56[s]
> >> 290000 was read. elapsed time was 9656.62[s]
> >> 300000 was read. elapsed time was 9939.66[s]
> >> 310000 was read. elapsed time was 10227.5[s]
> >> 320000 was read. elapsed time was 10512.4[s]
> >> 330000 was read. elapsed time was 10803.3[s]
> >> 340000 was read. elapsed time was 11081.3[s]
> >> 350000 was read. elapsed time was 11365.2[s]
> >> 360000 was read. elapsed time was 11647.8[s]
> >> 370000 was read. elapsed time was 11929.8[s]
> >> 380000 was read. elapsed time was 12207.4[s]
> >> 390000 was read. elapsed time was 12487.9[s]
> >> 400000 was read. elapsed time was 12765.6[s]
> >> 410000 was read. elapsed time was 13042.2[s]
> >> 420000 was read. elapsed time was 13316.9[s]
> >> 430000 was read. elapsed time was 13590.1[s]
> >> 440000 was read. elapsed time was 13864.5[s]
> >> 450000 was read. elapsed time was 14138.4[s]
> >> 460000 was read. elapsed time was 14412.2[s]
> >> 470000 was read. elapsed time was 14682.4[s]
> >> 480000 was read. elapsed time was 14955.8[s]
> >> 490000 was read. elapsed time was 15226.1[s]
> >> 500000 was read. elapsed time was 15498.6[s]
> >> 510000 was read. elapsed time was 15769.7[s]
> >> 520000 was read. elapsed time was 16040.8[s]
> >> 530000 was read. elapsed time was 16312[s]
> >> 540000 was read. elapsed time was 16577.5[s]
> >> 550000 was read. elapsed time was 16846[s]
> >> 560000 was read. elapsed time was 17113.5[s]
> >> 570000 was read. elapsed time was 17382.4[s]
> >> 580000 was read. elapsed time was 17648.3[s]
> >> 590000 was read. elapsed time was 17913.6[s]
> >> 600000 was read. elapsed time was 18179.6[s]
> >> 610000 was read. elapsed time was 18445.3[s]
> >> 620000 was read. elapsed time was 18710.6[s]
> >> 630000 was read. elapsed time was 18972.6[s]
> >> 640000 was read. elapsed time was 19236.1[s]
> >> 650000 was read. elapsed time was 19497.5[s]
> >> 660000 was read. elapsed time was 19759[s]
> >> 670000 was read. elapsed time was 20018.8[s]
> >> 680000 was read. elapsed time was 20278.8[s]
> >> 690000 was read. elapsed time was 20537.3[s]
> >> 700000 was read. elapsed time was 20794.7[s]
> >> 710000 was read. elapsed time was 21051.7[s]
> >> 720000 was read. elapsed time was 21309.2[s]
> >> 730000 was read. elapsed time was 21567.2[s]
> >> 740000 was read. elapsed time was 21821.2[s]
> >> 750000 was read. elapsed time was 22077.6[s]
> >> 760000 was read. elapsed time was 22330.9[s]
> >> 770000 was read. elapsed time was 22582.6[s]
> >> 780000 was read. elapsed time was 22832.7[s]
> >> 790000 was read. elapsed time was 23080.8[s]
> >> 800000 was read. elapsed time was 23328.4[s]
> >> 810000 was read. elapsed time was 23573.7[s]
> >> 820000 was read. elapsed time was 23819[s]
> >> 830000 was read. elapsed time was 24061[s]
> >> 840000 was read. elapsed time was 24301.8[s]
> >> 850000 was read. elapsed time was 24543.1[s]
> >> 860000 was read. elapsed time was 24781[s]
> >> 870000 was read. elapsed time was 25019.4[s]
> >> 880000 was read. elapsed time was 25252.3[s]
> >> 890000 was read. elapsed time was 25486.1[s]
> >> 900000 was read. elapsed time was 25721.1[s]
> >> 910000 was read. elapsed time was 25956.6[s]
> >> 920000 was read. elapsed time was 26188[s]
> >> 930000 was read. elapsed time was 26418[s]
> >> 940000 was read. elapsed time was 26647.8[s]
> >> 950000 was read. elapsed time was 26872.2[s]
> >> 960000 was read. elapsed time was 27096.7[s]
> >> 970000 was read. elapsed time was 27321.3[s]
> >> 980000 was read. elapsed time was 27546.2[s]
> >> 990000 was read. elapsed time was 27770[s]
> >> Random reads: 1000000 99576-byte rows in 27987.964 seconds, 35.7 rows per second
> >> non_exist=0
