Re: Child panics on OpenSolaris
Hi Poul-Henning,

> Yeah, we clearly need to improve the autocrap magic a bit to get stuff
> like this right.

Yes. I'm sorry for not having documented this a year ago. It would have saved a lot of pain and effort...

> I have added a runtime test now, which panics the child process if the
> errno variable is not working properly.

Great!

>> VCC_CC="exec gcc -fpic -D_REENTRANT -m64 -shared -o %o %s"
>
> Right now our VCC_CC default is for the sun-compiler I think ("-Kpic"),

Yes. The reason I prefer gcc is simply that SunCC is not available everywhere and, if I understand the licence correctly, it is still only free to use for developers and development purposes.

> That's another piece of autocrap stuff that needs fixed...

Do you want me to do it?

Nils
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc
Re: Child panics on OpenSolaris
Hi Poul-Henning and all,

> We need CFLAGS to contain -mt on solaris, otherwise errno does not
> get macro-fied to be per-thread.
>
> That's where the: EBADF comes from, some entirely different
> filedescriptor a long time ago, in the master process...

Fantastic. Thank you for putting so much effort into this. But also: stupid of me not to think of this. Here is how I have been compiling varnish since I started working with it:

    ## 64bit
    LDFLAGS="-lpthread" CFLAGS="-D_REENTRANT -m64" \
    VCC_CC="exec gcc -fpic -D_REENTRANT -m64 -shared -o %o %s" \
    ./configure \
        '--enable-debugging-symbols' \
        '--enable-developer-warnings' \
        '--enable-dependency-tracking' \

IIUC, this has effectively the same effect as -mt for SunCC:

    User Commands                                              cc(1)

    NAME
         cc - C compiler
    [...]
         -mt  Use this option to compile and link multithreaded code.
              The -mt option assures that libraries are linked in the
              appropriate order. This option passes -D_REENTRANT to the
              preprocessor and passes -lthread in the correct order to
              ld. If you are using POSIX threads, you must link with
              the options -mt -lpthread.

Nils
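The per-thread errno behaviour discussed above can be probed with a small check. The following is only a minimal sketch, assuming POSIX threads are available; `errno_is_per_thread` and `record_errno_address` are hypothetical names and this is not the runtime test that was added to Varnish. It compares the address of `errno` as seen from two threads: if errno is a per-thread macro, the addresses differ; if it is a plain global, they are the same.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: record where this thread's errno lives. */
static void *
record_errno_address(void *arg)
{
    int **slot = arg;

    *slot = &errno;
    return (NULL);
}

/*
 * Returns 1 if errno is per-thread, 0 if all threads share one errno.
 * Hypothetical helper for illustration only.
 */
int
errno_is_per_thread(void)
{
    pthread_t thr;
    int *other = NULL;
    int rc;

    rc = pthread_create(&thr, NULL, record_errno_address, &other);
    assert(rc == 0);
    rc = pthread_join(thr, NULL);
    assert(rc == 0);

    /* Different addresses mean each thread has its own errno. */
    return (other != &errno);
}
```

A program built with the flags shown above would be expected to get 1 from `errno_is_per_thread()`; a result of 0 would point at the missing `-D_REENTRANT` / `-mt` situation described in the quoted text.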
Re: Child panics on OpenSolaris
Hi Poul-Henning,

> That was one of my theories, but it does not fit the facts of the case,

I don't understand: what contradicts this hypothesis?

> and it would violate POSIX, which I doubt Solaris would do...

Just out of interest: do you have a reference for why that would violate POSIX? http://www.opengroup.org/onlinepubs/009695399/functions/poll.html doesn't say.

> The best contender is still that varnish closes the fd by mistake,
> but I'll be damned if I can find where...

Please let's come back to the basics and your initial hypothesis one last time: suppose the RST gets received between the poll() and the ioctl() in TCP_nonblocking. Why should the ioctl not be allowed to return EBADF in that case?

For instance, this man page

    Ioctl Requests                                        streamio(7I)

    NAME
         streamio - STREAMS ioctl commands

    SYNOPSIS
         #include <sys/types.h>
         #include <stropts.h>
         #include <sys/conf.h>

         int ioctl(int fildes, int command, ... /*arg*/);

only mentions a handful of error codes, and then there is this:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/proto_set.c#tli_errs

I'm not sure this is the exact mapping for this case, but it looks like it...

Nils
Re: Child panics on OpenSolaris
Poul-Henning,

> Child (3504) died signal=6
> Child (3504) Panic message: Assert error in TCP_nonblocking(), tcp.c line 172:
>   Condition((ioctl(sock, ((int)((uint32_t)(0x8000|(((sizeof
>   (int))&0xff)<<16)| ('f'<<8)|126))), &i)) == 0) not true.
>   errno = 9 (Bad file number)
>   thread = (cache-worker)
>   ident = -smalloc,-hcritbit,poll
> Backtrace:
>   4457db: /opt/sbin/varnishd'pan_backtrace+0x1b [0x4457db]
>   445ae5: /opt/sbin/varnishd'pan_ic+0x1c5 [0x445ae5]
>   fd7ff2efdfec: /opt/lib/libvarnish.so.1.0.0'TCP_nonblocking+0x7c
>   [0xfd7ff2efdfec]
>   419091: /opt/sbin/varnishd'vca_return_session+0x1b1 [0x419091]
>   426aad: /opt/sbin/varnishd'cnt_wait+0x2bd [0x426aad]
>   42bc3a: /opt/sbin/varnishd'CNT_Session+0x4ba [0x42bc3a]
>   44835b: /opt/sbin/varnishd'wrk_do_cnt_sess+0x19b [0x44835b]
>   447954: /opt/sbin/varnishd'wrk_thread_real+0x854 [0x447954]
>   447eb3: /opt/sbin/varnishd'wrk_thread+0x123 [0x447eb3]

Just an idea from checking differences between the code I use and trunk: in cnt_wait, shouldn't we check pfd[0].revents for POLLERR and POLLHUP?

Could it be that Solaris assumes that delivering an error once should suffice, so that further use of the fd will return EBADF?

Again, I haven't investigated further, so sorry for the noise if this turns out to be stupid.

Nils
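The revents check suggested above could look roughly like the sketch below. It is an illustration only, not code from cnt_wait(): `wait_readable` is a hypothetical helper, and treating POLLHUP as a hard error is an assumption (a descriptor can report POLLHUP while unread data is still pending).

```c
#include <assert.h>
#include <poll.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, and -1 if poll() failed or the
 * descriptor reported POLLERR, POLLHUP or POLLNVAL.
 * Hypothetical helper for illustration; not taken from Varnish.
 */
int
wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd[1];
    int n;

    assert(fd >= 0);
    pfd[0].fd = fd;
    pfd[0].events = POLLIN;
    pfd[0].revents = 0;

    n = poll(pfd, 1, timeout_ms);
    if (n < 0)
        return (-1);    /* poll() itself failed */
    if (n == 0)
        return (0);     /* timeout, nothing to report */

    /* Error conditions are reported even though only POLLIN was requested. */
    if (pfd[0].revents & (POLLERR | POLLHUP | POLLNVAL))
        return (-1);
    return ((pfd[0].revents & POLLIN) ? 1 : 0);
}
```

A caller that gets -1 would close the session instead of handing the descriptor on to a later ioctl(), which is where the assert in the panic above fired.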
Re: Child panics on OpenSolaris
Hi Poul-Henning,

Victor added a comment to http://varnish-cache.org/ticket/629:

http://pastie.org/791964

    child (28980) Started
    Child (28980) said Closed fds: 3 5 6 9 10 12 13
    Child (28980) said Child starts
    Child (28980) said managed to mmap 536870912 bytes of 536870912
    Child (28980) died signal=6
    Child (28980) Panic message: Assert error in vca_main(), cache_waiter_ports.c line 175:
      Condition((errno == EINTR) || (errno == ETIME)) not true.
      errno = 9 (Bad file number)
    Backtrace:
      42e405: /opt/extra/sbin/varnishd'pan_ic+0x95 [0x42e405]
      fd7ffefa4ee0: /lib/amd64/libc.so.1'_lwp_start+0x0 [0xfd7ffefa4ee0]

I definitely do not see this in my patched-up 2.0.3 (varnish has been running for weeks on 7 servers now), so IMHO this is another hint pointing towards some change in trunk.

Nils
Re: Child panics on OpenSolaris
Hi Paul,

> You're right, I'd forgotten to mention it before now.
>
> [r...@varnish bin]# uname -a
> SunOS varnish 5.11 snv_111b i86pc i386 i86pc

Good, thank you. Then at least this shouldn't be a bug introduced with the major changes of the IP datapath refactoring project:

http://hub.opensolaris.org/bin/view/Project+ip-refactor/WebHome

Nils
Re: Child panics on OpenSolaris
Hi Poul-Henning and all,

> If I had implemented the hack I suspect Solaris contains, I would
> have found some bit somewhere, to make sure the errno would be the
> correct, documented and expected:
>
> #define ECONNRESET 54 /* Connection reset by peer */
>
> Somebody with a Solaris service contract, if such things still
> exist, should report this as a bug to them...
>
> I will add a workaround to Varnish, with a suitable sarcastic
> commentary...

I can't tell at this point whether this is a Solaris or a Varnish issue, but I can tell for sure that I have never seen it on 2.0.3 with a couple of additional fixes, running a *high* traffic site.

The main difference between this version and trunk with respect to TCP is, IIUC, the lack of tcp_linger. Maybe that gives a clue on where to look, but on the other hand this might be absolutely the wrong way to go.

Regarding Solaris, this piece of code here

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/tcp/tcp.c#11049

documents the errnos that SHOULD be set upon reception of an RST, depending on various conditions.

Regarding sarcasm: please add the comment when the current hypothesis proves true, but my experience is that in many cases the Solaris guys are quite fuzzy about documentation. I volunteer to get this fixed in OpenSolaris should it turn out that the root cause of the issue is there.

Nils
Re: Child panics on OpenSolaris
Paul,

have I missed anything or have you not yet stated which version of OpenSolaris (uname -v) you're using? If you don't use a standard build, could you please give the hg changeset you're on?

Nils
Time to replace the hit ratio with something more intuitive?
Hi,

in http://varnish.projects.linpro.no/ticket/613 I have suggested adding a measure to varnishstat which I thought could be called the "efficiency ratio".

Tollef has commented that we'd need the community's (YOUR) opinion on this:

The varnishstat cache hit rate basically gives a ratio for how many of the requests directed to the cache component of varnish have been answered from it. It does not say anything about the number of requests passed on to the backend for whatever reason. So it is possible to see a cache hit rate of 0.9999 (99.99%) while 99% of the client requests still hit your backend, if only 1% of the requests qualify for being served from the cache.

I am suggesting to amend (or replace?) this figure with the ratio of client requests handled by the cache to the total number of requests. In other words, a measure of how many of the client requests do not result in a backend request.

My experience is that this figure is far more important, because cache users will mostly be interested in saving backend requests. The cache hit rate is probably of secondary importance, and it can be confusing to get a high cache hit rate while still (too) many requests are hitting the backend.

Here's what the two figures look like on a production system:

    Hitrate ratio:           10      100     1000
    Hitrate avg:         0.9721   0.9721   0.9731
    Efficiency ratio:        10      100     1000
    Efficiency avg:      0.9505   0.9522   0.9533

        55697963       200.97       256.93 Client connections accepted
       402992210      1518.81      1858.98 Client requests received
       390022582      1471.82      1799.15 Cache hits
            1549         0.00         0.01 Cache hits for pass
         9053637        22.00        41.76 Cache misses

Now it's up to you, what do you think about this?

Nils
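To make the difference between the two figures concrete, here is a minimal sketch of both calculations. The struct and function names are hypothetical, and the efficiency formula (cache hits divided by client requests) is an assumption about what ticket 613 proposes, not its confirmed definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter names; they mirror the varnishstat rows quoted above. */
struct cache_counters {
    uint64_t client_req;    /* Client requests received */
    uint64_t cache_hit;     /* Cache hits */
    uint64_t cache_miss;    /* Cache misses */
};

/* Hit rate: share of cache lookups that were answered from the cache. */
double
hit_rate(const struct cache_counters *c)
{
    uint64_t lookups = c->cache_hit + c->cache_miss;

    assert(lookups > 0);
    return ((double)c->cache_hit / (double)lookups);
}

/*
 * Efficiency ratio (assumed definition): share of ALL client requests
 * that were answered by a cache hit, i.e. without a backend request.
 */
double
efficiency_ratio(const struct cache_counters *c)
{
    assert(c->client_req > 0);
    return ((double)c->cache_hit / (double)c->client_req);
}
```

Because passed and piped requests count in the denominator of the efficiency ratio but not in that of the hit rate, the efficiency ratio can never be higher than the hit rate. With the lifetime counters quoted above (390022582 hits out of 402992210 client requests) the efficiency comes out at roughly 0.968; the averages in the varnishstat output differ because they are computed over moving windows.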
Re: Combining req.url matches
Hi "Pub Crawler",

> For instance:
> if (req.url ~ "photoupload.cfm") {pass;}
> if (req.url ~ "logoupload.cfm") {pass;}
>
> Is there a prescribed way to combine that into one line?

Firstly, you should note that the argument to the ~ operator is a regexp, so if you mean a literal dot, you need to write \. instead of a bare dot.

You can also group subexpressions like this:

    if (req.url ~ "(photo|logo)upload\.cfm") {pass;}

Nils
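The effect of the unescaped dot and of the grouping can be tried outside of Varnish. The sketch below uses POSIX extended regular expressions from `<regex.h>` purely for illustration; `url_matches` is a hypothetical helper, and the regex engine used by VCL may differ in details from POSIX ERE.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if url matches the POSIX extended regex in pattern, else 0.
 * Like VCL's ~ operator, the match is unanchored: the pattern may match
 * anywhere inside the URL. Hypothetical helper for illustration only.
 */
int
url_matches(const char *pattern, const char *url)
{
    regex_t re;
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);    /* the pattern must compile */
    rc = regexec(&re, url, 0, NULL, 0);
    regfree(&re);
    return (rc == 0);
}
```

With the pattern `(photo|logo)upload\.cfm` (written as `"(photo|logo)upload\\.cfm"` in a C string), both `/photoupload.cfm` and `/logoupload.cfm` match, while `/photouploadXcfm` does not. With the unescaped pattern `photoupload.cfm`, `/photouploadXcfm` matches as well, because the dot stands for any character.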
Re: Dropped connections with tcp_tw_recycle=1
Sven,

> Right, you're saying that the srcaddr+srcport pair of a connection in
> TIME_WAIT should not be reused under this scheme (i.e. the SYN can be
> dropped), and I agree. Then I don't understand why a new connection
> originating from a *different* source port (although from the same
> source IP) is also considered a dupe and dropped.

Are you referring to this code?

    if (tmp_opt.saw_tstamp &&
        tcp_death_row.sysctl_tw_recycle &&
        (dst = inet_csk_route_req(sk, req)) != NULL &&
        (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
        peer->v4daddr == saddr) {
            if (xtime.tv_sec < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
                (s32)(peer->tcp_ts - req->ts_recent) >
                                            TCP_PAWS_WINDOW) {
                    NET_INC_STATS_BH(LINUX_MIB_PAWSPASSIVEREJECTED);
                    dst_release(dst);
                    goto drop_and_free;
            }
    }

Again, I cannot tell you what the intention of the implementors might have been, but my interpretation is that they wanted to implement timestamp checking as a side effect of tw_recycle (a positive one from the security standpoint).

I haven't thought about how (or if) the tw_recycle code could be improved, because I believe the benefits of TCP state reuse are overrated and the disadvantages outweigh the advantages. Also, my work focuses on OSes which don't have this issue ;-)

Thanks,
Nils
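The inner condition of the quoted kernel code can be replayed in user space to see why clients behind one masquerading address get their SYNs dropped. This is a simplified sketch, not kernel code: `struct peer_entry` and `syn_would_be_dropped` are hypothetical stand-ins, and the values 60 and 1 for the two constants are assumptions about TCP_PAWS_MSL and TCP_PAWS_WINDOW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAWS_MSL    60  /* seconds; assumed value of TCP_PAWS_MSL */
#define PAWS_WINDOW 1   /* assumed value of TCP_PAWS_WINDOW */

/* Simplified stand-in for the per-IP peer entry; not the kernel struct. */
struct peer_entry {
    uint32_t ts;        /* last TCP timestamp value seen from this IP */
    int64_t  ts_stamp;  /* wall-clock second at which ts was recorded */
};

/*
 * Returns 1 if a SYN carrying timestamp syn_ts would be dropped at time
 * now (in seconds), else 0. Note that only the source IP selects the
 * peer entry; the source port plays no role.
 */
int
syn_would_be_dropped(const struct peer_entry *peer, uint32_t syn_ts, int64_t now)
{
    assert(peer != NULL);
    return (now < peer->ts_stamp + PAWS_MSL &&
        (int32_t)(peer->ts - syn_ts) > PAWS_WINDOW);
}
```

Suppose client A behind the masquerading device last sent timestamp 1000000, recorded at second 100. If client B, whose timestamp clock reads 5000, opens a connection from the same public IP at second 110, its SYN looks like an old duplicate and is dropped. The same SYN is accepted once 60 seconds have passed since the recorded timestamp.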
Re: Dropped connections with tcp_tw_recycle=1
Sven,

>>> tcp_tw_recycle is incompatible with NAT on the server side
>>
>> ... because it will enforce the verification of TCP time stamps.
>> Unless all clients behind a NAT (actually PAT/masquerading) device
>> use identical timestamps (within a certain range), most of them will
>> send invalid TCP timestamps so SYNs will get dropped.
>
> I've been digging a bit more. [...]

Thank you very much for your writeup regarding tcp_tw_recycle and timestamp verification. This is the part which I think I had already understood ...

> tcp_tw_recycle and _reuse's actual reuse of tw buckets seems to happen
> when setting up outbound connections. I haven't looked at those yet.

... but this is the part which I don't have a good understanding of yet.

> The outer conditional verifies that the incoming SYN has a timestamp,
> that tcp_tw_recycle is enabled, and that the origin exists in our
> peer cache. Note that it only checks the IP of the origin. Doesn't it
> make sense to also match on port?

My understanding is that the fact that the connection is in TIME_WAIT implies that the source port should not be reused at this time.

Nils
Re: Dropped connections with tcp_tw_recycle=1
Hi Michael and all,

>>> tcp_tw_recycle is incompatible with NAT on the server side
>>
>> ... because it will enforce the verification of TCP time stamps.
>> Unless all clients behind a NAT (actually PAT/masquerading) device
>> use identical timestamps (within a certain range), most of them will
>> send invalid TCP timestamps so SYNs will get dropped.
>
> Since you seem pretty knowledgeable on the subject, can you please
> explain the difference between tcp_tw_reuse and tcp_tw_recycle?

I think I have understood the reason why tcp_tw_recycle does not work with NAT connections, but I must say I haven't studied the Linux TCP implementation thoroughly enough to explain to you the design decisions regarding these two options.

The very basic idea is to re-use TCP connections in TIME_WAIT state, saving the overhead of destroying and recreating TCP state. I remember that at one point I thought I had understood the difference, but I can't recall it at the moment.

In short: I can tell you that you *must not* use tcp_tw_recycle on any machine talking to machines behind masquerading firewalls (in other words, only use it inside isolated networks). But I cannot tell you what exactly it is supposed to do and how it differs from tcp_tw_reuse.

If anyone finds out, please let me know as well!

Nils
Re: Dropped connections with tcp_tw_recycle=1
> tcp_tw_recycle is incompatible with NAT on the server side

... because it will enforce the verification of TCP time stamps. Unless all clients behind a NAT (actually PAT/masquerading) device use identical timestamps (within a certain range), most of them will send invalid TCP timestamps so SYNs will get dropped.

This issue also kept me busy for long hours, and the basic insight is simple: premature optimization is the root of all evil ;-) or, less philosophically, don't tune experimental parameters (the kernel docs are very clear about this!).

Nils