Re: Child panics on OpenSolaris

2010-02-17 Thread Nils Goroll
Hi Poul-Henning,

> Yeah, we clearly need to improve the autocrap magic a bit to get stuff
> like this right.

Yes. I'm sorry for not having documented this a year ago. Would have saved a lot
of pain and effort...

> I have added a runtime test now, which panics the child process if the
> errno variable is not working properly.

Great!

>> VCC_CC="exec gcc -fpic -D_REENTRANT -m64 -shared -o %o %s"
> 
> Right now our VCC_CC default is for the sun-compiler I think ("-Kpic"),

Yes. The reason I prefer gcc is simply that SunCC is not available everywhere 
and if I understand the licence correctly, it is still only free to use for 
developers and development purposes.

> That's another piece of autocrap stuff that needs fixed...

Do you want me to do it?

Nils
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: Child panics on OpenSolaris

2010-02-17 Thread Nils Goroll
Hi Poul-Henning and all,

> We need CFLAGS to contain -mt on solaris, otherwise errno does not
> get macro-fied to be per-thread.
> 
> That's where the: EBADF comes from, some entirely different
> filedescriptor a long time ago, in the master process...

Fantastic. Thank you for putting so much effort into this.

But also: Stupid of me not to have thought of this. Here is how I have been
compiling varnish since I started working with it:

## 64bit
LDFLAGS="-lpthread" \
CFLAGS="-D_REENTRANT -m64" \
VCC_CC="exec gcc -fpic -D_REENTRANT -m64 -shared -o %o %s" \
./configure \
'--enable-debugging-symbols' \
'--enable-developer-warnings' \
'--enable-dependency-tracking' \


IIUC, with gcc this has effectively the same effect as -mt has with SunCC:

User Commands   cc(1)



NAME
  cc - C compiler

[...]

  -mt  Use this option to compile and link multithreaded code.
   The -mt option assures that libraries are linked in the
   appropriate order.

   This option passes -D_REENTRANT to the preprocessor and
   passes -lthread in the correct order to ld.

   If you are using POSIX threads, you must link with the
   options -mt -lpthread.
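
To illustrate what -D_REENTRANT (or -mt) buys: with a per-thread errno, &errno
differs between threads, whereas with a plain global int it is the same address
everywhere. The following is only a minimal sketch of such a runtime check, not
the test that went into Varnish; errno_probe, compare_errno_address and
errno_is_per_thread are names made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper type: carries the caller's &errno into the thread. */
struct errno_probe {
	int	*caller_errno;	/* address of errno as seen by the caller */
	int	differs;	/* set to 1 if the thread sees another address */
};

/* Runs in the helper thread while the caller is blocked in pthread_join(). */
static void *
compare_errno_address(void *arg)
{
	struct errno_probe *p = arg;

	assert(p != NULL);
	p->differs = (p->caller_errno != &errno);
	return (NULL);
}

/*
 * Returns 1 if errno is per-thread, 0 if both threads share one errno,
 * and -1 if the helper thread could not be created or joined.
 */
static int
errno_is_per_thread(void)
{
	struct errno_probe p = { &errno, 0 };
	pthread_t t;

	if (pthread_create(&t, NULL, compare_errno_address, &p) != 0)
		return (-1);
	if (pthread_join(t, NULL) != 0)
		return (-1);
	return (p.differs);
}
```

A caller could refuse to start (or panic) if errno_is_per_thread() does not
return 1. To be meaningful, the check has to be built with the same flags as the
rest of the program.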

Nils


Re: Child panics on OpenSolaris

2010-02-15 Thread Nils Goroll
Hi Poul-Henning,

> That was one of my theories, but it does not fit the facts of the case,

I don't understand: what contradicts this hypothesis?

> and it would violate POSIX, which I doubt Solaris would do...

Just out of interest: Do you have a reference for why that would violate POSIX?

http://www.opengroup.org/onlinepubs/009695399/functions/poll.html doesn't say.

> The best contender is still that varnish closes the fd by mistake,
> but I'll be damned if I can find where...

Please, let's go back to basics and your initial hypothesis one last time:
Suppose the RST gets received between the poll() and ioctl() in TCP_nonblocking.

Why should the ioctl not be allowed to return EBADF in that case? For instance,
this man page

Ioctl Requests   streamio(7I)



NAME
  streamio - STREAMS ioctl commands

SYNOPSIS
  #include <sys/types.h>
  #include <stropts.h>
  #include <sys/conf.h>

  int ioctl(int fildes, int command, ... /*arg*/);

mentions only a handful of error codes, and then there is this:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/proto_set.c#tli_errs

I'm not sure this is the exact mapping for this case, but it looks like it...

Nils



Re: Child panics on OpenSolaris

2010-02-15 Thread Nils Goroll
Poul-Henning,

> Child (3504) died signal=6
> Child (3504) Panic message: Assert error in TCP_nonblocking(), tcp.c line 172:
>   Condition((ioctl(sock, ((int)((uint32_t)(0x8000|(((sizeof
> (int))&0xff)<<16)| ('f'<<8)|126))), &i)) == 0) not true.
> errno = 9 (Bad file number)
> thread = (cache-worker)
> ident = -smalloc,-hcritbit,poll
> Backtrace:
>   4457db: /opt/sbin/varnishd'pan_backtrace+0x1b [0x4457db]
>   445ae5: /opt/sbin/varnishd'pan_ic+0x1c5 [0x445ae5]
>   fd7ff2efdfec: /opt/lib/libvarnish.so.1.0.0'TCP_nonblocking+0x7c
> [0xfd7ff2efdfec]
>   419091: /opt/sbin/varnishd'vca_return_session+0x1b1 [0x419091]
>   426aad: /opt/sbin/varnishd'cnt_wait+0x2bd [0x426aad]
>   42bc3a: /opt/sbin/varnishd'CNT_Session+0x4ba [0x42bc3a]
>   44835b: /opt/sbin/varnishd'wrk_do_cnt_sess+0x19b [0x44835b]
>   447954: /opt/sbin/varnishd'wrk_thread_real+0x854 [0x447954]
>   447eb3: /opt/sbin/varnishd'wrk_thread+0x123 [0x447eb3]

Just an idea from checking differences between the code I use and trunk: In
cnt_wait, shouldn't we check pfd[0].revents for POLLERR and POLLHUP? Could it be
that Solaris assumes that delivering an error once should suffice, so that
further use of the fd will return EBADF?
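
A minimal sketch of such a check, written as a stand-alone helper rather than as
a patch to cnt_wait. The name wait_readable and its return convention are made
up for this example, and whether this is the right policy for Varnish is an open
question.

```c
#include <poll.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns  1 if data can be read,
 *          0 on timeout,
 *         -1 if poll() failed or reported an error condition on the fd.
 */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd[1];
	int n;

	pfd[0].fd = fd;
	pfd[0].events = POLLIN;
	pfd[0].revents = 0;

	n = poll(pfd, 1, timeout_ms);
	if (n < 0)
		return (-1);	/* poll() itself failed, errno is set */
	if (n == 0)
		return (0);	/* timeout */

	/* Look at revents before touching the fd again. */
	if (pfd[0].revents & (POLLERR | POLLHUP | POLLNVAL))
		return (-1);
	return ((pfd[0].revents & POLLIN) ? 1 : 0);
}
```

Note that POLLHUP can be reported together with POLLIN when the peer has closed
but unread data remains, so treating POLLHUP as fatal, as this sketch does, may
discard that data.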

Again, I haven't investigated further, sorry for the noise if this turns out to 
be stupid.

Nils


Re: Child panics on OpenSolaris

2010-02-15 Thread Nils Goroll
Hi Poul-Henning,

Victor added a comment to http://varnish-cache.org/ticket/629:

http://pastie.org/791964

child (28980) Started
Child (28980) said Closed fds: 3 5 6 9 10 12 13
Child (28980) said Child starts
Child (28980) said managed to mmap 536870912 bytes of 536870912
Child (28980) died signal=6
Child (28980) Panic message: Assert error in vca_main(), cache_waiter_ports.c line 175:
   Condition((errno == EINTR) || (errno == ETIME)) not true.
errno = 9 (Bad file number)
Backtrace:
   42e405: /opt/extra/sbin/varnishd'pan_ic+0x95 [0x42e405]
   fd7ffefa4ee0: /lib/amd64/libc.so.1'_lwp_start+0x0 [0xfd7ffefa4ee0]

I definitely do not see this in my patched-up 2.0.3 (varnish running for weeks
on 7 servers now), so IMHO, this is another hint pointing towards some change in
trunk.

Nils


Re: Child panics on OpenSolaris

2010-02-15 Thread Nils Goroll
Hi Paul,

> You're right, I'd forgotten to mention it before now.
> 
> [r...@varnish bin]# uname -a
> SunOS varnish 5.11 snv_111b i86pc i386 i86pc

Good, thank you. Then at least this shouldn't be a bug introduced with the major
changes of the IP datapath refactoring project:
http://hub.opensolaris.org/bin/view/Project+ip-refactor/WebHome

Nils


Re: Child panics on OpenSolaris

2010-02-14 Thread Nils Goroll
Hi Poul-Henning and all,

> If I had implemented the hack I suspect Solaris contains, I would
> have found some bit somewhere, to make sure the errno would be the
> correct, documented and expected:
> 
>   #define ECONNRESET  54 /* Connection reset by peer */
> 
> Somebody with a Solaris service contract, if such things still
> exist, should report this as a bug to them...
> 
> I will add a workaround to Varnish, with a suitable sarcastic
> commentary...

I can't tell at this point if this is a Solaris or a Varnish issue, but I can
tell for sure that I have never seen it on 2.0.3 with a couple of additional
fixes running a *high* traffic site. The main difference between this version
and trunk with respect to TCP is, IIUC, the lack of tcp_linger. Maybe that gives
a clue on where to look, but on the other hand this might be absolutely the
wrong way to go.

Regarding Solaris, this piece of code here

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/inet/tcp/tcp.c#11049

documents the errnos that SHOULD be set upon reception of an RST depending on 
various conditions.

Regarding sarcasm: Please add the comment when the current hypothesis proves 
true, but my experience is that in many cases the Solaris guys are quite fuzzy 
about documentation.

I volunteer to get this fixed in OpenSolaris should it turn out that the issue 
is root caused there.

Nils


Re: Child panics on OpenSolaris

2010-02-14 Thread Nils Goroll
Paul,

Have I missed anything, or have you not yet stated which version of OpenSolaris
(uname -v) you're using? If you don't use a standard build, could you please
give the hg changeset you're on?

Nils


Time to replace the hit ratio with something more intuitive?

2010-01-19 Thread Nils Goroll
Hi,

in http://varnish.projects.linpro.no/ticket/613 I have suggested adding a
measure to varnishstat, which I thought could be called the "efficiency ratio".

Tollef has commented that we'd need the community's (YOUR) opinion on this:

The varnishstat cache hit rate basically gives the ratio of requests directed to
the cache component of varnish that have been answered from it. It does not say
anything about the number of requests being passed on to the backend for
whatever reason. So it is possible to see a cache hit rate of 0.9999 (99.99%)
while 99% of the client requests still hit your backend, if only 1% of the
requests qualify for being served from the cache.

I am suggesting to amend (or replace?) this figure with the ratio of client
requests handled by the cache to the total number of client requests. In other
words, a measure of how many of the client requests do not result in a backend
request.

My experience is that this figure is far more important, because cache users 
will mostly be interested in saving backend requests. The cache hit rate is 
probably of secondary importance, and it can be confusing to get a high cache 
hit rate while still (too) many requests are hitting the backend.
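
To make the difference concrete, here is a minimal sketch of both figures
computed from cumulative counters. The struct and function names are made up for
this example, and approximating "requests that do not result in a backend
request" as cache hits divided by client requests is an assumption; the exact
formula proposed in the ticket is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cumulative counters; not the names varnishstat uses. */
struct req_counters {
	double	client_req;	/* client requests received */
	double	cache_hit;	/* lookups answered from the cache */
	double	cache_miss;	/* lookups that had to go to the backend */
};

/* Hit ratio: hits relative to the requests that reached the cache lookup. */
static double
hit_ratio(const struct req_counters *c)
{
	double lookups;

	assert(c != NULL);
	lookups = c->cache_hit + c->cache_miss;
	return (lookups > 0.0 ? c->cache_hit / lookups : 0.0);
}

/* Efficiency ratio (assumed definition): hits relative to all client requests. */
static double
efficiency_ratio(const struct req_counters *c)
{
	assert(c != NULL);
	return (c->client_req > 0.0 ? c->cache_hit / c->client_req : 0.0);
}
```

With 1,000,000 client requests of which only 10,000 reach the cache lookup and
9,999 of those are hits, hit_ratio() gives 0.9999 while efficiency_ratio() gives
roughly 0.01, which is the extreme case described above.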

Here's what the two figures look like on a production system:

Hitrate ratio:         10      100     1000
Hitrate avg:       0.9721   0.9721   0.9731
Efficiency ratio:      10      100     1000
Efficiency avg:    0.9505   0.9522   0.9533

 55697963   200.97   256.93 Client connections accepted
402992210  1518.81  1858.98 Client requests received
390022582  1471.82  1799.15 Cache hits
     1549     0.00     0.01 Cache hits for pass
  9053637    22.00    41.76 Cache misses


Now it's up to you: what do you think about this?

Nils


Re: Combining req.url matches

2010-01-11 Thread Nils Goroll
Hi "Pub Crawler",

> For instance:
> if (req.url ~ "photoupload.cfm") {pass;}
> if (req.url ~ "logoupload.cfm") {pass;}
> 
> Is there a prescribed way to combine that into one line?

Firstly, you should note that the argument to the ~ operator is a regexp, so if
you mean a literal dot, you have to write it as \. instead. You can also combine
alternatives in a group like this:

if (req.url ~ "(photo|logo)upload\.cfm") {pass;}
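
The effect of the unescaped dot can be tried outside of Varnish with POSIX
extended regexps in plain C. This is only a minimal sketch: url_matches is a
made-up helper, and that varnishd's matcher behaves identically in every detail
is an assumption.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if url matches the POSIX extended regexp in pattern,
 * 0 if it does not, and -1 if the pattern does not compile.
 * Like the ~ operator, the match is not anchored.
 */
static int
url_matches(const char *pattern, const char *url)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, url, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

With the pattern "(photo|logo)upload\\.cfm" (the backslash is doubled only
because of C string escaping), "/photoupload.cfm" and "/logoupload.cfm" match
and "/logouploadXcfm" does not. With the unescaped "logoupload.cfm",
"/logouploadXcfm" matches as well, because the dot stands for any character.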

Nils


Re: Dropped connections with tcp_tw_recycle=1

2009-09-22 Thread Nils Goroll
Sven,

> Right, you're saying that the srcaddr+srcport pair of a connection in
> TIME_WAIT should not be reused under this scheme (i.e. the SYN can be
> dropped), and I agree. Then I don't understand why a new connection
> originating from a *different* source port (although from the same
> source IP) is also considered a dupe and dropped.

Are you referring to this code?

    if (tmp_opt.saw_tstamp &&
        tcp_death_row.sysctl_tw_recycle &&
        (dst = inet_csk_route_req(sk, req)) != NULL &&
        (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
        peer->v4daddr == saddr) {
            if (xtime.tv_sec < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
                (s32)(peer->tcp_ts - req->ts_recent) >
                TCP_PAWS_WINDOW) {
                    NET_INC_STATS_BH(LINUX_MIB_PAWSPASSIVEREJECTED);
                    dst_release(dst);
                    goto drop_and_free;
            }
    }

Again, I cannot tell you what the intention of the implementors might have been,
but my interpretation is that they wanted to implement time stamp checking as a
side effect of tw_recycle (a positive one from the security standpoint).
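
To see why this hits clients behind a masquerading device, here is a stripped-down
user-space model of the inner condition. It is a sketch only: peer_entry and
syn_rejected are made-up names, the checks for saw_tstamp, the sysctl and the
route lookup are left out, and the two constants carry the values I believe the
kernel headers use (60 and 1), which should be treated as an assumption.

```c
#include <stdint.h>

#define PAWS_MSL	60	/* seconds a peer's timestamp is trusted (assumed) */
#define PAWS_WINDOW	1	/* tolerated backwards step in ticks (assumed) */

/* What is remembered per remote IP address; note there is no port in here. */
struct peer_entry {
	uint32_t	last_ts;	/* last TCP timestamp seen from this IP */
	long		last_seen;	/* second at which last_ts was recorded */
};

/*
 * Returns 1 if a SYN carrying timestamp syn_ts, arriving at second now,
 * would be dropped, and 0 if it would be accepted.
 */
static int
syn_rejected(const struct peer_entry *peer, uint32_t syn_ts, long now)
{
	return (now < peer->last_seen + PAWS_MSL &&
	    (int32_t)(peer->last_ts - syn_ts) > PAWS_WINDOW);
}
```

If host A behind the NAT device was last seen with timestamp 500000 and host B,
sharing the same public IP, sends a SYN with timestamp 1000 five seconds later,
syn_rejected() returns 1, regardless of which source port B uses.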

I haven't thought about how (or if) the tw_recycle code could be improved,
because I believe the benefits of TCP state reuse are overrated and the
disadvantages outweigh the advantages. Also, my work focuses on OSes which
don't have this issue ;-)

Thanks, Nils


Re: Dropped connections with tcp_tw_recycle=1

2009-09-22 Thread Nils Goroll
Sven,

>>> tcp_tw_recycle is incompatible with NAT on the server side
>>
>> ... because it will enforce the verification of TCP time stamps.
>> Unless all clients behind a NAT (actually PAD/masquerading) device
>> use identical timestamps (within a certain range), most of them will
>> send invalid TCP timestamps so SYNs will get dropped.
> 
> I've been digging a bit more. [...]

Thank you very much for your writeup regarding tcp_tw_recycle and timestamp 
verification. This is the part which I think I had already understood ...

 > tcp_tw_recycle and _reuse's actual reuse of tw buckets seems to happen
 > when setting up outbound connections. I haven't looked at those yet.

... but this is the part which I don't have a good understanding of yet.

> The outer conditional verifies that the incoming SYN has a timestamp,
> that tcp_tw_recycle is enabled, and that the origin exists in our
> peer cache. Note that it only checks the IP of the origin. Doesn't it
> make sense to also match on port?

My understanding is that the fact that the connection is in TIME_WAIT implies 
that the source port should not be reused at this time.

Nils


Re: Dropped connections with tcp_tw_recycle=1

2009-09-21 Thread Nils Goroll
Hi Michael and all,

>>> tcp_tw_recycle is incompatible with NAT on the server side
>>
>> ... because it will enforce the verification of TCP time stamps.
>> Unless all clients behind a NAT (actually PAD/masquerading) device
>> use identical timestamps (within a certain range), most of them will
>> send invalid TCP timestamps so SYNs will get dropped.
> 
> Since you seem pretty knowledgeable on the subject, can you please 
> explain the difference between tcp_tw_reuse and tcp_tw_recycle?

I think I have understood the reason why tcp_tw_recycle does not work with NAT
connections, but I must say I haven't digested the Linux TCP implementation well
enough to explain to you the design decisions regarding these two options.

The very basic idea is to re-use tcp connections in TIME_WAIT state, saving the
overhead of destroying and recreating TCP state. I remember that at one point I
thought I had understood the difference, but I can't recall it at the moment.

In short: I can tell you that you *must not* use tcp_tw_recycle for any machine 
talking to machines behind masquerading firewalls (iow, only use it inside 
isolated networks). But I cannot tell you what exactly it is supposed to do and 
what the difference is to tcp_tw_reuse. If anyone finds out, please let me know 
as well!

Nils


Re: Dropped connections with tcp_tw_recycle=1

2009-09-20 Thread Nils Goroll
> tcp_tw_recycle is incompatible with NAT on the server side

... because it will enforce the verification of TCP time stamps. Unless all
clients behind a NAT (actually PAD/masquerading) device use identical timestamps
(within a certain range), most of them will send invalid TCP timestamps so SYNs
will get dropped.

This issue also kept me busy for long hours, and the basic insight is simple:
Premature optimization is the root of all evil ;-), or, less philosophically,
don't tune experimental parameters (the kernel docs are very clear about this!).

Nils