Re: Best practice for caching scenario with different backend servers but same content

2021-10-06 Thread Dridi Boukelmoune
On Mon, Aug 16, 2021 at 1:34 PM Hamidreza Hosseini
 wrote:
>
> > In that case, hashing the URL only would prevent you from adding new
> domains through your Varnish server. It won't hurt if you know you
> will only ever have one domain to deal with, but hashing the host will
> also not hurt as long as you normalize it to a unique value.
>
> Hi,
> Let me elaborate my architecture more:
> I have some backend servers that serve HLS fragments for live video streaming, e.g.:
>
> ```
>
> hls_backend_01
> hls_backend_02
> hls_backend_03
> hls_backend_04
> hls_backend_05
> hls_backend_06
> hls_backend_07
> hls_backend_08
> hls_backend_09
> hls_backend_10
>
> ```
>
> There is the same content on all HLS backend servers, and there are 5 Varnish
> servers in front of them for caching.
> Now, if I use the round-robin director on the Varnish servers, because Varnish
> caches on "req.http.host + req.url", the same content fetched from different
> backends would be cached twice! For example:
> if Varnish sends the first request for the "test.ts" file to the "hls_backend_01"
> backend server, it caches the response, and
> for the next request from other clients, because it is using the round-robin
> director,
> it goes to "hls_backend_02" and caches the same file again due to the
> different "req.http.host".
> So now I have a solution: use the shard director with "key=req.url" instead
> of round robin;
> another way is to use round robin but adjust the hash VCL to something
> like below:
>
> ```
>
> sub vcl_hash {
>     hash_data(req.url);
>     return (lookup);
> }
>
> ```
>
> In this way Varnish hashes only "req.url", not "req.http.host".
> So Varnish would cache the content based on the content's uniqueness, not based
> on the difference between backends.
> 1. At first I asked how I can normalize it. Is that possible at all, given
> what I described?
> Would you please explain it more with an example?

In this case I think you are confusing "req.http.host" (host header)
with the backend host name.

For example, if you reach one of your 5 Varnish servers via
www.example.com that's what clients will use and that's what
req.http.host will contain.

Your backends' FQDNs could be something like this:

- hls01.internal.example.com
- hls02.internal.example.com
- hls03.internal.example.com
- ...
- hls10.internal.example.com

As the example suggests, these domains should not be directly reached
by clients if your goal is to proxy them with Varnish. Those internal
FQDNs should have no effect on the cache key populated with
hash_data(...).
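To make that concrete, host normalization could look like this minimal VCL sketch (the domain is a placeholder from the example above, not part of the original setup):

```
sub vcl_recv {
    # Collapse aliases and an optional port of the public domain to one
    # canonical value, so the host header always hashes the same way.
    if (req.http.host ~ "(?i)^(www\.)?example\.com(:[0-9]+)?$") {
        set req.http.host = "www.example.com";
    }
}
```

With this in place, keeping hash_data(req.http.host) in vcl_hash is harmless, because every client request contributes the same normalized value.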

> 2. You gave an example about other domains; in this case I do not understand
> what it has to do with the domain.

Let's say your clients can reach either example.com or www.example.com
for the same service, or that tomorrow you add more than your HLS service
behind Varnish; either way you may very well receive multiple host headers.

> 3. Maybe I'm thinking about it the wrong way, because if Varnish hashes the data
> based on req.url with 'hash_data(req.url)', it shouldn't cache the same content
> from different backends again!
> for example my request is:

In this case you are "hashing" the client request with hash_data(...)
and it has nothing to do with backend selection. The fallback director
will precisely not do any kind of traffic balancing since its purpose
is to always select the first healthy backend in the insertion order.
The shard director may rely on the request hash or other criteria as
we already covered.

> http://varnish-01:/hls/test.ts
> for the first request it goes to the "hls_backend_01" backend and caches it, and
> for the next request it goes to the "hls_backend_02" backend,
> so for each request it caches again because the backends are different?

All subsequent requests to http://varnish-01:/hls/test.ts should go to
the same hls_backend_01 backend with the shard director. As long as
there are no other criteria than the ones we already discussed. If you
want consistency across all your Varnish servers, you should configure
your shard directors identically, with the backends added in the same
order (unlike your initial VCL example using the fallback director).
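As a rough sketch, "configured identically" could look like the following on every Varnish server (the backend names are placeholders, not the original ones; the important part is the identical insertion order):

```
import directors;

sub vcl_init {
    new hls_cluster = directors.shard();
    # Same backends, in the same order, on every Varnish server.
    hls_cluster.add_backend(hls01);
    hls_cluster.add_backend(hls02);
    hls_cluster.add_backend(hls03);
    hls_cluster.reconfigure();
}

sub vcl_backend_fetch {
    # Keying on the URL makes every Varnish server pick the same backend
    # for the same object.
    set bereq.backend_hint = hls_cluster.backend(by=KEY,
        key=hls_cluster.key(bereq.url));
}
```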

Best,
Dridi
___
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


Re: Best practice for caching scenario with different backend servers but same content

2021-08-16 Thread Hamidreza Hosseini
> In that case, hashing the URL only would prevent you from adding new
domains through your Varnish server. It won't hurt if you know you
will only ever have one domain to deal with, but hashing the host will
also not hurt as long as you normalize it to a unique value.

Hi,
Let me elaborate my architecture more:
I have some backend servers that serve HLS fragments for live video streaming, e.g.:

```

hls_backend_01
hls_backend_02
hls_backend_03
hls_backend_04
hls_backend_05
hls_backend_06
hls_backend_07
hls_backend_08
hls_backend_09
hls_backend_10

```

There is the same content on all HLS backend servers, and there are 5 Varnish
servers in front of them for caching.
Now, if I use the round-robin director on the Varnish servers, because Varnish
caches on "req.http.host + req.url", the same content fetched from different
backends would be cached twice! For example:
if Varnish sends the first request for the "test.ts" file to the "hls_backend_01"
backend server, it caches the response, and
for the next request from other clients, because it is using the round-robin
director,
it goes to "hls_backend_02" and caches the same file again due to the
different "req.http.host".
So now I have a solution: use the shard director with "key=req.url" instead
of round robin;
another way is to use round robin but adjust the hash VCL to something
like below:

```

sub vcl_hash {
    hash_data(req.url);
    return (lookup);
}

```

In this way Varnish hashes only "req.url", not "req.http.host".
So Varnish would cache the content based on the content's uniqueness, not based
on the difference between backends.
1. At first I asked how I can normalize it. Is that possible at all, given
what I described?
Would you please explain it more with an example?

2. You gave an example about other domains; in this case I do not understand
what it has to do with the domain?

3. Maybe I'm thinking about it the wrong way, because if Varnish hashes the data
based on req.url with 'hash_data(req.url)', it shouldn't cache the same content
from different backends again!
for example my request is:

http://varnish-01:/hls/test.ts
for the first request it goes to the "hls_backend_01" backend and caches it, and
for the next request it goes to the "hls_backend_02" backend,
so for each request it caches again because the backends are different?

Many Thanks, Hamidreza


From: varnish-misc 
 on behalf of 
Dridi Boukelmoune 
Sent: Sunday, August 15, 2021 10:30 PM
To: varnish-misc@varnish-cache.org 
Subject: Re: Best practice for caching scenario with different backend servers 
but same content

On Sat, Aug 14, 2021 at 10:54 AM Hamidreza Hosseini
 wrote:
>
> Hi,
> Thanks to you and all the Varnish team for the answers that have helped me a lot.
> I read the default Varnish cache configuration again:
> https://github.com/varnishcache/varnish-cache/blob/6.0/bin/varnishd/builtin.vcl
> and found vcl_hash as follows:
>
> ```
> sub vcl_hash {
>     hash_data(req.url);
>     if (req.http.host) {
>         hash_data(req.http.host);
>     } else {
>         hash_data(server.ip);
>     }
>     return (lookup);
> }
>
> ```
> So, if I change vcl_hash as follows, would it be enough for my
> purpose? (I mean caching the same object from different backends just once,
> with the round-robin director!)
>
> ```
>
> sub vcl_hash {
>     hash_data(req.url);
>     return (lookup);
> }
>
> ```
>
> With this config I tell Varnish to cache the content based only on 'req.url',
> not 'req.http.host'; therefore, for the same content from different backends,
> Varnish would cache it once (if I want to use the round-robin director instead
> of the shard director). Is this true? What bad consequences might this
> configuration cause in the future?

In this case req.http.host usually refers to the domain end users
resolve to find your Varnish server (or other hops in front of it). It
is usually the same for every client; let's take www.myapp.com as an
example. If your Varnish server is in front of multiple services, you
should be handling the different host headers explicitly. For example,
if you have exactly two domains, you should normalize them to some
canonical form. Using the same example domain, that could be
www.myapp.com and static.myapp.com for instance.

In that case hashing the URL only would prevent you from adding new
domains through your Varnish server. It won't hurt if you know you
will only ever have one domain to deal with, but hashing the host will
also not hurt as long as you normalize it to a unique value.

You are correct that by default hashing the request appropriately will
help the shard director do the right thing out of the box. I remember
however that you only wanted to hash a subset of the URL for video
segments, so hashing the URL as-is won't provide the behavior you are
looking for.

Dridi

Re: Best practice for caching scenario with different backend servers but same content

2021-08-15 Thread Dridi Boukelmoune
On Sat, Aug 14, 2021 at 10:54 AM Hamidreza Hosseini
 wrote:
>
> Hi,
> Thanks to you and all the Varnish team for the answers that have helped me a lot.
> I read the default Varnish cache configuration again:
> https://github.com/varnishcache/varnish-cache/blob/6.0/bin/varnishd/builtin.vcl
> and found vcl_hash as follows:
>
> ```
> sub vcl_hash {
>     hash_data(req.url);
>     if (req.http.host) {
>         hash_data(req.http.host);
>     } else {
>         hash_data(server.ip);
>     }
>     return (lookup);
> }
>
> ```
> So, if I change vcl_hash as follows, would it be enough for my
> purpose? (I mean caching the same object from different backends just once,
> with the round-robin director!)
>
> ```
>
> sub vcl_hash {
>     hash_data(req.url);
>     return (lookup);
> }
>
> ```
>
> With this config I tell Varnish to cache the content based only on 'req.url',
> not 'req.http.host'; therefore, for the same content from different backends,
> Varnish would cache it once (if I want to use the round-robin director instead
> of the shard director). Is this true? What bad consequences might this
> configuration cause in the future?

In this case req.http.host usually refers to the domain end users
resolve to find your Varnish server (or other hops in front of it). It
is usually the same for every client; let's take www.myapp.com as an
example. If your Varnish server is in front of multiple services, you
should be handling the different host headers explicitly. For example,
if you have exactly two domains, you should normalize them to some
canonical form. Using the same example domain, that could be
www.myapp.com and static.myapp.com for instance.

In that case hashing the URL only would prevent you from adding new
domains through your Varnish server. It won't hurt if you know you
will only ever have one domain to deal with, but hashing the host will
also not hurt as long as you normalize it to a unique value.
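For illustration, normalizing two known domains to unique values could be sketched like this (the domains are the example ones from above; the alias patterns are assumptions):

```
sub vcl_recv {
    # Strip an optional port, then collapse each service to its
    # canonical host name.
    set req.http.host = regsub(req.http.host, ":[0-9]+$", "");
    if (req.http.host ~ "(?i)^(www\.)?myapp\.com$") {
        set req.http.host = "www.myapp.com";
    } elsif (req.http.host ~ "(?i)^static\.myapp\.com$") {
        set req.http.host = "static.myapp.com";
    }
}
```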

You are correct that by default hashing the request appropriately will
help the shard director do the right thing out of the box. I remember
however that you only wanted to hash a subset of the URL for video
segments, so hashing the URL as-is won't provide the behavior you are
looking for.

Dridi


Re: Best practice for caching scenario with different backend servers but same content

2021-08-14 Thread Hamidreza Hosseini
Hi,
Thanks to you and all the Varnish team for the answers that have helped me a lot.
I read the default Varnish cache configuration again:
https://github.com/varnishcache/varnish-cache/blob/6.0/bin/varnishd/builtin.vcl
and found vcl_hash as follows:

```
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}

```
So, if I change vcl_hash as follows, would it be enough for my purpose? (I
mean caching the same object from different backends just once, with the
round-robin director!)

```

sub vcl_hash {
    hash_data(req.url);
    return (lookup);
}

```

With this config I tell Varnish to cache the content based only on 'req.url', not
'req.http.host'; therefore, for the same content from different backends, Varnish
would cache it once (if I want to use the round-robin director instead of the
shard director). Is this true? What bad consequences might this configuration
cause in the future?


From: varnish-misc 
 on behalf of 
Hamidreza Hosseini 
Sent: Sunday, August 1, 2021 4:17 AM
To: varnish-misc@varnish-cache.org 
Subject: Best practice for caching scenario with different backend servers but 
same content

Hi,
I want to use Varnish as a cache service in my scenario. I have about 10 HTTP
servers that serve HLS fragments as the backend servers and about 5 Varnish
servers for caching purposes. The problem comes in when I use the round-robin
director for the backend servers in Varnish:
if a Varnish requests a specific file from one backend server and then the same
file from another backend server, it caches that file again because of the
different Host headers! So my solution is using the fallback director instead of
round-robin, as follows:

```
In varnish-1:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b10());



In varnish-2:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b10());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());


In varnish-3:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b10());

```
But I think this is not the best solution, because there is no load balancing,
even though I used a different backend as the first argument of the fallback
director.
What is Varnish's recommendation for this scenario?





Re: Best practice for caching scenario with different backend servers but same content

2021-08-09 Thread Geoff Simmons
On 8/9/21 14:49, Geoff Simmons wrote:
> 
> A hash algorithm computes a number h(b) for a backend b, ...

Sorry, this should have been more like h(bereq), meaning that the number
is computed from features of the request. From that you get to the
choice of a backend.

The Varnish hash director uses the same hash value that was computed for
caching.

The shard director also does that by default, but other criteria can be
selected with the "by" parameter of shard.backend(): by=HASH (default),
by=URL, by=KEY, by=BLOB.
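For illustration, selecting one of these criteria could look like the sketch below (hls_cluster stands for a shard director created in vcl_init; it is not part of the message above):

```
sub vcl_backend_fetch {
    # by=URL shards on bereq.url alone; the default by=HASH would reuse
    # the cache hash computed in vcl_hash.
    set bereq.backend_hint = hls_cluster.backend(by=URL);
}
```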


Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de





Re: Best practice for caching scenario with different backend servers but same content

2021-08-09 Thread Geoff Simmons
Hello,

The best way to answer these questions is to start with the last one:

On 8/9/21 10:50, Hamidreza Hosseini wrote:
> 
> 6. conceptual question:
> 
> 1. What's the exact difference between the hash and shard directors, and when
> should we use which one?
> The doc says that when the backends change, shard is more consistent than hash,
> but how?

It's not "more consistent", it's "consistent hashing", which is the name
of a hashing algorithm intended for load balancing:

https://en.wikipedia.org/wiki/Consistent_hashing

The hash director uses a more traditional kind of hashing algorithm to
map requests to backends. Consistent hashing, implemented by the shard
director, is intended to mitigate problems that can arise when backends
become unhealthy and then healthy again. The focus is mainly on backends
that do something expensive for new requests, such as their own caching,
but can work faster for requests that they've seen before.

A hash algorithm computes a number h(b) for a backend b, whose value is
in a large range, say 32 bit. Then if you have N backends, the
traditional algorithm indexes them from 0 to N-1, and picks h(b) mod N.
Varnish's hash director is something like that.

Say you have N=10 backends, so the traditional algorithm picks h(b) mod
10. Then a backend goes unhealthy, so now N=9. Since x mod 10 differs from
x mod 9 for almost all x, the mapping of requests to backends shifts almost
completely. This can be painful for backends that benefit from getting
mostly the same requests, for example due to caching.

After a while, the unhealthy backend becomes healthy again, so we go
from N=9 back to N=10. If in the meantime the backends had "gotten used
to" the changed distribution of requests, say by filling their local
caches, then they get the pain all over again.

Consistent hashing attempts to lessen the pain. If a backend drops out,
then the mapping of requests to that backend must change, but the
mapping stays the same for all other backends. So the distribution to
backends changes only as much as it has to.

Disclaimer: the original version of the shard director was developed at
my company. Some lines of code that I contributed are still in there.

> 2. What will happen when I'm using the shard director based on
> "key=bereq.url" if I add/delete one backend from the backend list? Will it
> change the consistent hashing ring for requests?

That's what consistent hashing is about. It only changes for the
backends that were added or deleted, not for the rest.

> 1. First of all, I think the following section is not right. Do we need to
> define both the shard parameter (p) and replicas in the reconfigure call?
> "hls_cluster.reconfigure(p, replicas=25);" or just
> "hls_cluster.reconfigure(replicas=25);"

There is no 2-argument form of reconfigure(). It has one optional
argument (replicas) with a default value.

> 2. What does "replicas=25" mean in the sample configuration?
> 
> Why is this necessary?

The short answer is that if you have to ask, then use the default. Since
the argument is optional, and set to the default if you leave it out,
just leave it out: reconfigure()

You should NOT set it to 25.

replicas is an internal parameter of the algorithm, and there aren't
many guidelines as to how it should be set, so we wanted to be able to
experiment with it. It's really there so that developers of Varnish and
the director can test it. Most users of Varnish don't need to worry
about it.

It turns out that the value doesn't matter all that much, as long as it
isn't too low. 25 is too low. I advise against setting it lower than the
default (67).

What replicas does and why it's necessary gets into details of the
algorithm. We can say more about that if there's interest, but this
email is getting pretty long as it is. (The Wikipedia article gets into
this in the section "Practical Extensions".)

> 3. In the shard.backend(...) section, about "resolve=LAZY":
> I couldn't understand what LAZY resolve means.

When you set bereq.backend to a director, most of the Varnish directors
do not immediately execute the algorithm to choose a backend, not until
it's time to actually send the backend request. This has some
advantages, for example if the VCL logic after setting bereq.backend
results in not going to that backend after all. resolve=LAZY works this way.

The alternative resolve=NOW is for contexts where you return the backend
(read the return value of shard.backend()) and need to know which
backend it's going to be. Then the backend is chosen right away, and
stays that way when the backend request is sent.
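A small sketch of the difference (hls_cluster stands for a shard director created in vcl_init; it is an assumption, not from the message above):

```
sub vcl_backend_fetch {
    # resolve=LAZY: bereq.backend_hint holds the director; the concrete
    # backend is only chosen when the fetch is actually made.
    set bereq.backend_hint = hls_cluster.backend(resolve=LAZY);

    # resolve=NOW would pick the concrete backend immediately, which is
    # what you want when you need to inspect or reuse the return value.
}
```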

> 4. To return a healthy backend, besides defining probes as I did, should I
> configure healthy=ALL as follows?

The parameters alt, healthy, rampup and warmup give you some control
over what happens when one or more backends drop out.

Say your request ordinarily maps to b1, but b1 is unhealthy; then there
is a specific backend b2 that is chosen next. If b2 is also unhealthy,
then there is a specific alternative 

Re: Best practice for caching scenario with different backend servers but same content

2021-08-09 Thread Hamidreza Hosseini
Hi,
This is my configuration based on the docs and the sample configuration. I would
appreciate it if you could answer my questions:

```
probe myprobe {
.request =
  "HEAD / HTTP/1.1"
  "Connection: close"
  "User-Agent: Varnish Health Probe";
.timeout = 1s;
.interval = 5s;
.window = 5;
.threshold = 3;
}


backend B1 { .host = "B1.mydomain.local"; .port = "80"; .probe = myprobe; }
backend B2 { .host = "B2.mydomain.local"; .port = "80"; .probe = myprobe; }
backend B3 { .host = "B3.mydomain.local"; .port = "80"; .probe = myprobe; }
backend B4 { .host = "B4.mydomain.local"; .port = "80"; .probe = myprobe; }
backend B5 { .host = "B5.mydomain.local"; .port = "80"; .probe = myprobe; }



sub vcl_init {

  new hls_cluster = directors.shard();
  hls_cluster.add_backend(B1);
  hls_cluster.add_backend(B2);
  hls_cluster.add_backend(B3);
  hls_cluster.add_backend(B4);
  hls_cluster.add_backend(B5);


  new p = directors.shard_param();
  hls_cluster.reconfigure(p, replicas=25);
  hls_cluster.associate(p.use());


}


sub vcl_backend_fetch {
  p.set(by=KEY, key=bereq.url);
  set bereq.backend_hint = hls_cluster.backend(resolve=LAZY, healthy=ALL);
}


```


1. First of all, I think the following section is not right. Do we need to define
both the shard parameter (p) and replicas in the reconfigure call?
"hls_cluster.reconfigure(p, replicas=25);" or just
"hls_cluster.reconfigure(replicas=25);"



2. What does "replicas=25" mean in the sample configuration?
The doc says:
"(default ident being the backend name) for each backend and for a
running number n from 1 to replicas"

This is what I found out:
Varnish will choose a number for a backend randomly from 1 to the "replicas"
number, combine it with the name of the backend, and then hash all of that for
the circular ring.
But when we have backends with different names, this hash would be different for
each backend, because they have different names!
Why is this necessary?

`https://varnish-cache.org/docs/6.2/reference/vmod_directors.html#bool-xshard-reconfigure-int-replicas-67`


3. In the shard.backend(...) section, about "resolve=LAZY":
I couldn't understand what LAZY resolve means.
DOC:
```
LAZY: return an instance of this director for later backend resolution.
LAZY mode is required for referencing shard director instances, for example as 
backends for other directors (director layering).
```
https://varnish-cache.org/docs/6.1/reference/vmod_directors.generated.html#shard-backend


4. To return a healthy backend, besides defining probes as I did, should I
configure healthy=ALL as follows?

```
  set bereq.backend_hint = hls_cluster.backend(resolve=LAZY, healthy=ALL);
```
DOC:

```
ALL: Check health state also for alternative backend selection
```

5. About rampup and warmup:
rampup: I understand that if a backend goes down and becomes healthy again, and
we have defined a rampup period for it, Varnish will wait until this period has
passed before sending requests to that backend again; for this fraction of time
it will return an alternative backend.
warmup: for a chosen backend for a specific key, it will spread requests between
two backends (the original backend and its alternative, if we define 0.5 for
warmup).

Please correct me if I said anything wrong. I would appreciate it if you could
explain the functionality of these two parameters.


6. conceptual question:

1. What's the exact difference between the hash and shard directors, and when
should we use which one?
The doc says that when the backends change, shard is more consistent than hash,
but how?

2. What will happen when I'm using the shard director based on "key=bereq.url" if
I add/delete one backend from the backend list? Will it change the consistent
hashing ring for requests?

Thanks for your answers in advance
Best regards, Hamidreza






From: Hamidreza Hosseini
Sent: Sunday, August 1, 2021 4:17 AM
To: varnish-misc@varnish-cache.org 
Subject: Best practice for caching scenario with different backend servers but 
same content

Hi,
I want to use Varnish as a cache service in my scenario. I have about 10 HTTP
servers that serve HLS fragments as the backend servers and about 5 Varnish
servers for caching purposes. The problem comes in when I use the round-robin
director for the backend servers in Varnish:
if a Varnish requests a specific file from one backend server and then the same
file from another backend server, it caches that file again because of the
different Host headers! So my solution is using the fallback director instead of
round-robin, as follows:

```
In varnish-1:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b10());



In varnish-2:
new 

Re: Best practice for caching scenario with different backend servers but same content

2021-08-07 Thread Hamidreza Hosseini
Hi,
I read the sample config that you sent, but can I use "bereq.url" in this way?
For example, I want to shard my requests for live streams based on the URLs
that clients enter; for example, the following URLs are different live
streams (stream1 and stream2 are the names of different streams):
```
mydomain.com/live/australia/stream1/chunk_43212123.ts
mydomain.com/live/australia/stream2/chunk_43212123.ts
mydomain.com/live/australia/stream3/chunk_43212123.ts
```
Now suppose I want only the URL excluding the chunk file to be hashed and
sharded:
just this part: "/live/australia/stream{1,2,3}/",
not: "/live/australia/stream{1,2,3}/chunk_43212123.ts".

But by adjusting "p.set(by=KEY, key=bereq.url)" it would shard on "bereq.url",
meaning "/live/australia/stream{1,2,3}/chunk_43212123.ts".
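One way to shard on the stream directory only could be sketched as follows (p and hls_cluster refer to the shard_param and shard director from the earlier configuration; the regsub pattern is an assumption about the path layout):

```
sub vcl_backend_fetch {
    # Drop the last path component, so ".../stream1/chunk_1.ts" and
    # ".../stream1/chunk_2.ts" both shard on ".../stream1/".
    p.set(by=KEY, key=hls_cluster.key(regsub(bereq.url, "/[^/]+$", "/")));
    set bereq.backend_hint = hls_cluster.backend(resolve=LAZY);
}
```

The cache hash in vcl_hash stays untouched, so each chunk is still cached under its full URL.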



From: varnish-misc 
 on behalf of 
Hamidreza Hosseini 
Sent: Sunday, August 1, 2021 4:17 AM
To: varnish-misc@varnish-cache.org 
Subject: Best practice for caching scenario with different backend servers but 
same content

Hi,
I want to use Varnish as a cache service in my scenario. I have about 10 HTTP
servers that serve HLS fragments as the backend servers and about 5 Varnish
servers for caching purposes. The problem comes in when I use the round-robin
director for the backend servers in Varnish:
if a Varnish requests a specific file from one backend server and then the same
file from another backend server, it caches that file again because of the
different Host headers! So my solution is using the fallback director instead of
round-robin, as follows:

```
In varnish-1:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b10());



In varnish-2:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b10());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());


In varnish-3:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b10());

```
But I think this is not the best solution, because there is no load balancing,
even though I used a different backend as the first argument of the fallback
director.
What is Varnish's recommendation for this scenario?





Re: Best practice for caching scenario with different backend servers but same content

2021-08-05 Thread Guillaume Quintard
Hi,

I'm pretty sure there's a confusion about the sequence of actions here.
Normalization happens *before* you look into the cache, so well before
you fetch anything from the backend. By the time you cache the data
(vcl_backend_response), the hash key has already been set (vcl_hash); it's
way too late to normalize the request.

As to normalization, it's usually done in vcl_recv, and it can range from
just setting the host header to a static string, to using std.tolower() and
removing the host port.

for the sake of the example:

sub vcl_recv {
    set req.http.host = "myvideoservice.com";
}

For a shard example, look at the VTCs, for example:
https://github.com/varnishcache/varnish-cache/blob/6.6/bin/varnishtest/tests/d00029.vtc#L66

import directors;

sub vcl_init {
    new shard_dir = directors.shard();
    shard_dir.add_backend(be1);
    shard_dir.add_backend(be2);
    shard_dir.add_backend(be3);

    new p = directors.shard_param();
    shard_dir.associate(p.use());

    shard_dir.reconfigure(replicas=25);
}

sub vcl_backend_fetch {
    p.set(by=KEY, key=bereq.url);
    set bereq.backend_hint = shard_dir.backend(resolve=LAZY);
}

For udo:

import crypto;
import udo;

sub vcl_init {
    new udo_dir = udo.director();
    udo_dir.set_type(random);
    udo_dir.add_backend(be1);
    udo_dir.add_backend(be2);
    udo_dir.add_backend(be3);
    udo_dir.set_type(hash);
}

sub vcl_backend_fetch {
    set bereq.backend_hint = udo_dir.backend();
    udo_dir.set_hash(crypto.hash(sha256, bereq.url));
}


These have been written without testing, so don't put them straight into
production.

-- 
Guillaume Quintard


On Thu, Aug 5, 2021 at 3:33 AM Hamidreza Hosseini 
wrote:

> Hi,
> 1.
>
> Is there any way to normalize host headers and other things, to tell
> Varnish not to cache the same content again for a different backend?
> I want to use the round-robin director, but after fetching the content I want
> to normalize the header and cache the content.
> I would appreciate it if you could give me an example of this and how I can do
> it.
>
> 2.
> I couldn't find any good example of the shard director and
> xshard-key-string; I would appreciate it if you could give an example of this
> too.
>
> Many Thanks
> --
> *From:* varnish-misc  hotmail@varnish-cache.org> on behalf of Hamidreza Hosseini <
> hrhosse...@hotmail.com>
> *Sent:* Sunday, August 1, 2021 4:17 AM
> *To:* varnish-misc@varnish-cache.org 
> *Subject:* Best practice for caching scenario with different backend
> servers but same content
>
> Hi,
> I want to use Varnish as a cache service in my scenario. I have about 10
> HTTP servers that serve HLS fragments as the backend servers and about 5
> Varnish servers for caching purposes. The problem comes in when I use the
> round-robin director for the backend servers in Varnish:
> if a Varnish requests a specific file from one backend server and then the
> same file from another backend server, it caches that file again
> because of the different Host headers! So my solution is using the fallback
> director instead of round-robin, as follows:
>
> ```
> In varnish-1:
> new hls_cluster = directors.fallback();
> hls_cluster.add_backend(b1());
> hls_cluster.add_backend(b2());
> hls_cluster.add_backend(b3());
> hls_cluster.add_backend(b4());
> hls_cluster.add_backend(b5());
> hls_cluster.add_backend(b6());
> hls_cluster.add_backend(b7());
> hls_cluster.add_backend(b8());
> hls_cluster.add_backend(b9());
> hls_cluster.add_backend(b10());
>
>
>
> In varnish-2:
> new hls_cluster = directors.fallback();
> hls_cluster.add_backend(b10());
> hls_cluster.add_backend(b1());
> hls_cluster.add_backend(b2());
> hls_cluster.add_backend(b3());
> hls_cluster.add_backend(b4());
> hls_cluster.add_backend(b5());
> hls_cluster.add_backend(b6());
> hls_cluster.add_backend(b7());
> hls_cluster.add_backend(b8());
> hls_cluster.add_backend(b9());
>
>
> In varnish-3:
> new hls_cluster = directors.fallback();
> hls_cluster.add_backend(b9());
> hls_cluster.add_backend(b1());
> hls_cluster.add_backend(b2());
> hls_cluster.add_backend(b3());
> hls_cluster.add_backend(b4());
> hls_cluster.add_backend(b5());
> hls_cluster.add_backend(b6());
> hls_cluster.add_backend(b7());
> hls_cluster.add_backend(b8());
> hls_cluster.add_backend(b10());
>
> ```
> But I think this is not the best solution, because there is no load
> balancing despite, I used different backend for the first argument of
> fallback directive,
> What is varnish recommendation for this scenario?
>
>
>
> ___
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
___
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Best practice for caching scenario with different backend servers but same content

2021-08-05 Thread Hamidreza Hosseini
Hi,
1.

Is there any way to normalize the Host header (and anything else needed) so that
Varnish does not cache the same content separately for different backends?
I want to use the round-robin director, but after fetching the content I want to
normalize the header and cache the content.
I would appreciate an example of this and how I can do it.

2.
I couldn't find any good examples for directors-shard and xshard-key-string;
I would appreciate an example of this too.

Many Thanks

From: varnish-misc 
 on behalf of 
Hamidreza Hosseini 
Sent: Sunday, August 1, 2021 4:17 AM
To: varnish-misc@varnish-cache.org 
Subject: Best practice for caching scenario with different backend servers but 
same content

Hi,
I want to use varnish in my scenario as cache service, I have about 10 http 
servers that serve Hls fragments as the backend servers and about 5 varnish 
servers for caching purpose, the problem comes in when I use round-robin 
director for backend servers in varnish,
if a varnish for specific file requests to one backend server and for the same 
file but to another backend server it would cache that file again because of 
different Host headers ! so my solution is using fallback director instead of 
round-robin as follow:

```
In varnish-1:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b10());



In varnish-2:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b10());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());


In varnish-3:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b10());

```
But I think this is not the best solution, because there is no load balancing 
despite, I used different backend for the first argument of fallback directive,
What is varnish recommendation for this scenario?



___
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


Re: Best practice for caching scenario with different backend servers but same content

2021-08-01 Thread Guillaume Quintard
Hi,

There are a lot of things to unpack here.

> if a varnish for specific file requests to one backend server and for the
same file but to another backend server it would cache that file again
because of different Host headers ! so my solution is using fallback
director instead of round-robin

The two aren't related: if you have a hashing problem causing you to cache
the same object twice, changing the directors isn't going to save you.
Ideally, the requests will get normalized (host header and path) in
vcl_recv{} so that they will be properly hashed in vcl_hash{}.
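
As a hedged, untested sketch of that normalization (the host name below is a
placeholder, not something from this thread):

```
sub vcl_recv {
    # Collapse every incoming host variant to one canonical value so
    # that the built-in vcl_hash produces a single cache key per URL.
    # "hls.example.com" is a placeholder for your real service name.
    set req.http.Host = "hls.example.com";
}
```

With the Host header pinned like this, requests for the same path always hash
to the same cached object, regardless of which host name the client used.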

The backend resolution only happens after you have exited
vcl_backend_fetch{}, long after you have (not) found the object in the
cache, and the best solution for video is usually to use
consistent-hashing. In open-source this means vmod_shard (
https://varnish-cache.org/docs/trunk/reference/vmod_directors.html#directors-shard),
in Enterprise, it'll be udo (
https://docs.varnish-software.com/varnish-cache-plus/vmods/udo/#set-hash),
they behave about the same, except that udo makes it easier to set
the hash, which may be important for live streaming (more info below).

With consistent hashing, you can configure all Varnish servers the same,
and they will determine which backend to use based on the request.
Typically, the same request will always go to the same backend. This
provides pretty good load-balancing over time, and it additionally
leverages the internal caching that most video origins have.

If you are serving VOD, that is all you need, but if you are serving Live,
you need to care about one other thing: you want consistent hashing not per
request, but per stream. Because the origins may be slightly out of sync,
you may get a manifest on origin A which will advertise a chunk that isn't
available anywhere yet, and if you don't fetch the new chunk from origin A,
you'll get a 404 or a 412.
So, for live, you will need to use shard's key() (
https://varnish-cache.org/docs/trunk/reference/vmod_directors.html#int-xshard-key-string)
or udo's set_hash() (
https://docs.varnish-software.com/varnish-cache-plus/vmods/udo/#set-hash)
to create a hash based on the stream path.

For example, consider these paths:
- /live/australia/Channel5/480p/manifest.m3u8 and
/live/australia/Channel5/480p/chunk_43212123.ts: the stream path is
/live/australia/Channel5/480p/
- /video/live/52342645323/manifest.dash and
/video/live/52342645323/manifest.dash?time=4216432432=8000=523453:
the stream path is /video/live/52342645323/manifest.dash
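
As a hedged, untested sketch (reusing the `shard_dir`/`p` names from the shard
example earlier in this thread, and assuming the first path layout above, where
the last segment is the manifest or chunk file name):

```
sub vcl_backend_fetch {
    # Strip the query string, then the final path segment, so that
    # manifests and chunks of one rendition share a single shard key.
    # e.g. /live/australia/Channel5/480p/chunk_43212123.ts
    #   -> /live/australia/Channel5/480p/
    set bereq.http.Stream-Key =
        regsub(regsub(bereq.url, "\?.*$", ""), "/[^/]+$", "/");
    p.set(by=KEY, key=shard_dir.key(bereq.http.Stream-Key));
    set bereq.backend_hint = shard_dir.backend(resolve=LAZY);
}
```

The second (DASH) layout would need a different expression, since there the
stream path keeps the manifest file name and only the query string is dropped.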

On top of all this, if you start having more than 5 Varnish servers, you
might want to consider adding an extra layer of caching between the
client-facing Varnish nodes and the origins (origin shields) to reduce
the load on the origins. In that case, the shields would be the ones
handling the consistent hashing.

Hope this helps

-- 
Guillaume Quintard


On Sun, Aug 1, 2021 at 4:18 AM Hamidreza Hosseini 
wrote:

> Hi,
> I want to use varnish in my scenario as cache service, I have about 10
> http servers that serve Hls fragments as the backend servers and about 5
> varnish servers for caching purpose, the problem comes in when I use
> round-robin director for backend servers in varnish,
> if a varnish for specific file requests to one backend server and for the
> same file but to another backend server it would cache that file again
> because of different Host headers ! so my solution is using fallback
> director instead of round-robin as follow:
>
> ```
> In varnish-1:
> new hls_cluster = directors.fallback();
> hls_cluster.add_backend(b1());
> hls_cluster.add_backend(b2());
> hls_cluster.add_backend(b3());
> hls_cluster.add_backend(b4());
> hls_cluster.add_backend(b5());
> hls_cluster.add_backend(b6());
> hls_cluster.add_backend(b7());
> hls_cluster.add_backend(b8());
> hls_cluster.add_backend(b9());
> hls_cluster.add_backend(b10());
>
>
>
> In varnish-2:
> new hls_cluster = directors.fallback();
> hls_cluster.add_backend(b10());
> hls_cluster.add_backend(b1());
> hls_cluster.add_backend(b2());
> hls_cluster.add_backend(b3());
> hls_cluster.add_backend(b4());
> hls_cluster.add_backend(b5());
> hls_cluster.add_backend(b6());
> hls_cluster.add_backend(b7());
> hls_cluster.add_backend(b8());
> hls_cluster.add_backend(b9());
>
>
> In varnish-3:
> new hls_cluster = directors.fallback();
> hls_cluster.add_backend(b9());
> hls_cluster.add_backend(b1());
> hls_cluster.add_backend(b2());
> hls_cluster.add_backend(b3());
> hls_cluster.add_backend(b4());
> hls_cluster.add_backend(b5());
> hls_cluster.add_backend(b6());
> hls_cluster.add_backend(b7());
> hls_cluster.add_backend(b8());
> hls_cluster.add_backend(b10());
>
> ```
> But I think this is not the best solution, because there is no load
> balancing despite, I used different backend for the first argument of
> fallback directive,
> What is varnish 

Best practice for caching scenario with different backend servers but same content

2021-08-01 Thread Hamidreza Hosseini
Hi,
I want to use Varnish as a cache service in my scenario. I have about 10 HTTP
servers that serve HLS fragments as the backend servers, and about 5 Varnish
servers for caching purposes. The problem comes in when I use the round-robin
director for the backend servers in Varnish:
if a Varnish server requests a specific file from one backend and later the same
file from another backend, it caches that file again because of the different
Host headers! So my solution is to use the fallback director instead of
round-robin, as follows:

```
In varnish-1:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b10());



In varnish-2:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b10());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b9());


In varnish-3:
new hls_cluster = directors.fallback();
hls_cluster.add_backend(b9());
hls_cluster.add_backend(b1());
hls_cluster.add_backend(b2());
hls_cluster.add_backend(b3());
hls_cluster.add_backend(b4());
hls_cluster.add_backend(b5());
hls_cluster.add_backend(b6());
hls_cluster.add_backend(b7());
hls_cluster.add_backend(b8());
hls_cluster.add_backend(b10());

```
But I think this is not the best solution, because there is no real load
balancing, even though I used a different backend as the first argument of the
fallback director.
What is Varnish's recommendation for this scenario?



___
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc