Re: Varnish suddenly started using much more memory

2024-06-13 Thread Guillaume Quintard
Sorry Batanun, this thread got lost in my inbox. Would you be able to
upgrade to 7.5 and see if you get the same results? I'm pretty sure it's a
jemalloc issue, but upgrading should make it clear.
You are on Ubuntu, right? Which version?
-- 
Guillaume Quintard


On Mon, May 20, 2024 at 1:50 AM Batanun B  wrote:

> > Sorry, I should have been clearer, I meant: where are the varnish
> packages coming from? Are they from the official repositories, from
> https://packagecloud.io/varnishcache/ or built from source maybe?
>
> Ah, I see. They come from varnishcache packagecloud. More specifically, we
> use:
>
>
> https://packagecloud.io/install/repositories/varnishcache/varnish60lts/script.deb.sh
>
>
> > you should really invest some time in something like prometheus, it
> would probably have made the issue obvious
>
> Yes, in hindsight we definitely should have done that. I will discuss this
> with my coworkers going forward.
>
>
> > Is there any chance you can run the old version on the server to explore
> the differences?
>
> Possibly, for a limited time. If so, what kinds of tests should I run? And
> for how long would I need to run the old version?
>
> Note that with our setup, we wouldn't be able to run two different images
> at the same time, in the same environment, with both receiving traffic. So
> all traffic would be routed to the old version (multiple servers, but all
> running the same image).
>
> An alternative approach that I'm considering is to switch to the old
> image, but manually update the VCL to the new version. If the problem
> remains, then the issue is almost certainly with the VCL. But if the
> problem disappears, then it's more likely something else.
>
>
> > what's the output of: varnishstat -1 -f '*g_bytes'
>
> SMA.default.g_bytes  10951750929  .   Bytes outstanding
> SMA.large.g_bytes 8587329728  .   Bytes outstanding
> SMA.Transient.g_bytes  3177920  .   Bytes outstanding
>
> So, the default storage usage has gone up by 2 GB since my first message
> here, while the others have remained the same. Meanwhile, the total memory
> usage of Varnish has gone up to 26 GB, an increase of 3 GB. So now the
> overhead has gone up by 1 GB to a total of 6 GB.
>
> Going forward, it will be interesting to see how the memory consumption
> changes after the default storage has reached its max (2 GB from where it
> is now). If we're lucky, it will stabilize, and then I'm not sure if it's
> worth it to troubleshoot any further. Otherwise, the free memory would get
> a bit too close to zero for our comfort, with no indication of stopping.
>
> Does Varnish keep track of total available OS memory, and start releasing
> memory by throwing out objects from the cache? Or will it continue to eat
> memory until something fails?
>
>
> > have you tweaked any workspaces/thread parameters?
>
> Nope. As I said, we haven't changed any OS or Varnish configuration.
> ___
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
___
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


Re: Varnish suddenly started using much more memory

2024-05-20 Thread Batanun B
> Sorry, I should have been clearer, I meant: where are the varnish packages 
> coming from? Are they from the official repositories, from 
> https://packagecloud.io/varnishcache/ or built from source maybe?

Ah, I see. They come from varnishcache packagecloud. More specifically, we use:

https://packagecloud.io/install/repositories/varnishcache/varnish60lts/script.deb.sh


> you should really invest some time in something like prometheus, it would 
> probably have made the issue obvious

Yes, in hindsight we definitely should have done that. I will discuss this with 
my coworkers going forward.


> Is there any chance you can run the old version on the server to explore the 
> differences?

Possibly, for a limited time. If so, what kinds of tests should I run? And for 
how long would I need to run the old version?

Note that with our setup, we wouldn't be able to run two different images at 
the same time, in the same environment, with both receiving traffic. So all 
traffic would be routed to the old version (multiple servers, but all running 
the same image).

An alternative approach that I'm considering is to switch to the old image, 
but manually update the VCL to the new version. If the problem remains, then 
the issue is almost certainly with the VCL. But if the problem disappears, then 
it's more likely something else.


> what's the output of: varnishstat -1 -f '*g_bytes'

SMA.default.g_bytes  10951750929          .   Bytes outstanding
SMA.large.g_bytes     8587329728          .   Bytes outstanding
SMA.Transient.g_bytes      3177920          .   Bytes outstanding

So, the default storage usage has gone up by 2 GB since my first message here, 
while the others have remained the same. Meanwhile, the total memory usage of 
Varnish has gone up to 26 GB, an increase of 3 GB. So now the overhead has gone 
up by 1 GB to a total of 6 GB.
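
As an aside, the overhead figure can be reproduced from the counters. This is a
minimal Python sketch, assuming the three g_bytes values pasted above and the
reported 26 GB total; the helper names and the choice of decimal gigabytes
(10^9 bytes) are assumptions made for illustration, not something varnishstat
provides.

```python
# Counter lines copied from the varnishstat output quoted above.
VARNISHSTAT_OUTPUT = """\
SMA.default.g_bytes  10951750929          .   Bytes outstanding
SMA.large.g_bytes     8587329728          .   Bytes outstanding
SMA.Transient.g_bytes      3177920          .   Bytes outstanding
"""

GB = 10**9  # assumption: decimal gigabytes


def parse_g_bytes(text):
    """Return {counter_name: value} for every *.g_bytes line (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].endswith(".g_bytes"):
            counters[fields[0]] = int(fields[1])
    return counters


def overhead_bytes(resident_bytes, counters):
    """Memory used by the process that is not accounted for by storage."""
    return resident_bytes - sum(counters.values())


counters = parse_g_bytes(VARNISHSTAT_OUTPUT)
storage_total = sum(counters.values())
overhead = overhead_bytes(26 * GB, counters)  # 26 GB as reported above

print(f"storage:  {storage_total / GB:.2f} GB")
print(f"overhead: {overhead / GB:.2f} GB")
```

With binary units (GiB) the split would come out somewhat differently, so the
result is only as precise as the rounded totals it starts from.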

Going forward, it will be interesting to see how the memory consumption changes 
after the default storage has reached its max (2 GB from where it is now). If 
we're lucky, it will stabilize, and then I'm not sure if it's worth it to 
troubleshoot any further. Otherwise, the free memory would get a bit too close 
to zero for our comfort, with no indication of stopping. 

Does Varnish keep track of total available OS memory, and start releasing 
memory by throwing out objects from the cache? Or will it continue to eat 
memory until something fails?


> have you tweaked any workspaces/thread parameters?

Nope. As I said, we haven't changed any OS or Varnish configuration.


Re: Varnish suddenly started using much more memory

2024-05-18 Thread Guillaume Quintard
Sorry, I should have been clearer, I meant: where are the varnish packages
coming from? Are they from the official repositories, from
https://packagecloud.io/varnishcache/ or built from source maybe?

If you don't have old metrics (you should really invest some time in
something like prometheus, it would probably have made the issue obvious),
then we can't really compare anything. Is there any chance you can run the
old version on the server to explore the differences?

Two extra questions:
- what's the output of: varnishstat -1 -f '*g_bytes'
- have you tweaked any workspaces/thread parameters?

Cheers,

On Fri, May 17, 2024, 06:17 Batanun B  wrote:

> Hi,
>
> Naturally, I can't be certain that the "in my mind" trivial VCL changes
> can't be the culprit. But I just can't see the logic in those changes
> causing this massive change in memory usage. But I'll summarize the changes
> here, and maybe you can identify a suspect:
>
> * Modified the xkey header used by the xkey vmod, adding the id of the
> current website
> * Modified the TTL, from 1w to 1h, for a specific type of resource
> existing in maybe 20 versions (ie different urls), each being about 5 kB in
> size
> * Modified the backend probe url, from the startpage (ie full html) to a
> dedicated healthcheck endpoint (much smaller footprint, and much quicker)
>
> That's it. Those are all the VCL changes we made in that deployment. And,
> like I said, we made no changes to the OS or Varnish config.
>
>
> > check the difference in passes, if they are about the same, look for
> hit-for-misses,
>
> We don't have those statistics from the old server, so I can't do a
> comparison. But here are the current statistics:
>
> MAIN.s_pass   180721 0.14 Total pass-ed requests seen
> MAIN.cache_hitpass 0 0.00 Cache hits for pass.
> MAIN.cache_hit   3718468 2.86 Cache hits
> MAIN.cache_hit_grace   53903 0.04 Cache grace hits
> MAIN.cache_hitpass 0 0.00 Cache hits for pass.
> MAIN.cache_hitmiss  1129 0.00 Cache hits for miss.
>
>
> > and lastly, look at how long Varnish is trying to cache the average
> object.
>
> I'm not sure how I do that. Is there a varnishstat counter I can look at?
>
> > which packages are you using?
>
> Instead of giving you the full list, I guess it makes more sense to just
> list the ones that differ. Below is a diff of the "apt list --installed"
> output from before and after the deploy.
>
> Regards
>
>
> 2c2
> < accountsservice/now 0.6.55-0ubuntu12~20.04.5 amd64 [installed,upgradable
> to: 0.6.55-0ubuntu12~20.04.7]
> ---
> > accountsservice/focal-updates,focal-security,now
> 0.6.55-0ubuntu12~20.04.7 amd64 [installed,automatic]
> 23,25c23,25
> < bind9-dnsutils/now 1:9.16.1-0ubuntu2.12 amd64 [installed,upgradable to:
> 1:9.16.48-0ubuntu0.20.04.1]
> < bind9-host/now 1:9.16.1-0ubuntu2.12 amd64 [installed,upgradable to:
> 1:9.16.48-0ubuntu0.20.04.1]
> < bind9-libs/now 1:9.16.1-0ubuntu2.12 amd64 [installed,upgradable to:
> 1:9.16.48-0ubuntu0.20.04.1]
> ---
> > bind9-dnsutils/focal-updates,focal-security,now
> 1:9.16.48-0ubuntu0.20.04.1 amd64 [installed,automatic]
> > bind9-host/focal-updates,focal-security,now 1:9.16.48-0ubuntu0.20.04.1
> amd64 [installed,automatic]
> > bind9-libs/focal-updates,focal-security,now 1:9.16.48-0ubuntu0.20.04.1
> amd64 [installed,automatic]
> 31c31
> < bsdutils/focal-security,now 1:2.34-0.1ubuntu9.3 amd64
> [installed,upgradable to: 1:2.34-0.1ubuntu9.4]
> ---
> > bsdutils/now 1:2.34-0.1ubuntu9.3 amd64 [installed,upgradable to:
> 1:2.34-0.1ubuntu9.6]
> 41c41
> < cloud-init/now 22.4.2-0ubuntu0~20.04.2 all [installed,upgradable to:
> 23.4.4-0ubuntu0~20.04.1]
> ---
> > cloud-init/now 22.4.2-0ubuntu0~20.04.2 all [installed,upgradable to:
> 24.1.3-0ubuntu1~20.04.1]
> 48c48
> < cpio/focal-updates,focal-security,now 2.13+dfsg-2ubuntu0.3 amd64
> [installed,automatic]
> ---
> > cpio/now 2.13+dfsg-2ubuntu0.3 amd64 [installed,upgradable to:
> 2.13+dfsg-2ubuntu0.4]
> 56c56
> < curl/focal-updates,focal-security,now 7.68.0-1ubuntu2.21 amd64
> [installed]
> ---
> > curl/focal-updates,focal-security,now 7.68.0-1ubuntu2.22 amd64
> [installed]
> 68c68
> < distro-info-data/now 0.43ubuntu1.11 all [installed,upgradable to:
> 0.43ubuntu1.15]
> ---
> > distro-info-data/now 0.43ubuntu1.11 all [installed,upgradable to:
> 0.43ubuntu1.16]
> 82c82
> < fdisk/focal-security,now 2.34-0.1ubuntu9.3 amd64 [installed,upgradable
> to: 2.34-0.1ubuntu9.4]
> ---
> > fdisk/now 2.34-0.1ubuntu9.3 amd64 [installed,upgradable to:
> 2.34-0.1ubuntu9.6]
> 119,120c119,120
> < grub-efi-amd64-bin/now 2.06-2ubuntu14.1 amd64 [installed,upgradable to:
> 2.06-2ubuntu14.4]
> < grub-efi-amd64-signed/now 1.187.3~20.04.1+2.06-2ubuntu14.1 amd64
> [installed,upgradable to: 1.187.6~20.04.1+2.06-2ubuntu14.4]
> ---
> > grub-efi-amd64-bin/focal-updates,focal-security,now 2.06-2ubuntu14.4
> amd64 [installed]
> > 

Re: Varnish suddenly started using much more memory

2024-05-17 Thread Batanun B
Hi,

Naturally, I can't be certain that the "in my mind" trivial VCL changes can't 
be the culprit. But I just can't see the logic in those changes causing this 
massive change in memory usage. But I'll summarize the changes here, and maybe 
you can identify a suspect:

* Modified the xkey header used by the xkey vmod, adding the id of the current 
website
* Modified the TTL, from 1w to 1h, for a specific type of resource existing in 
maybe 20 versions (ie different urls), each being about 5 kB in size
* Modified the backend probe url, from the startpage (ie full html) to a 
dedicated healthcheck endpoint (much smaller footprint, and much quicker)

That's it. Those are all the VCL changes we made in that deployment. And, like I 
said, we made no changes to the OS or Varnish config.


> check the difference in passes, if they are about the same, look for 
> hit-for-misses,

We don't have those statistics from the old server, so I can't do a comparison. 
But here are the current statistics:

MAIN.s_pass                       180721         0.14 Total pass-ed requests seen
MAIN.cache_hitpass                     0         0.00 Cache hits for pass.
MAIN.cache_hit                   3718468         2.86 Cache hits
MAIN.cache_hit_grace               53903         0.04 Cache grace hits
MAIN.cache_hitpass                     0         0.00 Cache hits for pass.
MAIN.cache_hitmiss                  1129         0.00 Cache hits for miss.
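
Without the old figures there is nothing to diff against, but the snapshot can
at least be normalised so that a later snapshot is comparable. This is a
minimal Python sketch, assuming the counter values pasted above; expressing
each counter per 1000 cache hits is an arbitrary choice made for illustration,
and the helper name is hypothetical.

```python
# Values copied from the varnishstat output above.
counters = {
    "MAIN.s_pass": 180721,
    "MAIN.cache_hit": 3718468,
    "MAIN.cache_hit_grace": 53903,
    "MAIN.cache_hitpass": 0,
    "MAIN.cache_hitmiss": 1129,
}


def per_thousand_hits(counters, name):
    """Scale a counter relative to MAIN.cache_hit (hypothetical helper)."""
    return 1000 * counters[name] / counters["MAIN.cache_hit"]


for name in ("MAIN.s_pass", "MAIN.cache_hit_grace",
             "MAIN.cache_hitpass", "MAIN.cache_hitmiss"):
    print(f"{name:22s} {per_thousand_hits(counters, name):8.2f} per 1000 cache hits")
```

Running the same script against a snapshot taken from the old image would show
whether passes or hit-for-miss objects became relatively more common.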


> and lastly, look at how long Varnish is trying to cache the average object.

I'm not sure how I do that. Is there a varnishstat counter I can look at?

> which packages are you using?

Instead of giving you the full list, I guess it makes more sense to just list 
the ones that differ. Below is a diff of the "apt list --installed" output from 
before and after the deploy.

Regards


2c2
< accountsservice/now 0.6.55-0ubuntu12~20.04.5 amd64 [installed,upgradable to: 
0.6.55-0ubuntu12~20.04.7]
---
> accountsservice/focal-updates,focal-security,now 0.6.55-0ubuntu12~20.04.7 
> amd64 [installed,automatic]
23,25c23,25
< bind9-dnsutils/now 1:9.16.1-0ubuntu2.12 amd64 [installed,upgradable to: 
1:9.16.48-0ubuntu0.20.04.1]
< bind9-host/now 1:9.16.1-0ubuntu2.12 amd64 [installed,upgradable to: 
1:9.16.48-0ubuntu0.20.04.1]
< bind9-libs/now 1:9.16.1-0ubuntu2.12 amd64 [installed,upgradable to: 
1:9.16.48-0ubuntu0.20.04.1]
---
> bind9-dnsutils/focal-updates,focal-security,now 1:9.16.48-0ubuntu0.20.04.1 
> amd64 [installed,automatic]
> bind9-host/focal-updates,focal-security,now 1:9.16.48-0ubuntu0.20.04.1 amd64 
> [installed,automatic]
> bind9-libs/focal-updates,focal-security,now 1:9.16.48-0ubuntu0.20.04.1 amd64 
> [installed,automatic]
31c31
< bsdutils/focal-security,now 1:2.34-0.1ubuntu9.3 amd64 [installed,upgradable 
to: 1:2.34-0.1ubuntu9.4]
---
> bsdutils/now 1:2.34-0.1ubuntu9.3 amd64 [installed,upgradable to: 
> 1:2.34-0.1ubuntu9.6]
41c41
< cloud-init/now 22.4.2-0ubuntu0~20.04.2 all [installed,upgradable to: 
23.4.4-0ubuntu0~20.04.1]
---
> cloud-init/now 22.4.2-0ubuntu0~20.04.2 all [installed,upgradable to: 
> 24.1.3-0ubuntu1~20.04.1]
48c48
< cpio/focal-updates,focal-security,now 2.13+dfsg-2ubuntu0.3 amd64 
[installed,automatic]
---
> cpio/now 2.13+dfsg-2ubuntu0.3 amd64 [installed,upgradable to: 
> 2.13+dfsg-2ubuntu0.4]
56c56
< curl/focal-updates,focal-security,now 7.68.0-1ubuntu2.21 amd64 [installed]
---
> curl/focal-updates,focal-security,now 7.68.0-1ubuntu2.22 amd64 [installed]
68c68
< distro-info-data/now 0.43ubuntu1.11 all [installed,upgradable to: 
0.43ubuntu1.15]
---
> distro-info-data/now 0.43ubuntu1.11 all [installed,upgradable to: 
> 0.43ubuntu1.16]
82c82
< fdisk/focal-security,now 2.34-0.1ubuntu9.3 amd64 [installed,upgradable to: 
2.34-0.1ubuntu9.4]
---
> fdisk/now 2.34-0.1ubuntu9.3 amd64 [installed,upgradable to: 2.34-0.1ubuntu9.6]
119,120c119,120
< grub-efi-amd64-bin/now 2.06-2ubuntu14.1 amd64 [installed,upgradable to: 
2.06-2ubuntu14.4]
< grub-efi-amd64-signed/now 1.187.3~20.04.1+2.06-2ubuntu14.1 amd64 
[installed,upgradable to: 1.187.6~20.04.1+2.06-2ubuntu14.4]
---
> grub-efi-amd64-bin/focal-updates,focal-security,now 2.06-2ubuntu14.4 amd64 
> [installed]
> grub-efi-amd64-signed/focal-updates,focal-security,now 
> 1.187.6~20.04.1+2.06-2ubuntu14.4 amd64 [installed]
148c148
< klibc-utils/focal-updates,focal-security,now 2.0.7-1ubuntu5.1 amd64 
[installed,automatic]
---
> klibc-utils/focal-updates,focal-security,now 2.0.7-1ubuntu5.2 amd64 
> [installed,automatic]
152c152
< landscape-common/focal-updates,now 19.12-0ubuntu4.3 amd64 
[installed,automatic]
---
> landscape-common/now 19.12-0ubuntu4.3 amd64 [installed,upgradable to: 
> 23.02-0ubuntu1~20.04.2]
154,155c154,155
< less/now 551-1ubuntu0.1 amd64 [installed,upgradable to: 551-1ubuntu0.2]
< libaccountsservice0/now 0.6.55-0ubuntu12~20.04.5 amd64 [installed,upgradable 
to: 0.6.55-0ubuntu12~20.04.7]
---
> less/now 551-1ubuntu0.1 amd64 [installed,upgradable to: 551-1ubuntu0.3]
> libaccountsservice0/focal-updates,focal-security,now 

Re: Varnish suddenly started using much more memory

2024-05-16 Thread Guillaume Quintard
Hi,

I feel like the answer is there, somewhere. You said that the deploy
changed something, but that it can't possibly be the deploy.

I'm going to bet that it's the deploy. Most likely you changed something
that messed up the willingness to cache, or your TTL.
First, check the difference in passes. If they are about the same, look for
hit-for-misses, and lastly, look at how long Varnish is trying to cache the
average object. I'm pretty sure one of those changed.

That being said, the memory shouldn't explode like that, which packages are
you using?

-- 
Guillaume Quintard


On Thu, May 16, 2024 at 2:19 AM Batanun B  wrote:

> Hi,
>
> About two weeks ago we deployed some minor changes to our Varnish servers
> in production, and after that we have noticed a big change in the memory
> that Varnish consumes.
>
> Before the deploy, the amount of available memory on the servers was very
> stable, around 25 GB, for months on end. After the deploy, the amount of
> available memory dropped below 25 GB within 6 hours, and is dropping about
> 1 GB more each day, with no indication that it will level out before
> hitting rock bottom.
>
> There was no change in traffic patterns during the time of the deploy. And
> we didn't change any OS or Varnish configuration. The deploy consisted only
> of trivial VCL changes, like changing the backend probe url to a dedicated
> healthcheck endpoint, and tweaking the ttl for a minor resource. None of
> which could explain this massive change in memory usage.
>
> We have configured varnish with "-s default=malloc,12G -s
> large=malloc,8G", where the combined 20GB is about 60% of the total server
> RAM of 32GB. This is below the recommended 75% maximum I've seen in many
> places.
>
> Currently Varnish uses about 73% of the server memory, or 23GB (the RES
> column in htop). The default storage uses about 10 GB
> (SMA.default.g_bytes), while the large storage uses 8 GB. And the transient
> storage is currently about 2 MB (SMA.Transient.g_bytes). In total this
> results in about 18 GB. So what is that additional 5 GB used for? How can I
> troubleshoot that?
>
> And, more importantly, what could possibly explain this sudden change?
>
> The Ubuntu version stayed the same (20.04.5 LTS), and the Varnish version
> too (6.0.11-1~focal), as well as varnish-modules (0.15.1). I notice some
> differences in some installed packages of the servers, but nothing that
> stands out to me (but I'm no linux expert).
>
> Regards


Varnish suddenly started using much more memory

2024-05-16 Thread Batanun B
Hi,

About two weeks ago we deployed some minor changes to our Varnish servers in 
production, and after that we have noticed a big change in the memory that 
Varnish consumes.

Before the deploy, the amount of available memory on the servers was very 
stable, around 25 GB, for months on end. After the deploy, the amount of 
available memory dropped below 25 GB within 6 hours, and is dropping about 1 GB 
more each day, with no indication that it will level out before hitting rock 
bottom.

There was no change in traffic patterns during the time of the deploy. And we 
didn't change any OS or Varnish configuration. The deploy consisted only of 
trivial VCL changes, like changing the backend probe url to a dedicated 
healthcheck endpoint, and tweaking the ttl for a minor resource. None of 
which could explain this massive change in memory usage.

We have configured varnish with "-s default=malloc,12G -s large=malloc,8G", 
where the combined 20GB is about 60% of the total server RAM of 32GB. This is 
below the recommended 75% maximum I've seen in many places.

Currently Varnish uses about 73% of the server memory, or 23GB (the RES column 
in htop). The default storage uses about 10 GB (SMA.default.g_bytes), while the 
large storage uses 8 GB. And the transient storage is currently about 2 MB 
(SMA.Transient.g_bytes). In total this results in about 18 GB. So what is that 
additional 5 GB used for? How can I troubleshoot that?
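
The figures in the two paragraphs above can be checked with a few lines of
arithmetic. This is a minimal Python sketch, assuming the rounded values given
in this message (12G and 8G storage limits, 32 GB of RAM, 23 GB resident,
roughly 10 GB, 8 GB and 2 MB in use) and decimal units; the variable names are
made up for illustration.

```python
GB = 10**9  # assumption: decimal units, matching the rounded figures above
MB = 10**6

total_ram = 32 * GB

# Limits passed to varnishd: -s default=malloc,12G -s large=malloc,8G
storage_limits = {"default": 12 * GB, "large": 8 * GB}
configured = sum(storage_limits.values())
configured_share = configured / total_ram

# Approximate usage reported by htop (RES) and the SMA.*.g_bytes counters.
resident = 23 * GB
in_use = {"default": 10 * GB, "large": 8 * GB, "Transient": 2 * MB}
unaccounted = resident - sum(in_use.values())

print(f"configured storage: {configured / GB:.0f} GB "
      f"({configured_share:.1%} of RAM)")
print(f"unaccounted for:    {unaccounted / GB:.2f} GB")
```

The configured storage comes to 62.5% of RAM, and the part of the resident
size not covered by the three storage counters is just under 5 GB.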

And, more importantly, what could possibly explain this sudden change?

The Ubuntu version stayed the same (20.04.5 LTS), and the Varnish version too 
(6.0.11-1~focal), as well as varnish-modules (0.15.1). I notice some 
differences in some installed packages of the servers, but nothing that stands 
out to me (but I'm no linux expert).

Regards