Hi, here is the situation:

We have about 15000 files in our collection. Most of them are music;
some (about 5000) are shows sent to us by our contributors. Those can
be quite big: their duration ranges from 5 minutes to an hour.

Our schedule is a switch over shows:
switch([({time_period}, show), ...])

Shows are usually of the form:
sequence([single(jingle), single(episode), playlist(filler)])
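
For instance, a stripped-down version of one show and the switch looks
roughly like this (file and directory names made up):

# One show: jingle, then the episode, then filler music.
morning = sequence([single("/radio/jingles/morning.ogg"),
                    single("/radio/shows/morning/latest.ogg"),
                    playlist("/radio/music/fillers")])
# The night show is basically a playlist over the whole music directory.
night = playlist("/radio/music")
# The schedule switches between shows on time predicates.
radio = switch([({ 7h-9h }, morning),
                ({ 0h-6h }, night)])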

This setup worked quite well for the last 3 years, but now I'm
migrating all our services to another VPS provider.
Those VPS have limited disk space, but the provider also offers S3
storage, so I put our whole collection in S3 and wanted to start
using that.

First solution: s3fs
It works, with no changes to my LS scripts. BUT: CPU usage is very
high while fetching files, even with the cache enabled.

Second solution: the LS aws protocol
It uses the aws CLI, and it seems it can only fetch one object per
request, not a whole object hierarchy.
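
If I read it right, usage is strictly per object, something like this
(bucket name made up, and I'm not 100% sure about the URI prefix):

# One URI resolves one object via the aws CLI; there is no way to
# point playlist() at a whole prefix this way.
jingle = single("s3://my-bucket/jingles/morning.ogg")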

Third solution: my own S3 protocol
It uses s3cmd, gets objects recursively, and hands the request over to
http/https if the object is public (this last feature uses `s3cmd info
uri` and the returned 'URL' field; I'm not sure it is available on
every S3 provider, maybe it's just mine).
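
Under the hood it is basically just shelling out to s3cmd. A rough
sketch of the two helpers it is built on (bucket/paths made up; how
they get registered with add_protocol depends on the liquidsoap
version):

# Fetch one object to a temporary local file and return that path.
def s3cmd_fetch(uri)
  tmp = file.temp("s3_", ".tmp")
  ignore(get_process_output("s3cmd get --force #{quote(uri)} #{quote(tmp)}"))
  tmp
end

# For public objects, ask s3cmd for the plain URL ('URL' field of
# `s3cmd info`; the output still has a trailing newline to strip).
def s3cmd_public_url(uri)
  get_process_output("s3cmd info #{quote(uri)} | awk '/URL:/ {print $2}'")
end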

Now, here is the problem:
Our schedule may not use the whole collection (we have some archived
shows), but it sure does use a big part of it, and some playlists are
quite large (the night show is basically a playlist over the whole
music directory).
From what I understand, request resolution happens as soon as LS
starts, so my scripts end up downloading... the whole collection!
I believe this defeats the purpose of having the files on an external
service, be it http, S3, whatever.
And in any case, I simply can't have that: I don't have the disk space
on my VPS.

Using my own s3 protocol, I tried input.http and input.https for the
public files, hoping that would solve the "download all" problem (it
would just become a "stream all" problem instead), but I get a LOT of
'buffer overrun' messages and the sound gets choppy. Increasing the
buffer size only delays the problem on big files (like the many shows
that last an hour), and dramatically increases CPU/RAM usage.
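
For the record, what I tried looks roughly like this (URL made up); no
buffer value really saves a one-hour file:

# Streams the public object over HTTPS instead of downloading it
# first, but long files still end up in 'buffer overrun' territory.
episode = input.https(buffer=30., max=60.,
                      "https://s3.example.com/my-bucket/shows/weekly_ep42.ogg")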

Ok, you're still reading, thank you.
Here is the big question now:
I understand that LS needs a local copy of a file to work on it, but
is there any way to make it fetch the file only when it needs to play
it, and then remove it?
Before that, it would just resolve a _list_ of the available files
(using s3/http/https I can make sure a file _does_ exist without
actually downloading it), and build its activations based on that.
Or is liquidsoap just not the right tool for the job, and should I
just use (request|source).dynamic with an external service to handle
my schedule/request resolution?
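
To make it concrete, something in the spirit of the sketch below is
what I'm after, reusing the s3cmd_fetch helper from above; next_uri()
is a hypothetical picker, and the exact request.dynamic variant
probably depends on the liquidsoap version:

# Hypothetical: pick the S3 URI of the next item to play, from the
# list of objects resolved (but not downloaded) at startup.
def next_uri()
  "s3://my-bucket/music/some_track.ogg"
end

# Fetch the next item just in time and hand LS a local file; removing
# it after play would still need handling (a temporary protocol, or a
# cleanup hook).
def next_request()
  request.create(s3cmd_fetch(next_uri()))
end

night = request.dynamic(next_request)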

Thank you for reading
