Re: [galaxy-dev] Docker and alpine linux

2016-02-28 Thread Björn Grüning


On 23.02.2016 at 10:13, Marius van den Beek wrote:
> Just my 2 cents about this:
> I feel your pain; our connection to Docker Hub is horrible. It takes
> an hour to pull the GIE images
> (while it only takes seconds from the cloud).

This is a Docker Hub-related issue, and we also offer images from:
https://quay.io/repository/bgruening/galaxy

Quay is much faster for downloading and building.
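For readers who want to try the Quay mirror, here is a minimal Python sketch that builds (and optionally runs) the `docker pull` command for either registry. The Quay path comes from the URL above; the Docker Hub name `bgruening/galaxy-stable` and the default `latest` tag are not stated in this thread and should be treated as assumptions.

```python
import subprocess
from typing import List

# Image references per registry. The Quay path follows the URL in this
# thread; the Docker Hub name is an assumption for illustration.
IMAGES = {
    "quay": "quay.io/bgruening/galaxy",
    "dockerhub": "bgruening/galaxy-stable",  # assumed Docker Hub repository
}


def pull_command(source: str, tag: str = "latest") -> List[str]:
    """Return the `docker pull` argument list for the chosen registry."""
    return ["docker", "pull", f"{IMAGES[source]}:{tag}"]


def pull(source: str, tag: str = "latest") -> None:
    """Actually run the pull; requires a local Docker installation."""
    subprocess.run(pull_command(source, tag), check=True)


if __name__ == "__main__":
    # Only print the command; call pull("quay") to really download.
    print(" ".join(pull_command("quay")))
```

Keeping the command construction separate from the `subprocess` call makes it easy to switch registries in a script without touching anything else.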

> The Galaxy Docker images are pretty fat because we use them VM-style;
> in principle there is nothing wrong with that.
> So the base image is about 1.2 GB;

Hmm, the base image is 400 MB; at least, that is what Quay is telling me:

https://quay.io/repository/bgruening/galaxy?tab=tags
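One plausible reason for the gap between 1.2 GB and 400 MB (an assumption; the thread does not settle it) is that registries such as Quay report compressed layer sizes, while `docker images` reports the unpacked size on disk. The sketch below sums the layer sizes listed in a Docker image manifest (schema 2), where each layer's `size` is the size of the compressed blob; the manifest, digests and sizes are made up for illustration.

```python
import json

# A trimmed, hypothetical schema-2 image manifest. Each layer "size" is the
# byte size of the compressed (gzipped tar) blob stored in the registry.
MANIFEST = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {"mediaType": "application/vnd.docker.container.image.v1+json",
             "size": 7023, "digest": "sha256:aaaa"},
  "layers": [
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 150000000, "digest": "sha256:bbbb"},
    {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
     "size": 250000000, "digest": "sha256:cccc"}
  ]
}
""")


def compressed_size_mb(manifest: dict) -> float:
    """Sum the compressed layer sizes of a schema-2 manifest, in MB."""
    total_bytes = sum(layer["size"] for layer in manifest["layers"])
    return total_bytes / 1_000_000


print(f"download size: {compressed_size_mb(MANIFEST):.0f} MB")
```

The unpacked image on disk is typically larger than this figure, since the layers are stored decompressed after the pull.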

> add a few tools and you're quickly
> reaching 5GB.

Not if you use Docker or conda as the dependency resolution mechanism. In
fact, I'm working on switching many flavours to use conda and Docker as
dependency resolvers, which postpones (and reduces) the download from
pull time to tool run time.
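To illustrate why resolving dependencies at tool run time keeps the pulled image small, here is a minimal Python sketch of lazy, cached resolution through an ordered list of resolvers. It is loosely modelled on the idea behind Galaxy's dependency resolvers, not on Galaxy's actual code; `LazyDependencyManager`, `unavailable`, `fake_conda` and the paths are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A resolver maps (name, version) to an install location, or None if it
# cannot provide that requirement. Hypothetical interface, not Galaxy's API.
Resolver = Callable[[str, str], Optional[str]]


class LazyDependencyManager:
    """Resolve tool requirements only when a tool runs, then cache them."""

    def __init__(self, resolvers: List[Resolver]) -> None:
        self.resolvers = resolvers  # tried in order; the first hit wins
        self.cache: Dict[Tuple[str, str], str] = {}
        self.fetches = 0  # number of real (non-cached) resolutions

    def resolve(self, name: str, version: str) -> str:
        key = (name, version)
        if key in self.cache:  # already provided on an earlier run
            return self.cache[key]
        for resolver in self.resolvers:
            location = resolver(name, version)
            if location is not None:
                self.fetches += 1
                self.cache[key] = location
                return location
        raise LookupError(f"no resolver provides {name} {version}")


def unavailable(name: str, version: str) -> Optional[str]:
    """Stand-in for a resolver that has nothing preinstalled."""
    return None


def fake_conda(name: str, version: str) -> Optional[str]:
    """Stand-in for a conda-style resolver; a real one would install here."""
    return f"/hypothetical/conda/envs/{name}-{version}"


manager = LazyDependencyManager([unavailable, fake_conda])
print(manager.fetches)  # nothing has been downloaded at image pull time
path = manager.resolve("samtools", "1.3")  # first tool run triggers a fetch
manager.resolve("samtools", "1.3")         # later runs hit the cache
print(path, manager.fetches)
```

Only tools that are actually run ever cost download time or disk space, which is the effect described above.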

> If in addition you use interactive environments ... add a few GB more.

Only if you use them, and only on first run.

> I think that instead of trying to reduce the size of the base image, it
> might be a better effort to separate the components into a proxy image,
> a database image (perhaps two: one for tools, one for user data), a Galaxy
> image, a cluster image, and so on. This would allow you to update just the
> tools and Galaxy image regularly, plus you could do all the neat Docker
> stuff, like versioning, committing, rolling updates, streaming database
> replication, worker scaling ...
> It's certainly something I would be interested in.

Here is the related issue:
https://github.com/bgruening/docker-galaxy-stable/issues/43

But going this route means no longer being in sync with the Ansible roles
used to create the VMs, cloud images, and so on. That said, we have plans
to base CloudMan on top of Docker. If VMs can be based on top of Docker, we
could possibly use Docker Compose, if we decide it's worth the effort, and
switch our other deployments to it. I would very much like to keep all
deployments in sync, as we do currently.
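As a rough illustration of the split Marius proposes and the Docker Compose option mentioned here, a minimal Python sketch that assembles a Compose document (file format version 2) with separate proxy, Galaxy, database and cluster services, and derives a start order from `depends_on`. All `example/...` image names are hypothetical; only `postgres` is a real public image.

```python
import json
from typing import Dict, List, Set

# One service per component from the proposal; the example/* images are
# hypothetical placeholders, not published images.
SERVICES: Dict[str, dict] = {
    "proxy": {"image": "example/galaxy-proxy", "ports": ["80:80"],
              "depends_on": ["galaxy"]},
    "galaxy": {"image": "example/galaxy-web",
               "depends_on": ["db-tools", "db-user"]},
    "db-tools": {"image": "postgres:9.5",
                 "volumes": ["tools-data:/var/lib/postgresql/data"]},
    "db-user": {"image": "postgres:9.5",
                "volumes": ["user-data:/var/lib/postgresql/data"]},
    "cluster": {"image": "example/galaxy-cluster", "depends_on": ["galaxy"]},
}


def compose_document(services: Dict[str, dict]) -> dict:
    """Wrap services in a version-2 Compose structure with named volumes."""
    volumes: Dict[str, dict] = {}
    for spec in services.values():
        for mount in spec.get("volumes", []):
            volumes[mount.split(":", 1)[0]] = {}  # "name:/path" -> "name"
    return {"version": "2", "services": services, "volumes": volumes}


def startup_order(services: Dict[str, dict]) -> List[str]:
    """Depth-first topological sort so dependencies start first.

    Assumes the depends_on graph has no cycles.
    """
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in services[name].get("depends_on", []):
            visit(dep)
        order.append(name)

    for name in services:
        visit(name)
    return order


if __name__ == "__main__":
    # JSON is, for practical purposes, a subset of YAML, so this output
    # should be usable as a Compose file.
    print(json.dumps(compose_document(SERVICES), indent=2))
    print(startup_order(SERVICES))
```

With the components in separate images, updating Galaxy or the tools means replacing one service while the database volumes stay in place.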

> The other thing is to make sure that the tool dependencies are as slim
> as possible. Having many different R packages and their source lying
> around makes for a lot of data. Hopefully conda can alleviate that
> situation.

Yeah, it does!

Ciao,
Bjoern

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Docker and alpine linux

2016-02-23 Thread Marius van den Beek
Just my 2 cents about this:
I feel your pain; our connection to Docker Hub is horrible. It takes an
hour to pull the GIE images (while it only takes seconds from the cloud).
The Galaxy Docker images are pretty fat because we use them VM-style;
in principle there is nothing wrong with that.
So the base image is about 1.2 GB; add a few tools and you're quickly
reaching 5 GB.
If in addition you use interactive environments, add a few GB more.

I think that instead of trying to reduce the size of the base image, it
might be a better effort to separate the components into a proxy image,
a database image (perhaps two: one for tools, one for user data), a Galaxy
image, a cluster image, and so on. This would allow you to update just the
tools and Galaxy image regularly, plus you could do all the neat Docker
stuff, like versioning, committing, rolling updates, streaming database
replication, worker scaling ...
It's certainly something I would be interested in.
The other thing is to make sure that the tool dependencies are as slim as
possible. Having many different R packages and their source lying around
makes for a lot of data. Hopefully conda can alleviate that situation.

Cheers,
Marius

Re: [galaxy-dev] Docker and alpine linux

2016-02-23 Thread Björn Grüning
Hi Tiago,

thanks for the heads-up. I also tried Alpine some time ago, but Galaxy
needs some external dependencies, which Nate is also building as a PPA
for Debian-based systems. So this is not so easy to migrate.

What we could do is orchestrate the containers and their dependencies:

https://github.com/bgruening/docker-galaxy-stable/issues/43

But this has the big disadvantage of not sharing the setup with other
Galaxy installations, like the VM installation from planemo-machine.

So in the end I stopped trying to make it more modular, and instead tried
to share as much as possible with other Galaxy installations and move more
and more into the Ansible playbook.

Thanks, Tiago, for trying this,
Bjoern

[galaxy-dev] Docker and alpine linux

2016-02-22 Thread Tiago Antao
Dear all,

This email is mostly to report a negative result, maybe to help others
_not_ try the same thing.

I researched the possibility of replacing Ubuntu with Alpine in the
Docker images (Alpine seems to be used more and more in Docker images,
with plenty of official images now based on it rather than on Debian).

Alpine is popular because it produces very small images (a bare-bones
one is below 10 MB). But due to the large dependencies of Galaxy, the
gain is small: maybe 200 MB or so, about 20%. That is something, but
not a revolution.

That being said, for servers with smaller dependencies (mail, web, LDAP,
DNS, ...), Alpine really does reduce the footprint of Docker
containers.
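The arithmetic behind this conclusion can be sketched in Python: swapping the base image saves a roughly fixed amount, so the relative gain depends on how large the rest of the image is. The Galaxy figures are derived from the numbers above (about 200 MB, about 20%); the small-service sizes are hypothetical, not measurements.

```python
def relative_saving(before_mb: float, after_mb: float) -> float:
    """Fraction of the image size removed by switching base images."""
    return (before_mb - after_mb) / before_mb


def implied_total(saving_mb: float, fraction: float) -> float:
    """Original image size implied by an absolute saving and its fraction."""
    return saving_mb / fraction


# Galaxy: roughly 200 MB saved at about 20% implies an image of about 1 GB.
galaxy_before = implied_total(200, 0.20)
galaxy = relative_saving(galaxy_before, galaxy_before - 200)

# Small service (e.g. a mail server): hypothetical sizes, where the base
# layer dominates the image, so the swap removes most of it.
small = relative_saving(before_mb=210, after_mb=25)

print(f"Galaxy: {galaxy:.0%} of {galaxy_before:.0f} MB")
print(f"small service: {small:.0%}")
```

The same absolute saving is a modest fraction of a large image but most of a small one, which matches the experience reported here.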

Tiago

-- 
"While I may be sending this email outside my normal office hours, I
have no expectation to receive a reply outside yours" - @tomstafford