Re: CI and PRs

2019-08-16 Thread Pedro Larroy
Hi Aaron. This is difficult to diagnose, because I don't know what to do
when the hash of a Docker layer doesn't match and Docker decides to rebuild
it; the R script seems not to have changed. I have observed this in the
past and I think it is due to bugs in Docker. Maybe Kellen is able to give
some tips here.

In this case you should use -R, which is already in master (you can always
copy the script on top if you are on an older revision).

Another thing that worked for me in the past was to completely nuke the
Docker cache so it redownloads from the CI repo. After that it worked fine
in some cases.

These two workarounds are not ideal, but should unblock you.

Pedro.

On Fri, Aug 16, 2019 at 11:39 AM Aaron Markham 
wrote:

> Is -R already in there?
>
> Here's an example of it happening to me right now. I am making
> minor changes to the runtime_functions logic for handling the R docs
> output. I pull the fix, then run the container, but I see the R deps
> layer re-running. I didn't touch that. Why is that running again?
>
> From https://github.com/aaronmarkham/incubator-mxnet
>f71cc6d..deec6aa  new_website_pipeline_2_aaron_rdocs ->
> origin/new_website_pipeline_2_aaron_rdocs
> Updating f71cc6d..deec6aa
> Fast-forward
>  ci/docker/runtime_functions.sh | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> (base) ubuntu@ip-172-31-47-182:~/aaron/ci$ ./build.py
> --docker-registry mxnetci --platform ubuntu_cpu_r
> --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh
> build_r_docs
> build.py: 2019-08-16 18:34:44,639Z INFO MXNet container based build tool.
> build.py: 2019-08-16 18:34:44,641Z INFO Docker cache download is
> enabled from registry mxnetci
> build.py: 2019-08-16 18:34:44,641Z INFO Loading Docker cache for
> mxnetci/build.ubuntu_cpu_r from mxnetci
> Using default tag: latest
> latest: Pulling from mxnetci/build.ubuntu_cpu_r
> Digest:
> sha256:7dc515c288b3e66d96920eb8975f985a501bb57f70595fbe0cb1c4fcd8d4184b
> Status: Downloaded newer image for mxnetci/build.ubuntu_cpu_r:latest
> build.py: 2019-08-16 18:34:44,807Z INFO Successfully pulled docker cache
> build.py: 2019-08-16 18:34:44,807Z INFO Building docker container
> tagged 'mxnetci/build.ubuntu_cpu_r' with docker
> build.py: 2019-08-16 18:34:44,807Z INFO Running command: 'docker build
> -f docker/Dockerfile.build.ubuntu_cpu_r --build-arg USER_ID=1000
> --build-arg GROUP_ID=1000 --cache-from mxnetci/build.ubuntu_cpu_r -t
> mxnetci/build.ubuntu_cpu_r docker'
> Sending build context to Docker daemon  289.8kB
> Step 1/15 : FROM ubuntu:16.04
>  ---> 5e13f8dd4c1a
> Step 2/15 : WORKDIR /work/deps
>  ---> Using cache
>  ---> afc2a135945d
> Step 3/15 : COPY install/ubuntu_core.sh /work/
>  ---> Using cache
>  ---> da2b2e7f35e1
> Step 4/15 : RUN /work/ubuntu_core.sh
>  ---> Using cache
>  ---> d1e88b26b1d2
> Step 5/15 : COPY install/deb_ubuntu_ccache.sh /work/
>  ---> Using cache
>  ---> 3aa97dea3b7b
> Step 6/15 : RUN /work/deb_ubuntu_ccache.sh
>  ---> Using cache
>  ---> bec503f1d149
> Step 7/15 : COPY install/ubuntu_r.sh /work/
>  ---> c5e77c38031d
> Step 8/15 : COPY install/r.gpg /work/
>  ---> d8cdbf015d2b
> Step 9/15 : RUN /work/ubuntu_r.sh
>  ---> Running in c6c90b9e1538
> ++ dirname /work/ubuntu_r.sh
> + cd /work
> + echo 'deb http://cran.rstudio.com/bin/linux/ubuntu trusty/'
> + apt-key add r.gpg
> OK
> + add-apt-repository 'deb [arch=amd64,i386]
> https://cran.rstudio.com/bin/linux/ubuntu xenial/'
> + apt-get update
> Ign:1 http://cran.rstudio.com/bin/linux/ubuntu trusty/ InRelease
>
> On Fri, Aug 16, 2019 at 11:32 AM Pedro Larroy
>  wrote:
> >
> > Also, I forgot, another workaround is that I added the -R flag to the
> build
> > logic (build.py) so the container is not rebuilt for manual use.
> >
> > On Fri, Aug 16, 2019 at 11:18 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > >
> > > Hi Aaron.
> > >
> > > As Marco explained, if you are in master the cache usually works. There
> > > are two issues that I have observed:
> > >
> > > 1 - Docker doesn't automatically pull the base image (e.g. ubuntu:16.04),
> > > so if the cached base used in the FROM statement becomes outdated, your
> > > caching won't work. (Running docker pull ubuntu:16.04, or pulling the base
> > > images from the container registry, helps with this.)
> > >
> > > 2 - There's another situation where the above doesn't help, which seems
> > > to be an unidentified issue with the Docker cache:
> > > https://github.com/docker/docker.github.io/issues/8886
> > >
> > > We can get a short-term workaround for #1 by explicitly pulling bases
> > > from the script, but I think Docker should do it when using --cache-from,
> > > so maybe contributing a patch to Docker would be the best approach.
> > >
> > > Pedro
> > >
> > > On Thu, Aug 15, 2019 at 7:06 PM Aaron Markham <
> aaron.s.mark...@gmail.com>
> > > wrote:
> > >
> > >> When you create a new Dockerfile and use that on CI, it doesn't seem
> > >> to cache some of the steps... like this:
> > >>
> > >> 

Re: CI and PRs

2019-08-16 Thread Aaron Markham
Is -R already in there?

Here's an example of it happening to me right now. I am making
minor changes to the runtime_functions logic for handling the R docs
output. I pull the fix, then run the container, but I see the R deps
layer re-running. I didn't touch that. Why is that running again?

From https://github.com/aaronmarkham/incubator-mxnet
   f71cc6d..deec6aa  new_website_pipeline_2_aaron_rdocs ->
origin/new_website_pipeline_2_aaron_rdocs
Updating f71cc6d..deec6aa
Fast-forward
 ci/docker/runtime_functions.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
(base) ubuntu@ip-172-31-47-182:~/aaron/ci$ ./build.py
--docker-registry mxnetci --platform ubuntu_cpu_r
--docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh
build_r_docs
build.py: 2019-08-16 18:34:44,639Z INFO MXNet container based build tool.
build.py: 2019-08-16 18:34:44,641Z INFO Docker cache download is
enabled from registry mxnetci
build.py: 2019-08-16 18:34:44,641Z INFO Loading Docker cache for
mxnetci/build.ubuntu_cpu_r from mxnetci
Using default tag: latest
latest: Pulling from mxnetci/build.ubuntu_cpu_r
Digest: sha256:7dc515c288b3e66d96920eb8975f985a501bb57f70595fbe0cb1c4fcd8d4184b
Status: Downloaded newer image for mxnetci/build.ubuntu_cpu_r:latest
build.py: 2019-08-16 18:34:44,807Z INFO Successfully pulled docker cache
build.py: 2019-08-16 18:34:44,807Z INFO Building docker container
tagged 'mxnetci/build.ubuntu_cpu_r' with docker
build.py: 2019-08-16 18:34:44,807Z INFO Running command: 'docker build
-f docker/Dockerfile.build.ubuntu_cpu_r --build-arg USER_ID=1000
--build-arg GROUP_ID=1000 --cache-from mxnetci/build.ubuntu_cpu_r -t
mxnetci/build.ubuntu_cpu_r docker'
Sending build context to Docker daemon  289.8kB
Step 1/15 : FROM ubuntu:16.04
 ---> 5e13f8dd4c1a
Step 2/15 : WORKDIR /work/deps
 ---> Using cache
 ---> afc2a135945d
Step 3/15 : COPY install/ubuntu_core.sh /work/
 ---> Using cache
 ---> da2b2e7f35e1
Step 4/15 : RUN /work/ubuntu_core.sh
 ---> Using cache
 ---> d1e88b26b1d2
Step 5/15 : COPY install/deb_ubuntu_ccache.sh /work/
 ---> Using cache
 ---> 3aa97dea3b7b
Step 6/15 : RUN /work/deb_ubuntu_ccache.sh
 ---> Using cache
 ---> bec503f1d149
Step 7/15 : COPY install/ubuntu_r.sh /work/
 ---> c5e77c38031d
Step 8/15 : COPY install/r.gpg /work/
 ---> d8cdbf015d2b
Step 9/15 : RUN /work/ubuntu_r.sh
 ---> Running in c6c90b9e1538
++ dirname /work/ubuntu_r.sh
+ cd /work
+ echo 'deb http://cran.rstudio.com/bin/linux/ubuntu trusty/'
+ apt-key add r.gpg
OK
+ add-apt-repository 'deb [arch=amd64,i386]
https://cran.rstudio.com/bin/linux/ubuntu xenial/'
+ apt-get update
Ign:1 http://cran.rstudio.com/bin/linux/ubuntu trusty/ InRelease

On Fri, Aug 16, 2019 at 11:32 AM Pedro Larroy
 wrote:
>
> Also, I forgot, another workaround is that I added the -R flag to the build
> logic (build.py) so the container is not rebuilt for manual use.
>
> On Fri, Aug 16, 2019 at 11:18 AM Pedro Larroy 
> wrote:
>
> >
> > Hi Aaron.
> >
> > As Marco explained, if you are in master the cache usually works. There are
> > two issues that I have observed:
> >
> > 1 - Docker doesn't automatically pull the base image (e.g. ubuntu:16.04), so
> > if the cached base used in the FROM statement becomes outdated, your caching
> > won't work. (Running docker pull ubuntu:16.04, or pulling the base images
> > from the container registry, helps with this.)
> >
> > 2 - There's another situation where the above doesn't help, which seems to
> > be an unidentified issue with the Docker cache:
> > https://github.com/docker/docker.github.io/issues/8886
> >
> > We can get a short-term workaround for #1 by explicitly pulling bases from
> > the script, but I think Docker should do it when using --cache-from, so
> > maybe contributing a patch to Docker would be the best approach.
> >
> > Pedro
> >
> > On Thu, Aug 15, 2019 at 7:06 PM Aaron Markham 
> > wrote:
> >
> >> When you create a new Dockerfile and use that on CI, it doesn't seem
> >> to cache some of the steps... like this:
> >>
> >> Step 13/15 : RUN /work/ubuntu_docs.sh
> >>  ---> Running in a1e522f3283b
> >> + echo 'Installing dependencies...'
> >> + apt-get update
> >> Installing dependencies.
> >>
> >> Or this
> >>
> >> Step 4/13 : RUN /work/ubuntu_core.sh
> >>  ---> Running in e7882d7aa750
> >> + apt-get update
> >>
> >> I get it if I was changing those scripts, but then I'd think it should
> >> cache after running it once... but, no.
> >>
> >>
> >> On Thu, Aug 15, 2019 at 3:51 PM Marco de Abreu 
> >> wrote:
> >> >
> >> > Do I understand it correctly that you are saying that the Docker cache
> >> > doesn't work properly and regularly reinstalls dependencies? Or do you
> >> mean
> >> > that you only have cache misses when you modify the dependencies - which
> >> > would be expected?
> >> >
> >> > -Marco
> >> >
> >> > On Fri, Aug 16, 2019 at 12:48 AM Aaron Markham <
> >> aaron.s.mark...@gmail.com>
> >> > wrote:
> >> >
> >> > > Many of the CI pipelines follow this pattern:
> >> > > Load ubuntu 16.04, install 

Re: CI and PRs

2019-08-16 Thread Pedro Larroy
Also, I forgot, another workaround is that I added the -R flag to the build
logic (build.py) so the container is not rebuilt for manual use.

On Fri, Aug 16, 2019 at 11:18 AM Pedro Larroy 
wrote:

>
> Hi Aaron.
>
> As Marco explained, if you are in master the cache usually works. There are
> two issues that I have observed:
>
> 1 - Docker doesn't automatically pull the base image (e.g. ubuntu:16.04), so
> if the cached base used in the FROM statement becomes outdated, your caching
> won't work. (Running docker pull ubuntu:16.04, or pulling the base images
> from the container registry, helps with this.)
>
> 2 - There's another situation where the above doesn't help, which seems to
> be an unidentified issue with the Docker cache:
> https://github.com/docker/docker.github.io/issues/8886
>
> We can get a short-term workaround for #1 by explicitly pulling bases from
> the script, but I think Docker should do it when using --cache-from, so
> maybe contributing a patch to Docker would be the best approach.
>
> Pedro
>
> On Thu, Aug 15, 2019 at 7:06 PM Aaron Markham 
> wrote:
>
>> When you create a new Dockerfile and use that on CI, it doesn't seem
>> to cache some of the steps... like this:
>>
>> Step 13/15 : RUN /work/ubuntu_docs.sh
>>  ---> Running in a1e522f3283b
>> + echo 'Installing dependencies...'
>> + apt-get update
>> Installing dependencies.
>>
>> Or this
>>
>> Step 4/13 : RUN /work/ubuntu_core.sh
>>  ---> Running in e7882d7aa750
>> + apt-get update
>>
>> I get it if I was changing those scripts, but then I'd think it should
>> cache after running it once... but, no.
>>
>>
>> On Thu, Aug 15, 2019 at 3:51 PM Marco de Abreu 
>> wrote:
>> >
>> > Do I understand it correctly that you are saying that the Docker cache
>> > doesn't work properly and regularly reinstalls dependencies? Or do you
>> mean
>> > that you only have cache misses when you modify the dependencies - which
>> > would be expected?
>> >
>> > -Marco
>> >
>> > On Fri, Aug 16, 2019 at 12:48 AM Aaron Markham <
>> aaron.s.mark...@gmail.com>
>> > wrote:
>> >
>> > > Many of the CI pipelines follow this pattern:
>> > > Load ubuntu 16.04, install deps, build mxnet, then run some tests. Why
>> > > repeat steps 1-3 over and over?
>> > >
>> > > Now, some tests use a stashed binary and docker cache. And I see this
>> work
>> > > locally, but for the most part, on CI, you're gonna sit through a
>> > > dependency install.
>> > >
>> > > I noticed that almost all jobs use an ubuntu setup that is fully
>> loaded.
>> > > Without cache, it can take 10 or more minutes to build.  So I made a
>> lite
>> > > version. Takes only a few minutes instead.
>> > >
>> > > In some cases archiving worked great to share across pipelines, but as
>> > > Marco mentioned we need a storage solution to make that happen. We
>> can't
>> > > archive every intermediate artifact for each PR.
>> > >
>> > > On Thu, Aug 15, 2019, 13:47 Pedro Larroy <
>> pedro.larroy.li...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Aaron. Why does it speed things up? What's the difference?
>> > > >
>> > > > Pedro.
>> > > >
>> > > > On Wed, Aug 14, 2019 at 8:39 PM Aaron Markham <
>> aaron.s.mark...@gmail.com
>> > > >
>> > > > wrote:
>> > > >
>> > > > > The PRs Thomas and I are working on for the new docs and website
>> share
>> > > > the
>> > > > > mxnet binary in the new CI pipelines we made. Speeds things up a
>> lot.
>> > > > >
>> > > > > On Wed, Aug 14, 2019, 18:16 Chris Olivier 
>> > > wrote:
>> > > > >
>> > > > > > I see it done daily now, and while I can’t share all the
>> details,
>> > > it’s
>> > > > > not
>> > > > > > an incredibly complex thing, and involves not much more than
>> nfs/efs
>> > > > > > sharing and remote ssh commands.  All it takes is a little
>> ingenuity
>> > > > and
>> > > > > > some imagination.
>> > > > > >
>> > > > > > On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy <
>> > > > > pedro.larroy.li...@gmail.com
>> > > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Sounds good in theory. I think there are complex details with
>> > > regards
>> > > > > of
>> > > > > > > resource sharing during parallel execution. Still I think
>> both ways
>> > > > can
>> > > > > > be
>> > > > > > > explored. I think some tests run for unreasonably long times
>> for
>> > > what
>> > > > > > they
>> > > > > > > are doing. We already scale parts of the pipeline horizontally
>> > > across
>> > > > > > > workers.
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Aug 14, 2019 at 5:12 PM Chris Olivier <
>> > > > cjolivie...@apache.org>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > +1
>> > > > > > > >
>> > > > > > > > Rather than remove tests (which doesn’t scale as a
>> solution), why
>> > > > not
>> > > > > > > scale
>> > > > > > > > them horizontally so that they finish more quickly? Across
>> > > > processes
>> > > > > or
>> > > > > > > > even on a pool of machines that aren’t necessarily the build
>> > > > machine?
>> > > > > > > >
>> > > > > > > > On Wed, Aug 14, 2019 at 12:03 PM Marco de 

Re: CI and PRs

2019-08-16 Thread Pedro Larroy
Hi Aaron.

As Marco explained, if you are in master the cache usually works. There are
two issues that I have observed:

1 - Docker doesn't automatically pull the base image (e.g. ubuntu:16.04), so
if the cached base used in the FROM statement becomes outdated, your caching
won't work. (Running docker pull ubuntu:16.04, or pulling the base images
from the container registry, helps with this.)

2 - There's another situation where the above doesn't help, which seems to
be an unidentified issue with the Docker cache:
https://github.com/docker/docker.github.io/issues/8886

We can get a short-term workaround for #1 by explicitly pulling bases from
the script, but I think Docker should do it when using --cache-from, so
maybe contributing a patch to Docker would be the best approach.

Pedro

On Thu, Aug 15, 2019 at 7:06 PM Aaron Markham 
wrote:

> When you create a new Dockerfile and use that on CI, it doesn't seem
> to cache some of the steps... like this:
>
> Step 13/15 : RUN /work/ubuntu_docs.sh
>  ---> Running in a1e522f3283b
> + echo 'Installing dependencies...'
> + apt-get update
> Installing dependencies.
>
> Or this
>
> Step 4/13 : RUN /work/ubuntu_core.sh
>  ---> Running in e7882d7aa750
> + apt-get update
>
> I get it if I was changing those scripts, but then I'd think it should
> cache after running it once... but, no.
>
>
> On Thu, Aug 15, 2019 at 3:51 PM Marco de Abreu 
> wrote:
> >
> > Do I understand it correctly that you are saying that the Docker cache
> > doesn't work properly and regularly reinstalls dependencies? Or do you
> mean
> > that you only have cache misses when you modify the dependencies - which
> > would be expected?
> >
> > -Marco
> >
> > On Fri, Aug 16, 2019 at 12:48 AM Aaron Markham <
> aaron.s.mark...@gmail.com>
> > wrote:
> >
> > > Many of the CI pipelines follow this pattern:
> > > Load ubuntu 16.04, install deps, build mxnet, then run some tests. Why
> > > repeat steps 1-3 over and over?
> > >
> > > Now, some tests use a stashed binary and docker cache. And I see this
> work
> > > locally, but for the most part, on CI, you're gonna sit through a
> > > dependency install.
> > >
> > > I noticed that almost all jobs use an ubuntu setup that is fully
> loaded.
> > > Without cache, it can take 10 or more minutes to build.  So I made a
> lite
> > > version. Takes only a few minutes instead.
> > >
> > > In some cases archiving worked great to share across pipelines, but as
> > > Marco mentioned we need a storage solution to make that happen. We
> can't
> > > archive every intermediate artifact for each PR.
> > >
> > > On Thu, Aug 15, 2019, 13:47 Pedro Larroy  >
> > > wrote:
> > >
> > > > Hi Aaron. Why does it speed things up? What's the difference?
> > > >
> > > > Pedro.
> > > >
> > > > On Wed, Aug 14, 2019 at 8:39 PM Aaron Markham <
> aaron.s.mark...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > The PRs Thomas and I are working on for the new docs and website
> share
> > > > the
> > > > > mxnet binary in the new CI pipelines we made. Speeds things up a
> lot.
> > > > >
> > > > > On Wed, Aug 14, 2019, 18:16 Chris Olivier 
> > > wrote:
> > > > >
> > > > > > I see it done daily now, and while I can’t share all the details,
> > > it’s
> > > > > not
> > > > > > an incredibly complex thing, and involves not much more than
> nfs/efs
> > > > > > sharing and remote ssh commands.  All it takes is a little
> ingenuity
> > > > and
> > > > > > some imagination.
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Sounds good in theory. I think there are complex details with
> > > regards
> > > > > of
> > > > > > > resource sharing during parallel execution. Still I think both
> ways
> > > > can
> > > > > > be
> > > > > > > explored. I think some tests run for unreasonably long times
> for
> > > what
> > > > > > they
> > > > > > > are doing. We already scale parts of the pipeline horizontally
> > > across
> > > > > > > workers.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019 at 5:12 PM Chris Olivier <
> > > > cjolivie...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > Rather than remove tests (which doesn’t scale as a
> solution), why
> > > > not
> > > > > > > scale
> > > > > > > > them horizontally so that they finish more quickly? Across
> > > > processes
> > > > > or
> > > > > > > > even on a pool of machines that aren’t necessarily the build
> > > > machine?
> > > > > > > >
> > > > > > > > On Wed, Aug 14, 2019 at 12:03 PM Marco de Abreu <
> > > > > > marco.g.ab...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > With regards to time I rather prefer us spending a bit more
> > > time
> > > > on
> > > > > > > > > maintenance than somebody running into an error that
> could've
> > > > been
> > > > > > > caught
> > > > > > > > > with a test.
> > > > > > > > >
> > > > > > > > > I mean, our 

Re: MXNet CI repository

2019-08-16 Thread Aaron Markham
One use case I'm thinking of here would cover the generation of docs
for different versions. The problem I'm running up against right now
is that in the pending PR to update the docs workflows, I have all of
this new logic and processes, but I can't run it against an old
branch, because once I switch branches, all of that logic is gone. In
the past, I got around this by checking out a branch in a separate
folder and then copying in the updated files from master that would
control docs generation. I'm reluctant to do that again because it's
so complicated.

Could we use this separate repo as a way to orchestrate the version
builds? To check out branches and then run the Docker images and
runtime_functions against each branch?

On Thu, Aug 15, 2019 at 1:49 PM Pedro Larroy
 wrote:
>
> Nice.
>
> On Thu, Aug 15, 2019 at 12:47 PM Marco de Abreu 
> wrote:
>
> > Repository has been created: https://github.com/apache/incubator-mxnet-ci
> >
> > I will fill it soon.
> >
> > -Marco
> >
> > On Thu, Aug 15, 2019 at 8:43 PM Carin Meier  wrote:
> >
> > > +1
> > >
> > > On Thu, Aug 15, 2019 at 2:37 PM Chaitanya Bapat 
> > > wrote:
> > >
> > > > +1
> > > > LGTM!
> > > >
> > > > On Thu, 15 Aug 2019 at 11:01, Marco de Abreu 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'd like to propose a repository where CI infrastructure code can be
> > > > > stored. I'd propose "incubator-mxnet-ci". Is everybody fine with that
> > > > name
> > > > > or has a better idea?
> > > > >
> > > > > Best regards
> > > > > Marco
> > > > >
> > > >
> > > >
> > > > --
> > > > *Chaitanya Prakash Bapat*
> > > > *+1 (973) 953-6299*
> > > >
> > > >
> > >
> >


Re: [apache/incubator-mxnet] [RFC] A faster version of Gamma sampling on GPU. (#15928)

2019-08-16 Thread Przemyslaw Tredak
Hi @xidulu. I did not look at the differences in the implementation of 
host-side vs device-side API for RNG in MXNet, but if they are comparable in 
terms of performance, a possible better approach would be something like this:
 - launch only as many blocks and threads as necessary to fill the GPU, each having its own RNG
 - use the following pseudocode:
```
while(my_sample_id < N_samples) {
  float rng = generate_next_rng();
  bool accepted = ... // compute whether this rng value is accepted
  if (accepted) {
    // write the result
    my_sample_id = next_sample();
  }
}
```
There are 2 ways of implementing `next_sample` here - either by `atomicInc` on 
some global counter or just by adding the total number of threads (so every 
thread processes the same number of samples). The atomic approach is 
potentially faster (as with the static assignment you could end up hitting a 
corner case where 1 thread would still do a lot more work than the other 
threads), but is nondeterministic, so I think static assignment is preferable 
here.
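For illustration only, here is a minimal sketch of the static-assignment variant. It assumes cuRAND's Philox device API for the per-thread RNG and uses Marsaglia-Tsang (alpha >= 1) as the proposal/acceptance rule; the kernel name, parameters, and buffer layout are made up for this sketch and are not MXNet's actual implementation:

```
#include <curand_kernel.h>

// Sketch of the static-assignment variant: thread tid owns output slots
// tid, tid + stride, tid + 2*stride, ..., so the work split is deterministic.
// Marsaglia-Tsang (alpha >= 1) is used here purely as an example rule.
__global__ void gamma_reject_static(float* out, int n_samples, float alpha,
                                    unsigned long long seed) {
  const int tid    = blockIdx.x * blockDim.x + threadIdx.x;
  const int stride = gridDim.x * blockDim.x;     // total number of threads

  curandStatePhilox4_32_10_t state;
  curand_init(seed, tid, 0, &state);             // one RNG stream per thread

  const float d = alpha - 1.0f / 3.0f;
  const float c = rsqrtf(9.0f * d);

  int my_sample_id = tid;
  while (my_sample_id < n_samples) {
    float x = curand_normal(&state);             // N(0,1)
    float u = curand_uniform(&state);            // U(0,1)
    float v = 1.0f + c * x;
    v = v * v * v;
    bool accepted = (v > 0.0f) &&
        (logf(u) < 0.5f * x * x + d - d * v + d * logf(v));
    if (accepted) {
      out[my_sample_id] = d * v;                 // write the accepted Gamma(alpha, 1) draw
      my_sample_id += stride;                    // static next_sample()
    }
  }
}
```

The atomicInc variant described above would instead keep a global counter and replace `my_sample_id += stride` with an atomic fetch-and-increment, trading determinism for better load balancing.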

-- 
You are receiving this because you are on a team that was mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/15928#issuecomment-522055756

Re: [apache/incubator-mxnet] [RFC] A faster version of Gamma sampling on GPU. (#15928)

2019-08-16 Thread Xi Wang
> cc @apache/mxnet-committers I think we can gradually refactor current 
> implementation (ndarray api) by adopting this new approach.
> 
> @xidulu could you please fix the url links in your post.

Links fixed.

-- 
You are receiving this because you are on a team that was mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/15928#issuecomment-521933621

Re: [apache/incubator-mxnet] [RFC] A faster version of Gamma sampling on GPU. (#15928)

2019-08-16 Thread Yizhi Liu
cc @apache/mxnet-committers I think we can gradually refactor current 
implementation (ndarray api) by adopting this new approach.

@xidulu could you please fix the url links in your post.

-- 
You are receiving this because you are on a team that was mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/15928#issuecomment-521931559

[apache/incubator-mxnet] [RFC] A faster version of Gamma sampling on GPU. (#15928)

2019-08-16 Thread Xi Wang
### Description

Sampling from the Gamma distribution requires rejection sampling, which is applied
in the implementation of `mxnet.ndarray.random.gamma()`. However, two main
drawbacks exist in the current implementation
(https://github.com/apache/incubator-mxnet/blob/fb4f9d55382538fe688638b741830d84ae0d783e/src/operator/random/sampler.h#L182):

1. Random numbers used in the rejection sampling (N(0,1) and U(0,1)) are
generated inside the kernel using the CUDA device API. Also, although every
batch of threads has its own RNG, samples are actually generated serially
within each batch of threads.

2. Rejection sampling is achieved with an infinite while loop inside the
kernel, which may hurt performance on the GPU.

To solve the problems above, I wrote a new version of Gamma sampling on GPU,
inspired by this blog post:
https://lips.cs.princeton.edu/a-parallel-gamma-sampling-implementation/


### Implementation details

My implementation differs from the current version in the following aspects: 

1. Instead of generating samples in the kernel, we generate them in advance
using the host API, which allows us to fill a buffer with random samples
directly.

2. Redundant samples are generated to replace the while loop. Suppose we are
going to generate a Gamma tensor of size **(N,)**; then N x (M + 1) N(0,1)
Gaussian samples and N x (M + 1) U(0,1) uniform samples will be generated
before entering the kernel, where M is a predefined constant. For each entity,
we generate M proposed Gamma r.v.s and then select the first accepted one as
the output. The one extra sample is required when \alpha is less than one.

3. In case all M proposed samples get rejected for some entities (which are
then marked as -1), we simply resample the random buffer and perform another
round of rejection sampling, **but only for the entities that failed the last
round**.

Here's part of the implementation:
https://gist.github.com/xidulu/cd9da21f2ecbccd9b784cadd67844e23

In my experiment, I set M to 1 (i.e. no redundant samples are generated), as
the adopted policy (Marsaglia and Tsang's method) has a rather high
acceptance rate of around 98%.
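
To make the buffered scheme concrete, here is a rough sketch of a kernel that consumes pre-generated normal and uniform buffers. This is not the code from the gist: the buffer layout and names are my own assumptions, only alpha >= 1 is handled, and both the alpha < 1 boost and the retry pass for -1 entries are omitted:

```
// Hedged sketch of the buffered rejection step: the host fills `normals` and
// `uniforms` (size n * m each) in advance; each thread handles one output
// entity, tries its m proposals in order, and writes -1 if all are rejected
// so a later pass can resample just those entities.
__global__ void gamma_from_buffers(float* out, const float* normals,
                                   const float* uniforms, int n, int m,
                                   float alpha) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;

  const float d = alpha - 1.0f / 3.0f;           // Marsaglia-Tsang constants
  const float c = rsqrtf(9.0f * d);

  float result = -1.0f;                          // -1 marks "all proposals rejected"
  for (int j = 0; j < m; ++j) {
    float x = normals[i * m + j];                // pre-generated N(0,1)
    float u = uniforms[i * m + j];               // pre-generated U(0,1)
    float v = 1.0f + c * x;
    v = v * v * v;
    if (v > 0.0f && logf(u) < 0.5f * x * x + d - d * v + d * logf(v)) {
      result = d * v;                            // first accepted proposal wins
      break;
    }
  }
  out[i] = result;
}
```

With M = 1 this reduces to a single proposal per entity plus the retry pass for the few entities that fail.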

The profiling result is listed below:

| Size | native numpy | ndarray on GPU | my implementation |
|------|--------------|----------------|--------------------|
| 10e2 | <0.1ms       | 3~5ms          | 0.5~0.7ms          |
| 10e4 | 0.76ms       | 7.6~7.8ms      | 0.72~0.76ms        |
| 10e6 | 70ms         | 12~13ms        | 3.1ms              |
| 10e8 | 7200ms       | 1600~1700ms    | 150~160ms          |




The new version is currently under development on the `numpy` branch. It is
also designed to support broadcastable parameters.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/15928