Re: CI and PRs

2019-08-14 Thread Carin Meier
Before any binding tests are moved to nightly, I think we need to figure
out how the community can get proper notifications of failure and success
on those nightly runs. Otherwise, I think that breakages would go unnoticed.

-Carin

On Tue, Aug 13, 2019 at 7:47 PM Pedro Larroy 
wrote:

> Hi
>
> It seems we are hitting some problems in CI. I propose the following action
> items to remedy the situation: accelerate turnaround times in CI and
> reduce the cost, complexity and probability of failures blocking PRs and
> frustrating developers:
>
> * Upgrade Visual Studio on Windows from VS 2015 to VS 2017. The
> build_windows.py infrastructure should easily work with the new version.
> Currently some PRs are blocked by this:
> https://github.com/apache/incubator-mxnet/issues/13958
> * Move Gluon Model zoo tests to nightly. Tracked at
> https://github.com/apache/incubator-mxnet/issues/15295
> * Move non-Python binding tests to nightly. If a commit touches other
> bindings, the reviewer should ask for a full run, which can be done locally
> or by using the label bot to trigger a full CI build, or defer to nightly.
> * Provide a couple of basic performance sanity tests on small models that
> run on CI and whose results the label bot can post as a comment on PRs.
> * Address unit tests that take more than 10-20s: streamline them, or move
> them to nightly if that can't be done.
> * Open-source the remaining CI infrastructure scripts so the community
> can contribute.
>
> I think our goal should be a turnaround of under 30 min.
>
> I would also like to raise with the community that some PRs are not
> being followed up on by the committers who requested changes. For example,
> this PR is important and has been hanging for a long time:
>
> https://github.com/apache/incubator-mxnet/pull/15051
>
> This is another one, less important but easier to review:
>
> https://github.com/apache/incubator-mxnet/pull/14940
>
> I think committers requesting changes and not following up within a
> reasonable time is not healthy for the project. I suggest configuring GitHub
> notifications for a good signal-to-noise ratio (SNR) and following up.
>
> Regards.
>
> Pedro.
>


Re: CI and PRs

2019-08-14 Thread Chaitanya Bapat
Pedro,

Great job summarizing the set of tasks to restore CI's glory!
As far as your list goes,

> * Address unit tests that take more than 10-20s, streamline them or move
> them to nightly if it can't be done.

I would like to call out this item specifically. I'm tracking the number of
timeouts that occur in PR #15880 (by no means an exhaustive list).
It's unreasonable for CI to run tests for 3 hours, so we need to
address this issue with greater intent.
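To find candidates for the "more than 10-20s" rule, one option is to scan the per-test timings a CI run already produces. A minimal sketch, assuming the test runner emits a JUnit/xunit-style XML report (nose's `--with-xunit` option produces one) with a `time` attribute in seconds on each `<testcase>`; the 20s threshold is an assumption picked from the range proposed in this thread:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed threshold, taken from the upper end of the proposed 10-20s range.
SLOW_TEST_THRESHOLD_S = 20.0


def slow_tests(junit_xml: str,
               threshold_s: float = SLOW_TEST_THRESHOLD_S) -> List[Tuple[str, float]]:
    """Return (test_id, seconds) pairs slower than threshold_s, slowest first.

    Assumes each <testcase> carries classname, name and time attributes.
    """
    root = ET.fromstring(junit_xml)
    results = []
    # iter() walks the whole tree, so <testsuite> and <testsuites> roots both work.
    for case in root.iter("testcase"):
        seconds = float(case.get("time", "0"))
        if seconds > threshold_s:
            test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
            results.append((test_id, seconds))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Feeding it the report of a timed-out run would give a ranked list of tests to streamline or move to nightly.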

Moreover, to add to the tale of CI woes, we should make CI robust against
network connection errors.
At times, CI fails because it cannot fetch some packages.
1. The error log doesn't suggest a corrective action for the PR author
(e.g. "retrigger the CI").
2. It would be great if CI handled such failures smartly (or offered some
way to speed up getting the CI to pass).

Hopefully, with the help of the community, we will be able to catch these
exceptions and make CI great again!


-- 
*Chaitanya Prakash Bapat*
*+1 (973) 953-6299*


Re: CI and PRs

2019-08-14 Thread Pedro Larroy
Hi Carin.

That's a good point. All things considered, would you prefer to keep
the Clojure tests in the PR process or move them to nightly?
One option is to send notifications to this list or to Slack. But if we think
breakages would go unnoticed, maybe it is not a good idea to fully remove
bindings from the PR process; instead we could just streamline them.
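If binding tests did move to nightly, the failures would need to reach people, as Carin points out. A minimal sketch of a Slack notifier, assuming per-binding pass/fail results are available as a dict and that a Slack incoming webhook (which accepts a JSON body with a `text` field) is used; the webhook URL, run URL and binding names are hypothetical:

```python
import json
import urllib.request
from typing import Dict


def nightly_summary(results: Dict[str, bool], run_url: str) -> str:
    """Build a one-line status plus a link to the run (binding name -> passed?)."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed:
        header = (f"Nightly FAILED for {len(failed)}/{len(results)} bindings: "
                  f"{', '.join(failed)}")
    else:
        header = f"Nightly passed for all {len(results)} bindings"
    return f"{header}\n{run_url}"


def post_to_slack(webhook_url: str, text: str) -> None:
    """POST the message to a Slack incoming webhook (hypothetical URL)."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(webhook_url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

Listing the failed bindings by name keeps the message useful for the binding maintainers, who are the ones most likely to act on it.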

Pedro.


Re: CI and PRs

2019-08-14 Thread Pedro Larroy
Yes. Another point is that pushing again to a PR should cancel its previous
builds; this is not happening now, which wastes resources.
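A minimal sketch of the "cancel superseded builds" rule, independent of any particular CI system; the `Build` record and its state names are hypothetical. Jenkins Pipeline's `milestone` step is one existing mechanism along these lines, but whether it fits the MXNet setup is not established here.

```python
from dataclasses import dataclass
from typing import Dict, List

# States in which a build still occupies (or is waiting for) a CI worker.
ACTIVE_STATES = {"queued", "running"}


@dataclass(frozen=True)
class Build:
    pr: int      # PR the build was triggered for
    number: int  # build number; higher means triggered later
    state: str   # e.g. "queued", "running", "success", "failure"


def builds_to_cancel(builds: List[Build]) -> List[Build]:
    """Return still-active builds superseded by a newer build of the same PR."""
    newest: Dict[int, int] = {}
    for b in builds:
        newest[b.pr] = max(b.number, newest.get(b.pr, b.number))
    return [b for b in builds
            if b.state in ACTIVE_STATES and b.number < newest[b.pr]]
```

Finished builds are left alone, so their logs stay available; only work that can no longer affect the PR's final status is cancelled.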

Any ideas on how to make CI more robust against connection errors? For
example, the Ivy cache for JVM packages could be pre-populated on the workers.
It's a trade-off between efficiency and simplicity, since each such
optimization adds complexity.

Maybe Maven has some settings to retry failed downloads, for example. For
failures downloading GPG keys, we simply stored the keys in the repository to
avoid networking problems.
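Independently of any Maven or Ivy retry options, which are not verified here, a generic retry wrapper with exponential backoff could cover flaky fetches in the Python CI scripts. A minimal sketch, assuming transient failures surface as `OSError` (which includes `urllib.error.URLError` and socket timeouts); the attempt count and delays are illustrative:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T],
          attempts: int = 3,
          base_delay_s: float = 1.0,
          retry_on: Tuple[Type[BaseException], ...] = (OSError,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Delay doubles each time (1s, 2s, 4s, ...) plus random jitter
            # so that many workers don't retry in lockstep.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A download step would be wrapped as `retry(lambda: fetch(url))`, where `fetch` stands for whatever function performs the download. Re-raising the last error keeps real outages visible instead of masking them.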


Re: CI and PRs

2019-08-14 Thread Carin Meier
I would prefer to keep the language bindings in the PR process. Perhaps we
could do some analytics to see how much each of the language bindings
contributes to the overall run time.
If we have some metrics on that, maybe we can come up with a guideline for
how much time each should take. Another possibility is to leverage
parallel builds more.


Re: CI and PRs

2019-08-14 Thread Marco de Abreu
Hi,

We record a number of run statistics, down to the duration of every
individual step. If you tell me which ones you're particularly interested in
(probably the total duration of each node in the test stage), I'm happy to
provide them.

Dimensions are (in hierarchical order):
- job
- branch
- stage
- node
- step

Unfortunately, I can't export them, since we store them in CloudWatch
Metrics, which as far as I know doesn't offer raw exports.
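Given numbers in this job > branch > stage > node > step hierarchy, the per-binding analysis Carin suggested is a roll-up from steps to nodes. A minimal sketch, assuming the samples can be obtained as flat records (the `StepMetric` type and any stage/node names are hypothetical). Because nodes run in parallel, the share computed here is compute time, not wall-clock time:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class StepMetric(NamedTuple):
    """One duration sample, keyed by the job > branch > stage > node > step hierarchy."""
    job: str
    branch: str
    stage: str
    node: str
    step: str
    seconds: float


def node_durations(metrics: Iterable[StepMetric], stage: str) -> Dict[str, float]:
    """Sum step durations up to the node level for a single stage."""
    totals: Dict[str, float] = defaultdict(float)
    for m in metrics:
        if m.stage == stage:
            totals[m.node] += m.seconds
    return dict(totals)


def share_of_stage(totals: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the stage's summed node time spent in each node."""
    stage_total = sum(totals.values())
    if stage_total == 0:
        return {node: 0.0 for node in totals}
    return {node: secs / stage_total for node, secs in totals.items()}


def over_budget(totals: Dict[str, float], budget_s: float) -> List[str]:
    """Nodes whose total time exceeds a per-node budget, slowest first."""
    slow = [node for node, secs in totals.items() if secs > budget_s]
    return sorted(slow, key=lambda node: totals[node], reverse=True)
```

`over_budget` is one way a per-binding time guideline could be checked automatically once the community agrees on a number.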

Best regards,
Marco


Re: CI and PRs

2019-08-14 Thread Carin Meier
Great idea, Marco! Anything you think would be valuable to share would
be good. The duration of each node in the test stage sounds like a good
start.

- Carin


Re: CI and PRs

2019-08-14 Thread Pedro Larroy
From what I have seen, Clojure takes 15 minutes, which I think is reasonable.
The only issue is that when a binding such as R, Perl or Clojure fails,
some devs are a bit confused about how to fix it, since they are not
familiar with the language and its testing tools.


Re: CI and PRs

2019-08-14 Thread Carin Meier
If a language binding test is failing for an unimportant reason, then it
is too brittle and needs to be fixed (we have fixed some of these in the
Clojure package [1]).
But in general, if we think of MXNet as one project that spans all the
language bindings, then we want to know if some fundamental code change is
going to break a downstream package.
I can't speak for all the high-level binding maintainers, but I'm
always happy to pitch in with code fixes to help get the base PR
green.

The time cost of maintaining such a large CI project obviously needs to be
considered as well.

[1] https://github.com/apache/incubator-mxnet/pull/15579

On Wed, Aug 14, 2019 at 3:48 PM Pedro Larroy 
wrote:

> From what I have seen Clojure is 15 minutes, which I think is reasonable.
> The only question is that when a binding such as R, Perl or Clojure fails,
> some devs are a bit confused about how to fix them since they are not
> familiar with the testing tools and the language.
>
> On Wed, Aug 14, 2019 at 11:57 AM Carin Meier  wrote:
>
> > Great idea Marco! Anything that you think would be valuable to share would
> > be good. The duration of each node in the test stage sounds like a good
> > start.
> >
> > - Carin
> >
> > On Wed, Aug 14, 2019 at 2:48 PM Marco de Abreu 
> > wrote:
> >
> > > Hi,
> > >
> > > we record a bunch of metrics about run statistics (down to the duration of
> > > every individual step). If you tell me which ones you're particularly
> > > interested in (probably total duration of each node in the test stage), I'm
> > > happy to provide them.
> > >
> > > Dimensions are (in hierarchical order):
> > > - job
> > > - branch
> > > - stage
> > > - node
> > > - step
> > >
> > > Unfortunately I don't have the possibility to export them since we store
> > > them in CloudWatch Metrics which afaik doesn't offer raw exports.
> > >
> > > Best regards,
> > > Marco
> > >
> > > On Wed, 14 Aug 2019, 19:43, Carin Meier wrote:
> > >
> > > > I would prefer to keep the language binding in the PR process. Perhaps we
> > > > could do some analytics to see how much each of the language bindings is
> > > > contributing to overall run time.
> > > > If we have some metrics on that, maybe we can come up with a guideline of
> > > > how much time each should take. Another possibility is leverage the
> > > > parallel builds more.
> > > >
> > > > On Wed, Aug 14, 2019 at 1:30 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Carin.
> > > > >
> > > > > That's a good point, all things considered would your preference be to keep
> > > > > the Clojure tests as part of the PR process or in Nightly?
> > > > > Some options are having notifications here or in slack. But if we think
> > > > > breakages would go unnoticed maybe is not a good idea to fully remove
> > > > > bindings from the PR process and just streamline the process.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Wed, Aug 14, 2019 at 5:09 AM Carin Meier 
> > > > > wrote:
> > > > >
> > > > > > Before any binding tests are moved to nightly, I think we need to figure
> > > > > > out how the community can get proper notifications of failure and success
> > > > > > on those nightly runs. Otherwise, I think that breakages would go
> > > > > > unnoticed.
> > > > > >
> > > > > > -Carin
> > > > > >
> > > > > > On Tue, Aug 13, 2019 at 7:47 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > Seems we are hitting some problems in CI. I propose the following action
> > > > > > > items to remedy the situation and accelerate turn around times in CI,
> > > > > > > reduce cost, complexity and probability of failure blocking PRs and
> > > > > > > frustrating developers:
> > > > > > >
> > > > > > > * Upgrade Windows visual studio from VS 2015 to VS 2017. The
> > > > > > > build_windows.py infrastructure should easily work with the new version.
> > > > > > > Currently some PRs are blocked by this:
> > > > > > > https://github.com/apache/incubator-mxnet/issues/13958
> > > > > > > * Move Gluon Model zoo tests to nightly. Tracked at
> > > > > > > https://github.com/apache/incubator-mxnet/issues/15295
> > > > > > > * Move non-python bindings tests to nightly. If a commit is touching other
> > > > > > > bindings, the reviewer should ask for a full run which can be done locally,
> > > > > > > use the label bot to trigger a full CI build, or defer to nightly.
> > > > > > > * Provide a couple of basic sanity performance tests on small models that
> > > > > > > are run on CI and can be echoed by the label bot as a comment for PRs.
> > > > > > > * Address unit tests that take more than 10-20s, streamline them or move
> > > > > > > them to nightly if it can
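Carin's suggestion to measure how much each language binding contributes to the overall run time maps naturally onto the job > branch > stage > node > step hierarchy Marco describes. A minimal sketch, assuming step durations could be obtained as flat records (the `StepMetric` record and the function names are hypothetical; as Marco notes, CloudWatch Metrics does not offer a raw export, so getting the data out is a separate problem):

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class StepMetric(NamedTuple):
    """One recorded CI step (hypothetical layout following job > branch > stage > node > step)."""
    job: str
    branch: str
    stage: str
    node: str
    step: str
    seconds: float


def node_durations(metrics: Iterable[StepMetric], stage: str = "test") -> dict[str, float]:
    """Sum step durations per node within one stage, e.g. one node per language binding."""
    totals: dict[str, float] = defaultdict(float)
    for m in metrics:
        if m.stage == stage:
            totals[m.node] += m.seconds
    return dict(totals)


def share_of_stage(totals: dict[str, float]) -> dict[str, float]:
    """Express each node's total as a fraction of the summed node time in the stage."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {node: 0.0 for node in totals}
    return {node: t / grand_total for node, t in totals.items()}
```

Because nodes in a stage run in parallel, these shares describe where machine time goes, not what dominates wall-clock turnaround; the slowest node determines the latter.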

Re: CI and PRs

2019-08-14 Thread Marco de Abreu
With regard to time, I would rather we spend a bit more time on
maintenance than have somebody run into an error that could have been caught
by a test.

I mean, our publishing pipeline for Scala GPU has been broken for quite
some time now, but nobody noticed. Basically, my stance on the matter
is that as soon as something is not blocking, you might as well deactivate
it, since there is no forcing function in an open source project.
People will rarely come back and fix the nightly test failures they
introduced.

-Marco
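Marco's Scala GPU example is exactly the failure mode Carin raised: a nightly job breaks and nobody hears about it. As a sketch only, assuming nightly results are available as a mapping from job name to pass/fail (the job names and function names below are hypothetical), a notifier could diff consecutive runs so that long-standing breakages keep being reported instead of vanishing after the first message:

```python
from typing import Mapping


def classify_nightly(previous: Mapping[str, bool], current: Mapping[str, bool]) -> dict[str, list[str]]:
    """Compare two nightly runs (job name -> passed?) and group jobs by state change."""
    report: dict[str, list[str]] = {"newly_broken": [], "still_broken": [], "fixed": []}
    for job in sorted(current):
        passed_now = current[job]
        passed_before = previous.get(job, True)  # jobs missing from the last run count as green
        if not passed_now and passed_before:
            report["newly_broken"].append(job)
        elif not passed_now:
            report["still_broken"].append(job)
        elif not passed_before:
            report["fixed"].append(job)
    return report


def format_notification(report: dict[str, list[str]]) -> str:
    """Render the report as plain text, e.g. for the dev@ list or a Slack channel."""
    lines = []
    for key, title in (("newly_broken", "Newly broken"),
                       ("still_broken", "Still broken"),
                       ("fixed", "Fixed")):
        if report[key]:
            lines.append(f"{title}: {', '.join(report[key])}")
    return "\n".join(lines) if lines else "All nightly jobs passed."
```

This only makes failures visible; it does not add the forcing function Marco describes, which is why he prefers keeping such tests blocking.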

On Wed, 14 Aug 2019, 21:59, Carin Meier wrote:

> If a language binding test is failing for a not important reason, then it
> is too brittle and needs to be fixed (we have fixed some of these with the
> Clojure package [1]).
> But in general, if we thinking of the MXNet project as one project that is
> across all the language bindings, then we want to know if some fundamental
> code change is going to break a downstream package.
> I can't speak for all the high level package binding maintainers, but I'm
> always happy to pitch in to provide code fixes to help the base PR get
> green.
>
> The time costs to maintain such a large CI project obviously needs to be
> considered as well.
>
> [1] https://github.com/apache/incubator-mxnet/pull/15579

Showcase your project at ApacheCON at a Podling's Shark Tank

2019-08-14 Thread Roman Shaposhnik
Hi Podlings!

In less than a month we're going to have our first
ApacheCON this year -- the one in Las Vegas. In
about two months there will be one more in Berlin.

These are not your regular ApacheCONs -- these are
the ASF's 20th Anniversary ApacheCONs! In other words,
these are not to be missed!

And even if your talk didn't get accepted, you still
get an opportunity to highlight your project to what's
likely going to be the biggest audience attending.

Here's how: if you (or any community member who's
passionate about your project) are going to be at either
of those ApacheCONs, consider signing up for the
Podling's Shark Tank events:
https://www.apachecon.com/acna19/s/#/scheduledEvent/1038
https://aceu19.apachecon.com/session/podlings-shark-tank

Each project presenting will get ~10 min for the pitch and ~5 min
of panel grilling them on all sorts of things. Kind of like this ;-)
 https://www.youtube.com/watch?v=wmenN7NEdBc

You've got nothing to lose (in fact, the opposite: you're likely to get
a prize!), and you will get a chance to receive feedback that might
actually help you grow your community and ultimately graduate to
TLP status. And! Given our awesome panel of judges:
 * Myrle Krantz
 * Justin Mclean
 * Craig Russell
 * Shane Curcuru
We guarantee this to be a fun and useful event for your community!

We will be tracking signups over here:
 https://wiki.apache.org/apachecon/ACNA19PodlingSharkTank
 https://wiki.apache.org/apachecon/ACEU19PodlingSharkTank
but for now:

SIMPLY REPLY TO THIS EMAIL if you're interested.

It is first come, first served -- so don't delay -- sign up today!

Thanks,
Roman.


Re: CI and PRs

2019-08-14 Thread Pedro Larroy
Hi Marco.

From past experience, I have to agree with you on that.
What do you suggest for maintenance? Do we need a watermark that fails the
validation if the total runtime exceeds a high threshold?

Pedro.
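One possible shape for the watermark Pedro asks about is a final CI step that sums recorded durations and exits non-zero when the total exceeds a budget, for example the 30-minute turnaround target proposed at the start of this thread. This is a minimal sketch; `check_watermark`, `main`, and the input format are assumptions, not existing MXNet CI code. It assumes `durations` holds wall-clock seconds per sequential stage, so that their sum approximates the turnaround time:

```python
import sys
from typing import Mapping


def check_watermark(durations: Mapping[str, float], threshold_s: float) -> list[str]:
    """Return violation messages; an empty list means the run is within budget."""
    total = sum(durations.values())
    if total <= threshold_s:
        return []
    # Name the slowest entries first so the report points at the biggest offenders.
    slowest = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:3]
    details = ", ".join(f"{name}={secs:.0f}s" for name, secs in slowest)
    return [f"total runtime {total:.0f}s exceeds watermark {threshold_s:.0f}s (slowest: {details})"]


def main(durations: Mapping[str, float], threshold_s: float) -> int:
    """Exit code for the CI step: non-zero fails the validation."""
    violations = check_watermark(durations, threshold_s)
    for message in violations:
        print(message, file=sys.stderr)
    return 1 if violations else 0
```

Setting the threshold well above the normal runtime keeps noisy runs from failing while still catching a PR that adds a large amount of test time.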

On Wed, Aug 14, 2019 at 1:03 PM Marco de Abreu 
wrote:

> With regards to time I rather prefer us spending a bit more time on
> maintenance than somebody running into an error that could've been caught
> with a test.
>
> I mean, our Publishing pipeline for Scala GPU has been broken for quite
> some time now, but nobody noticed that. Basically my stance on that matter
> is that as soon as something is not blocking, you can also just deactivate
> it since you don't have a forcing function in an open source project.
> People will rarely come back and fix the errors of some nightly test that
> they introduced.
>
> -Marco

Re: CI and PRs

2019-08-14 Thread Chris Olivier
+1

Rather than remove tests (which doesn’t scale as a solution), why not scale
them horizontally so that they finish more quickly? Across processes or
even on a pool of machines that aren’t necessarily the build machine?
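Scaling tests horizontally, as Chris suggests, needs some way to split the suite into parts of roughly equal runtime. A common heuristic, swapped in here for illustration and not something the MXNet CI is known to use, is greedy longest-processing-time-first scheduling over measured per-test durations (such as the per-step metrics Marco mentioned). A minimal sketch with a hypothetical `shard_tests` helper:

```python
import heapq
from typing import Mapping


def shard_tests(durations: Mapping[str, float], workers: int) -> list[list[str]]:
    """Split tests into `workers` shards with roughly equal total runtime.

    Greedy longest-processing-time-first: assign each test, slowest first,
    to the shard that currently has the least accumulated time.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    shards: list[list[str]] = [[] for _ in range(workers)]
    heap = [(0.0, i) for i in range(workers)]  # (accumulated seconds, shard index)
    # Slowest first; ties broken by name so the assignment is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards
```

Each worker would then run only the tests in its shard. With stale or missing duration data the balance degrades, but every test is still assigned to exactly one shard.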

On Wed, Aug 14, 2019 at 12:03 PM Marco de Abreu 
wrote:

> With regards to time I rather prefer us spending a bit more time on
> maintenance than somebody running into an error that could've been caught
> with a test.
>
> I mean, our Publishing pipeline for Scala GPU has been broken for quite
> some time now, but nobody noticed that. Basically my stance on that matter
> is that as soon as something is not blocking, you can also just deactivate
> it since you don't have a forcing function in an open source project.
> People will rarely come back and fix the errors of some nightly test that
> they introduced.
>
> -Marco

Re: CI and PRs

2019-08-14 Thread Pedro Larroy
Sounds good in theory. I think there are complex details with regard to
resource sharing during parallel execution. Still, I think both approaches can
be explored. Some tests also run for unreasonably long times for what they
are doing. We already scale parts of the pipeline horizontally across
workers.
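Pedro's concern about resource sharing during parallel execution can be addressed, at least within one machine, by leasing each scarce resource to exactly one running test at a time. The sketch below assumes each test accepts a device id (for example a GPU index) as a parameter; `ResourcePool` and `run_parallel` are hypothetical names, and the pattern is a plain blocking queue, not an existing MXNet test utility:

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence


class ResourcePool:
    """Hands out exclusive resources (e.g. GPU ids) to concurrently running tests."""

    def __init__(self, resources: Sequence[int]) -> None:
        self._free: "queue.Queue[int]" = queue.Queue()
        for resource in resources:
            self._free.put(resource)

    @contextmanager
    def lease(self) -> Iterator[int]:
        resource = self._free.get()  # blocks until some resource is free
        try:
            yield resource
        finally:
            self._free.put(resource)  # returned even if the test raises


def run_parallel(tests: Sequence[Callable[[int], bool]], pool: ResourcePool,
                 workers: int) -> list[bool]:
    """Run each test while it holds one leased resource; results keep input order."""

    def run_one(test: Callable[[int], bool]) -> bool:
        with pool.lease() as device:
            return test(device)

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(run_one, tests))
```

Threads keep the sketch short; for tests that are not thread-safe, the same leasing idea would need a cross-process queue (for example one from `multiprocessing`).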


On Wed, Aug 14, 2019 at 5:12 PM Chris Olivier 
wrote:

> +1
>
> Rather than remove tests (which doesn’t scale as a solution), why not scale
> them horizontally so that they finish more quickly? Across processes or
> even on a pool of machines that aren’t necessarily the build machine?

Re: CI and PRs

2019-08-14 Thread Chris Olivier
I see it done daily now, and while I can’t share all the details, it’s not
an incredibly complex thing: it involves not much more than NFS/EFS
sharing and remote ssh commands. All it takes is a little ingenuity and
some imagination.
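Chris describes the mechanism only as NFS/EFS sharing plus remote ssh commands. A minimal sketch of that idea, assuming every worker mounts the same shared directory at the same path and accepts key-based ssh logins; the host names, the directory, and the `python -m pytest` invocation are placeholders, not the actual setup he refers to:

```python
import shlex
import subprocess
from typing import Sequence


def remote_commands(hosts: Sequence[str], shards: Sequence[Sequence[str]],
                    workdir: str) -> list[list[str]]:
    """Build one `ssh` argv per non-empty shard; shard i runs on hosts[i % len(hosts)].

    Assumes `workdir` is a shared (e.g. NFS/EFS) mount visible at the same path
    on every host, so build outputs and test files need not be copied around.
    """
    if not hosts:
        raise ValueError("need at least one host")
    commands: list[list[str]] = []
    for i, shard in enumerate(shards):
        if not shard:
            continue  # an empty shard would make the runner collect every test
        host = hosts[i % len(hosts)]
        tests = " ".join(shlex.quote(t) for t in shard)
        # Placeholder runner invocation; swap in whatever the CI actually uses.
        remote = f"cd {shlex.quote(workdir)} && python -m pytest {tests}"
        commands.append(["ssh", host, remote])
    return commands


def run_all(commands: list[list[str]]) -> list[int]:
    """Start every command at once, wait for all, and return exit codes in order."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]
```

Keeping the build output on the shared mount avoids copying artifacts to each worker; how tests are grouped into shards is left to the caller.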

On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy 
wrote:

> Sounds good in theory. I think there are complex details with regards of
> resource sharing during parallel execution. Still I think both ways can be
> explored. I think some tests run for unreasonably long times for what they
> are doing. We already scale parts of the pipeline horizontally across
> workers.

Re: CI and PRs

2019-08-14 Thread Aaron Markham
The PRs Thomas and I are working on for the new docs and website share the
mxnet binary in the new CI pipelines we made. Speeds things up a lot.
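Aaron's point is that building the binary once and reusing it saves a lot of time. The sketch below shows one generic way to do that: key a cached build on the commit and the build flags, and only compile on a cache miss. The store layout, the function names, and any flag values used with it are illustrative assumptions, not how the docs and website pipelines actually share `libmxnet.so`:

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Callable, Mapping


def artifact_key(commit: str, build_config: Mapping[str, str]) -> str:
    """Key an artifact by commit and build flags so different configurations never collide."""
    payload = json.dumps({"commit": commit, "config": dict(build_config)}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def fetch_or_build(store: Path, key: str, build: Callable[[Path], None], dest: Path) -> bool:
    """Copy a cached binary into `dest`; build and publish it only on a cache miss.

    Returns True when the cached artifact was reused.
    """
    cached = store / key / "libmxnet.so"
    if cached.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cached, dest)
        return True
    build(dest)  # the expensive compile step, run once per key
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(dest, cached)
    return False
```

Two pipelines that miss the cache at the same time would both build; a more robust version would also write to a temporary name and rename it into place so readers never see a partially copied file.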

On Wed, Aug 14, 2019, 18:16 Chris Olivier  wrote:

> I see it done daily now, and while I can’t share all the details, it’s not
> an incredibly complex thing, and involves not much more than nfs/efs
> sharing and remote ssh commands.  All it takes is a little ingenuity and
> some imagination.
>
> On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy  >
> wrote:
>
> > Sounds good in theory. I think there are complex details with regards of
> > resource sharing during parallel execution. Still I think both ways can
> be
> > explored. I think some tests run for unreasonably long times for what
> they
> > are doing. We already scale parts of the pipeline horizontally across
> > workers.
> >
> >
> > On Wed, Aug 14, 2019 at 5:12 PM Chris Olivier 
> > wrote:
> >
> > > +1
> > >
> > > Rather than remove tests (which doesn’t scale as a solution), why not
> > scale
> > > them horizontally so that they finish more quickly? Across processes or
> > > even on a pool of machines that aren’t necessarily the build machine?
> > >
> > > On Wed, Aug 14, 2019 at 12:03 PM Marco de Abreu <
> marco.g.ab...@gmail.com
> > >
> > > wrote:
> > >
> > > > With regards to time I rather prefer us spending a bit more time on
> > > > maintenance than somebody running into an error that could've been
> > caught
> > > > with a test.
> > > >
> > > > I mean, our Publishing pipeline for Scala GPU has been broken for
> quite
> > > > some time now, but nobody noticed that. Basically my stance on that
> > > matter
> > > > is that as soon as something is not blocking, you can also just
> > > deactivate
> > > > it since you don't have a forcing function in an open source project.
> > > > People will rarely come back and fix the errors of some nightly test
> > that
> > > > they introduced.
> > > >
> > > > -Marco
> > > >
> > > > Carin Meier  schrieb am Mi., 14. Aug. 2019,
> > 21:59:
> > > >
> > > > > If a language binding test is failing for an unimportant reason, then
> > > > > it is too brittle and needs to be fixed (we have fixed some of these in
> > > > > the Clojure package [1]).
> > > > > But in general, if we think of MXNet as one project that spans all the
> > > > > language bindings, then we want to know if some fundamental code change
> > > > > is going to break a downstream package.
> > > > > I can't speak for all the high-level package binding maintainers, but
> > > > > I'm always happy to pitch in with code fixes to help the base PR get
> > > > > green.
> > > > >
> > > > > The time cost of maintaining such a large CI project obviously needs to
> > > > > be considered as well.
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/pull/15579
> > > > >
> > > > > On Wed, Aug 14, 2019 at 3:48 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > > >
> > > > > > From what I have seen, Clojure takes 15 minutes, which I think is
> > > > > > reasonable. The only issue is that when a binding such as R, Perl or
> > > > > > Clojure fails, some devs are a bit confused about how to fix it, since
> > > > > > they are not familiar with the testing tools and the language.
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 11:57 AM Carin Meier <carinme...@gmail.com> wrote:
> > > > > >
> > > > > > > Great idea, Marco! Anything that you think would be valuable to share
> > > > > > > would be good. The duration of each node in the test stage sounds like
> > > > > > > a good start.
> > > > > > >
> > > > > > > - Carin
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019 at 2:48 PM Marco de Abreu <marco.g.ab...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We record a number of metrics about run statistics (down to the
> > > > > > > > duration of every individual step). If you tell me which ones you're
> > > > > > > > particularly interested in (probably the total duration of each node
> > > > > > > > in the test stage), I'm happy to provide them.
> > > > > > > >
> > > > > > > > Dimensions are (in hierarchical order):
> > > > > > > > - job
> > > > > > > > - branch
> > > > > > > > - stage
> > > > > > > > - node
> > > > > > > > - step
> > > > > > > >
> > > > > > > > Unfortunately, I can't export them, since we store them in
> > > > > > > > CloudWatch Metrics, which AFAIK doesn't offer raw exports.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Marco
> > > > > > > >
> > > > > > > > Carin Meier wrote on Wed., 14 Aug 2019, 19:43:
> > > > > > > >
> > > > > > > > > I would prefer to keep the language binding tests in the PR process.

new website (RE: CI and PRs)

2019-08-14 Thread Zhao, Patric
Hi Aaron,

Recently we have been working on improving the CPU backend documentation on the
current website.

I saw there are several PRs for the new website, and they look really great.

So I'd like to know when the new website will go online.
If that is very soon, we will move our work over to the new website.

Thanks,

--Patric


> -----Original Message-----
> From: Aaron Markham 
> Sent: Thursday, August 15, 2019 11:40 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: CI and PRs
> 
> The PRs Thomas and I are working on for the new docs and website share
> the mxnet binary in the new CI pipelines we made. Speeds things up a lot.

Re: CI and PRs

2019-08-14 Thread Marco de Abreu
A good first step toward parallelization would be to add parallel test
execution in nosetests.

-Marco
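Within a single machine, nose's bundled multiprocess plugin (`nosetests --processes=N`, optionally with `--process-timeout`) is the usual way to try Marco's suggestion. Spreading tests over a pool of machines, as Chris proposed, additionally requires splitting the suite into shards of similar length. The sketch below uses greedy longest-processing-time (LPT) assignment, assuming per-module durations are available (for example from the CI metrics Marco offered). The function names, module names and timings are hypothetical, and the sketch does not address the resource-sharing issues Pedro raised.

```python
import heapq
from typing import Dict, List


def shard_by_duration(durations: Dict[str, float], workers: int) -> List[List[str]]:
    """Split tests into `workers` shards using greedy longest-processing-time (LPT).

    Tests are placed longest first, each onto the shard with the smallest
    running total, which typically keeps the slowest shard close to
    sum(durations) / workers.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    shards: List[List[str]] = [[] for _ in range(workers)]
    heap = [(0.0, i) for i in range(workers)]  # (running total in seconds, shard index)
    heapq.heapify(heap)
    # Longest first; ties broken by name so the result is deterministic.
    for name in sorted(durations, key=lambda k: (-durations[k], k)):
        total, i = heapq.heappop(heap)
        shards[i].append(name)
        heapq.heappush(heap, (total + durations[name], i))
    return shards


def makespan(shards: List[List[str]], durations: Dict[str, float]) -> float:
    """Wall-clock time of the stage if every shard runs on its own worker."""
    return max(sum(durations[t] for t in shard) for shard in shards)


if __name__ == "__main__":
    # Hypothetical module names and timings in seconds.
    durations = {"test_operator": 1500, "test_gluon": 900, "test_sparse": 700,
                 "test_ndarray": 600, "test_io": 300}
    shards = shard_by_duration(durations, 3)
    for i, shard in enumerate(shards):
        print(i, shard)
    print("slowest shard:", makespan(shards, durations), "s")
```

Each shard could then be handed to one worker as the list of test modules it runs. LPT is a heuristic rather than an optimal partition, but it is simple and only needs the per-module timings.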

Aaron Markham wrote on Thu., 15 Aug 2019, 05:39:

> The PRs Thomas and I are working on for the new docs and website share the
> mxnet binary in the new CI pipelines we made. Speeds things up a lot.
>
> On Wed, Aug 14, 2019, 18:16 Chris Olivier wrote:
>
> > I see it done daily now, and while I can’t share all the details, it’s not
> > an incredibly complex thing, and involves not much more than nfs/efs
> > sharing and remote ssh commands.  All it takes is a little ingenuity and
> > some imagination.