RE: Proposal for subgraph-based graph optimization and quantization

2018-10-03 Thread Zhao, Patric
Sending this again: our PR was submitted 22 days ago.

Please help review it if you're interested :)
https://github.com/apache/incubator-mxnet/pull/12530

Thanks for the great suggestions from Jun, Da, Haibin, and the other committers.

BR,

--Patric


From: Zhao, Patric
Sent: Wednesday, August 15, 2018 10:51 AM
To: dev@mxnet.incubator.apache.org
Cc: Zheng, Da ; Jun Wu ; Ye, Jason Y 

Subject: Proposal for subgraph-based graph optimization and quantization

Hi MXNet owners and committers,

A new proposal for the graph optimization and quantization approach has been
posted on the wiki:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN

Many thanks for the support from Zheng Da and Wu Jun.

Any feedback is highly appreciated :)

BR,

--Patric


Re: [DISCUSS] Use modernized C++11 range loops uniformly throughout the project

2018-10-03 Thread Pedro Larroy
+1

@Chris: do you have data on the performance difference? As far as I know
there's a "rewrite rule" for range loops, like the one between lambdas and
C++ functors, so the performance should be well defined, but maybe there's
something you are pointing out that we are missing.

Having a consistent and concise code base is beneficial. I think what
Kellen is advocating is to use range loops whenever possible, not to
prescribe their usage in every case; if you have to iterate backwards,
there are other options, such as reverse iterators.
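
To illustrate the point (a minimal sketch, not code from our tree): a range
loop is specified to expand to an equivalent iterator-based loop, and
backwards iteration is still covered by reverse iterators:

#include <cstdio>
#include <vector>

void print_all(const std::vector<int> &v) {
  // The range loop...
  for (int x : v) { std::printf("%d\n", x); }
  // ...behaves like the iterator loop it is defined to expand to:
  for (auto it = v.begin(); it != v.end(); ++it) { std::printf("%d\n", *it); }
  // Backwards iteration without manual indexing, via reverse iterators:
  for (auto it = v.rbegin(); it != v.rend(); ++it) { std::printf("%d\n", *it); }
}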

On Fri, Sep 28, 2018 at 6:54 AM Chris Olivier 
wrote:

> -1
>
> Range loops aren’t always the most performant way. In addition, sometimes
> you want the index. Or maybe you want to iterate backwards, or not start
> from the first, etc. Maybe you want the iterator because you remove it from
> the list at the bottom of the loop. Seems like a rule for the sake of
> having a rule.
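>
> For example (just a sketch, not code from the repository), removing
> elements while iterating still needs the explicit iterator form:
>
>   #include <list>
>
>   void drop_even(std::list<int> &items) {
>     for (auto it = items.begin(); it != items.end(); /* no ++it here */) {
>       if (*it % 2 == 0) {
>         it = items.erase(it);  // erase() returns the next valid iterator
>       } else {
>         ++it;
>       }
>     }
>   }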
>
> On Fri, Sep 28, 2018 at 2:12 AM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Hello MXNet devs,
> >
> > I'd like to discuss uniformly adopting C++11 range loops in the MXNet
> > project.  The benefits I see are:
> >
> > *  Improved C++ readability (examples below).
> > *  Consistency with other languages.  Range loops are quite similar to
> > loops in almost all other programming languages.  Given we're a project
> > that supports many languages, this language consistency could be positive
> > for our community.
> > * Consistency within the same project.  Currently different authors have
> > different loop styles, which hurts codebase readability.
> > *  Best available performance.  There are often multiple ways to write
> > loops in C++ with subtle differences in performance and memory usage
> > between loop methods.  Using range-loops ensures we get the best possible
> > perf using an intuitive loop pattern.
> > *  Slightly lower chance of bugs / OOB accesses when dealing with
> > indexing into an array, for example.
> >
> > If we decide to enable this uniformly throughout the project, we can
> > enforce the policy with a simple clang-tidy configuration change.  There
> > would be no need for reviewers to manually provide feedback when someone
> > uses an older C++ loop style.
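> >
> > As a sketch of what that change could look like (just an illustration, not
> > the final configuration), adding the modernize-loop-convert check to our
> > .clang-tidy along these lines should be enough to flag old-style loops:
> >
> >   Checks: '-*,modernize-loop-convert'
> >   WarningsAsErrors: 'modernize-loop-convert'
> >
> > The same check can also rewrite existing loops automatically when run with
> > clang-tidy's -fix option.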
> >
> > -Kellen
> >
> > Reference PR:  https://github.com/apache/incubator-mxnet/pull/12356/
> > Previous clang-tidy discussion on the list:
> >
> >
> https://lists.apache.org/thread.html/b0ae5a9df5dfe0d9074cb2ebe432264db4fa2175b89fa43a5f6e36be@%3Cdev.mxnet.apache.org%3E
> >
> > -
> > Examples:
> > for (auto axis_iter = param.axis.begin() ; axis_iter!= param.axis.end();
> > ++axis_iter) {
> > CHECK_LT(*axis_iter, static_cast<int>(ishape.ndim()));
> > stride_[reverse_index] = ishape[*axis_iter];
> > ...
> > -->
> > for (int axis : param.axis) {
> > CHECK_LT(axis, static_cast<int>(ishape.ndim()));
> > stride_[reverse_index] = ishape[axis];
> > ...
> > --
> > for (size_t i = 0; i < in_array.size(); i++) {
> > auto &nd = in_array[i];
> > pre_temp_buf_.emplace_back(nd.shape(), nd.ctx(), true, nd.dtype());
> > }
> > -->
> > for (auto & nd : in_array) {
> > pre_temp_buf_.emplace_back(nd.shape(), nd.ctx(), true, nd.dtype());
> > }
> >
>


Re: CUDNN algorithm selection failure

2018-10-03 Thread Pedro Larroy
Seems it's not the only test:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12726/5/pipeline

test_slice_batchnorm_reshape_batchnorm is also failing and hasn't been
touched for a while. It doesn't look like a problem with the test to me
(not a flaky test). It looks to me like we should find and address the
root cause instead of disabling the test in this case.

Pedro.

On Tue, Oct 2, 2018 at 2:39 AM Marco de Abreu
 wrote:

> I have created an issue at
> https://github.com/apache/incubator-mxnet/issues/12715 and a PR to disable
> the test at https://github.com/apache/incubator-mxnet/pull/12716.
>
> This test is pretty new and was submitted with a number of other
> problematic (and disabled) tests:
> https://github.com/apache/incubator-mxnet/issues/11164
> It could be that the test is simply not stable enough. The PR that
> introduced the test is https://github.com/apache/incubator-mxnet/pull/10921
> - it was merged two days ago.
>
> Best regards,
> Marco
>
> On Tue, Oct 2, 2018 at 8:43 AM Pedro Larroy 
> wrote:
>
> > Thanks for checking, Lin. If it happens again we will have to dig deeper.
> > We have just one GPU executor, so I wonder what could be the root cause
> > of this.
> >
> > On Mon, Oct 1, 2018 at 10:57 PM Lin Yuan  wrote:
> >
> > > I could not reproduce the error on an EC2 g3x8 instance, making it hard
> > > to debug. I also suspect it was due to a resource usage limit on the CI
> > > instance.
> > >
> > > On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy <
> > pedro.larroy.li...@gmail.com
> > > >
> > > wrote:
> > >
> > > > It doesn't look like flakiness to me at first sight. I think it might
> > > > be related to resource usage / allocation / a leak in the worst case.
> > > >
> > > > Could be that there was not enough GPU memory at the time of test
> > > > execution. But I'm just speculating, hence my original question.
> > > >
> > > > Pedro.
> > > >
> > > > On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan  wrote:
> > > >
> > > > > Hi Pedro,
> > > > >
> > > > > I also got this failure in my PR
> > > > >
> > > > >
> > > >
> > >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
> > > > >
> > > > > I was not able to identify the root cause of it from the changelist.
> > > > > Are you suggesting there is some flakiness in the master branch too?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Lin
> > > > >
> > > > > On Mon, Oct 1, 2018 at 4:55 PM Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > I saw this failure on CI:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
> > > > > >
> > > > > > Have you seen other cases where we fail to select the best CUDNN
> > > > > > algorithm? In which circumstances could this happen, and do you
> > > > > > think it is a good idea to have one selected by default as a last
> > > > > > resort?
> > > > > >
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > >
> > > >
> > >
> >
>