Re: segmentation fault in master using mkldnn

2018-05-02 Thread Chris Olivier
You can try Intel Inspector, which is like an enhanced version of Valgrind
with a GUI and whatnot.

On Wed, May 2, 2018 at 9:42 PM Da Zheng  wrote:

> Valgrind doesn't work with Python. Also, Valgrind doesn't support some
> CPU instructions used by MXNet (I think some instructions related to the
> random number generator).
>
>
> On Wed, May 2, 2018 at 8:59 PM, Bhavin Thaker 
> wrote:
> > Have you tried running with Valgrind to get some clues on the root cause?
> >
> > Bhavin Thaker.
> >
> > On Wed, May 2, 2018 at 8:55 PM Da Zheng  wrote:
> >
> >> It might also be possible that this isn't an MKLDNN bug.
> >> I just saw a similar memory error without MKLDNN build.
> >>
> >>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline
> >>
> >> Best,
> >> Da
> >>
> >> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da  wrote:
> >> > There might be a race condition that causes the memory error.
> >> > It might be caused by this PR:
> >> > https://github.com/apache/incubator-mxnet/pull/10706/files
> >> > This PR removes MKLDNN memory from NDArray.
> >> > However, I don't know why this causes a memory error. If someone is
> >> > using the memory, it should still hold the memory with a shared pointer.
> >> > But I do see memory errors increase after this PR was merged.
> >> >
> >> > Best,
> >> > Da
> >> >
> >> > On 5/2/18, 12:26 PM, "Pedro Larroy" wrote:
> >> >
> >> > I couldn't reproduce locally with:
> >> >
> >> > ci/build.py -p ubuntu_cpu /work/runtime_functions.sh
> >> > build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu
> >> > /work/runtime_functions.sh unittest_ubuntu_python2_cpu
> >> >
> >> >
> >> > On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy
> >> > <pedro.larroy.li...@gmail.com> wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > Seems master is no longer passing; there's a segmentation fault
> >> > > using MKLDNN-CPU:
> >> > >
> >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
> >> > >
> >> > > I see my PRs failing with a similar error.
> >> > >
> >> > > Pedro
> >> > >
> >> >
> >> >
> >>
>


Re: segmentation fault in master using mkldnn

2018-05-02 Thread Da Zheng
Valgrind doesn't work with Python. Also, Valgrind doesn't support some
CPU instructions used by MXNet (I think some instructions related to the
random number generator).


On Wed, May 2, 2018 at 8:59 PM, Bhavin Thaker  wrote:
> Have you tried running with Valgrind to get some clues on the root cause?
>
> Bhavin Thaker.
>
> On Wed, May 2, 2018 at 8:55 PM Da Zheng  wrote:
>
>> It might also be possible that this isn't an MKLDNN bug.
>> I just saw a similar memory error without MKLDNN build.
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline
>>
>> Best,
>> Da
>>
>> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da  wrote:
>> > There might be a race condition that causes the memory error.
>> > It might be caused by this PR:
>> > https://github.com/apache/incubator-mxnet/pull/10706/files
>> > This PR removes MKLDNN memory from NDArray.
>> > However, I don't know why this causes a memory error. If someone is using
>> > the memory, it should still hold the memory with a shared pointer.
>> > But I do see memory errors increase after this PR was merged.
>> >
>> > Best,
>> > Da
>> >
>> > On 5/2/18, 12:26 PM, "Pedro Larroy" wrote:
>> >
>> > I couldn't reproduce locally with:
>> >
>> > ci/build.py -p ubuntu_cpu /work/runtime_functions.sh
>> > build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu
>> > /work/runtime_functions.sh unittest_ubuntu_python2_cpu
>> >
>> >
>> > On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy
>> > <pedro.larroy.li...@gmail.com> wrote:
>> >
>> > > Hi
>> > >
>> > > Seems master is no longer passing; there's a segmentation fault using
>> > > MKLDNN-CPU:
>> > >
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
>> > >
>> > >
>> > > I see my PRs failing with a similar error.
>> > >
>> > > Pedro
>> > >
>> >
>> >
>>


Re: segmentation fault in master using mkldnn

2018-05-02 Thread Bhavin Thaker
Have you tried running with Valgrind to get some clues on the root cause?

Bhavin Thaker.

On Wed, May 2, 2018 at 8:55 PM Da Zheng  wrote:

> It might also be possible that this isn't an MKLDNN bug.
> I just saw a similar memory error without MKLDNN build.
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline
>
> Best,
> Da
>
> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da  wrote:
> > There might be a race condition that causes the memory error.
> > It might be caused by this PR:
> > https://github.com/apache/incubator-mxnet/pull/10706/files
> > This PR removes MKLDNN memory from NDArray.
> > However, I don't know why this causes a memory error. If someone is using
> > the memory, it should still hold the memory with a shared pointer.
> > But I do see memory errors increase after this PR was merged.
> >
> > Best,
> > Da
> >
> > On 5/2/18, 12:26 PM, "Pedro Larroy" wrote:
> >
> > I couldn't reproduce locally with:
> >
> > ci/build.py -p ubuntu_cpu /work/runtime_functions.sh
> > build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu
> > /work/runtime_functions.sh unittest_ubuntu_python2_cpu
> >
> >
> > On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy
> > <pedro.larroy.li...@gmail.com> wrote:
> >
> > > Hi
> > >
> > > Seems master is no longer passing; there's a segmentation fault using
> > > MKLDNN-CPU:
> > >
> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
> > >
> > >
> > > I see my PRs failing with a similar error.
> > >
> > > Pedro
> > >
> >
> >
>


Re: segmentation fault in master using mkldnn

2018-05-02 Thread Da Zheng
It might also be possible that this isn't an MKLDNN bug.
I just saw a similar memory error in a build without MKLDNN.
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline

Best,
Da

On Wed, May 2, 2018 at 2:14 PM, Zheng, Da  wrote:
> There might be a race condition that causes the memory error.
> It might be caused by this PR:
> https://github.com/apache/incubator-mxnet/pull/10706/files
> This PR removes MKLDNN memory from NDArray.
> However, I don't know why this causes a memory error. If someone is using the
> memory, it should still hold the memory with a shared pointer.
> But I do see memory errors increase after this PR was merged.
>
> Best,
> Da
>
> On 5/2/18, 12:26 PM, "Pedro Larroy"  wrote:
>
> I couldn't reproduce locally with:
>
> ci/build.py -p ubuntu_cpu /work/runtime_functions.sh
> build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu
> /work/runtime_functions.sh unittest_ubuntu_python2_cpu
>
>
> On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy wrote:
>
> > Hi
> >
> > Seems master is no longer passing; there's a segmentation fault using
> > MKLDNN-CPU:
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
> >
> >
> > I see my PRs failing with a similar error.
> >
> > Pedro
> >
>
>


Proposed Change to Reshape Operator

2018-05-02 Thread Anirudh Acharya
Hi All,

With the current Reshape operator in MXNet, it is not possible to reshape
an input array when the shape is generated at runtime. I have created a
GitHub issue describing the problem, with examples:
https://github.com/apache/incubator-mxnet/issues/10789

Briefly stated, MXNet should support the use case where both the input array
and the shape tuple are fed to the "reshape" operator as input arguments.

The proposed solution is to modify the existing reshape_like operator to
accept an additional boolean parameter called infer_shape. Based on this
parameter, we either infer the shape from the second input (as is done now)
or take the second input's values as the target shape itself.
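
As a rough sketch of the proposed semantics (a hypothetical Scala helper for
illustration, not the actual operator implementation): with infer_shape = true
we keep today's reshape_like behavior and copy the second input's shape, while
with infer_shape = false we read the second input's values as the target shape.

    object ReshapeSketch extends App {
      // inferShape = true  -> use the second input's shape (current behavior)
      // inferShape = false -> use the second input's values as the shape
      def targetShape(rhsShape: Seq[Int], rhsValues: Seq[Int],
                      inferShape: Boolean, numElements: Int): Seq[Int] = {
        val target = if (inferShape) rhsShape else rhsValues
        require(target.product == numElements, "element count must be preserved")
        target
      }

      // 12 input elements; the second input is a 1-D array of 2 elements
      // holding the runtime-generated shape (4, 3):
      println(targetShape(rhsShape = Seq(2), rhsValues = Seq(4, 3),
                          inferShape = false, numElements = 12)) // List(4, 3)
    }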

Please let me know your comments and feedback.


Regards
Anirudh Acharya


Re: segmentation fault in master using mkldnn

2018-05-02 Thread Zheng, Da
There might be a race condition that causes the memory error.
It might be caused by this PR:
https://github.com/apache/incubator-mxnet/pull/10706/files
This PR removes MKLDNN memory from NDArray.
However, I don't know why this causes a memory error. If someone is using the
memory, it should still hold the memory with a shared pointer.
But I do see memory errors increase after this PR was merged.

Best,
Da

On 5/2/18, 12:26 PM, "Pedro Larroy"  wrote:

I couldn't reproduce locally with:

ci/build.py -p ubuntu_cpu /work/runtime_functions.sh
build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu
/work/runtime_functions.sh unittest_ubuntu_python2_cpu


On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy 
wrote:

> Hi
>
> Seems master is no longer passing; there's a segmentation fault using
> MKLDNN-CPU:
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
>
>
> I see my PRs failing with a similar error.
>
> Pedro
>




[VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-02 Thread Anirudh
Hi all,

As part of the RC2 release, we have addressed bugs and some of the concerns
that were raised.

I would like to propose a vote to release Apache MXNet (incubating) version
1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end at 12:50 PM
PDT, Sunday, May 6th.

Link to release notes:
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes

Link to release candidate 1.2.0.rc2:
https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2

Voting results for 1.2.0.rc2:
https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E

View this page, click on "Build from Source", and use the source code
obtained from the 1.2.0.rc2 tag:
https://mxnet.incubator.apache.org/install/index.html

(Note: The README.md points to the 1.2.0 tag and does not work at the
moment.)

Please remember to test first and then vote accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)

Anirudh


Re: segmentation fault in master using mkldnn

2018-05-02 Thread Pedro Larroy
I couldn't reproduce locally with:

ci/build.py -p ubuntu_cpu /work/runtime_functions.sh
build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu
/work/runtime_functions.sh unittest_ubuntu_python2_cpu


On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy 
wrote:

> Hi
>
> Seems master is no longer passing; there's a segmentation fault using
> MKLDNN-CPU:
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
>
>
> I see my PRs failing with a similar error.
>
> Pedro
>


Re: The New Scala API

2018-05-02 Thread Pedro Larroy
Hi Qing

I didn't know about the type parser; I had a look at the code. Nice idea!

Pedro

On Wed, May 2, 2018 at 8:57 PM, Qing Lan  wrote:

> Hi Pedro,
>
> Thanks for your comments!
>
> For the first one: please see the definition of attr here:
> https://mxnet.incubator.apache.org/api/python/symbol/symbol.html?highlight=attr#mxnet.symbol.Symbol.attr.
> It is a user-defined struct. Our old API indeed used null instead of the
> Option + None convention; the new API will use Option + None.
>
> For the second one, the new API provides types for the attributes,
> e.g.: def Activation(data : Option[Symbol], act_type : String, name :
> Option[String], attr : Option[Map[String, String]])
> def Pad(data : Option[Symbol] = None, mode : String, pad_width : Shape,
> constant_value : Option[Double] = None, name : Option[String] = None, attr
> : Option[Map[String, String]] = None)
> And you are right, that's exactly what we are working on now: a type
> parser between Scala and C++. See here:
> https://github.com/lanking520/incubator-mxnet/blob/scala-macros-impl/scala-package/macros/src/main/scala/org/apache/mxnet/SymbolMacro.scala#L147
>
> Thanks,
> Qing
>
> On 5/2/18, 3:50 AM, "Pedro Larroy"  wrote:
>
> Hi
>
> I had a brief look and I have some comments:
>
> >> > 1 - Excessive use of Option: Having an optional map is often not
> >> > necessary. What's the semantic difference between an empty map and
> >> > passing None? What about a variable-argument list of Pairs like:
> >> >
> >> > (attr: Pair[String, String]*), which can then be called like
> >> > ("lr" -> "0.01", "activation" -> "relu") with a variable number of
> >> > arguments.
> >> >
> >> > null is definitely bad, but we can have an empty map or other default
> >> > arguments that make sense in some cases, like an empty string for name
> >> > instead of an optional string. I'm very much pro Option instead of
> >> > null, but it adds clunkiness that is not always necessary, as explained.
>
> >> > 2 - One of the fundamental problems is that the arguments are untyped,
> >> > of type String -> String; I don't see this solved in this proposal. Is
> >> > it possible to do something better, i.e. have typed arguments? When we
> >> > extend dmlc::Parameter we actually know the C++ type, so we could do
> >> > something smart about this, for example using clang to extract the
> >> > types and generate the API with proper types.
>
>
> Pedro.
>
> On Wed, May 2, 2018 at 4:01 AM, Naveen Swamy 
> wrote:
>
> > No, the implementation still comes from the MXNet backend.
> > I don't think there would be any overhead, unlike C++ with virtual
> > functions; an interface or trait just tells you what the signature
> > looks like. Users program against interfaces and can choose a different
> > implementation if they like. For example (only to explain the topic),
> > if we had different implementations for CPU and GPU, they could simply
> > choose the implementation they like. To visualize this, think of the
> > bridge design pattern.
> >
> >
> >
> > On Tue, May 1, 2018 at 6:49 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com
> > > wrote:
> >
> > > Hey Naveen,
> > >
> > > I'm not familiar with Scala, so please correct me if I'm wrong with
> > > my assumptions. This basically sounds like we'd have an interface for
> > > the Scala API and a separate implementation, right?
> > >
> > > In general, I really like these decoupled approaches (you already
> > > highlighted the advantages)! It would introduce a small overhead,
> > > since we'd have to define the traits and implementation in two
> > > different locations, but it definitely increases clarity and separates
> > > concerns. I don't think we will have multiple implementations in the
> > > future, but this might allow us to define the user-facing API in the
> > > form of traits while allowing us to change the underlying "glue" as we
> > > want.
> > >
> > > My only slight concern is that this would introduce a bit more
> > > overhead, since the documentation and API are defined somewhere else,
> > > but I have no experience with Scala, so I'll leave this up to the
> > > people who are more familiar with the topic.
> > >
> > > Best regards,
> > > Marco
> > >
> > > On Tue, May 1, 2018 at 11:48 PM, Naveen Swamy 
> > wrote:
> > >
> > > > I think this project is going to significantly improve the
> > > > usability of the MXNet-Scala APIs.
> > > >
> > > > I had an offline discussion with Qing about this. I agree with
> > > > Yizhi that keeping the same namespace will make it easy for existing
> > > > and new users to quickly look up the APIs. Along with that I have
> > > > one recommendation: first generate a trait for all the APIs and have
> > > > the instance in the existing symbol.

Re: The New Scala API

2018-05-02 Thread Qing Lan
Hi Pedro,

Thanks for your comments!

For the first one: please see the definition of attr here:
https://mxnet.incubator.apache.org/api/python/symbol/symbol.html?highlight=attr#mxnet.symbol.Symbol.attr.
It is a user-defined struct. Our old API indeed used null instead of the
Option + None convention; the new API will use Option + None.
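
A minimal, self-contained Scala sketch of the Option + None convention (the
stub Symbol and SymbolAPI below are illustrative stand-ins for the real
org.apache.mxnet classes; only the Activation signature is taken from this
thread):

    case class Symbol(name: String)

    object SymbolAPI {
      // Signature modeled on the proposed Activation below; the body is a stub.
      def Activation(data: Option[Symbol], act_type: String,
                     name: Option[String] = None,
                     attr: Option[Map[String, String]] = None): Symbol =
        Symbol(name.getOrElse(act_type + "_activation"))
    }

    object OptionDemo extends App {
      val fc = Symbol("fc1")
      // Omitted optionals default to None; no nulls anywhere.
      val act = SymbolAPI.Activation(data = Some(fc), act_type = "relu")
      println(act) // Symbol(relu_activation)
    }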

For the second one, the new API provides types for the attributes,
e.g.: def Activation(data : Option[Symbol], act_type : String, name :
Option[String], attr : Option[Map[String, String]])
def Pad(data : Option[Symbol] = None, mode : String, pad_width : Shape, 
constant_value : Option[Double] = None, name : Option[String] = None, attr : 
Option[Map[String, String]] = None)
And you are right, that's exactly what we are working on now: a type parser
between Scala and C++. See here:
https://github.com/lanking520/incubator-mxnet/blob/scala-macros-impl/scala-package/macros/src/main/scala/org/apache/mxnet/SymbolMacro.scala#L147

Thanks,
Qing

On 5/2/18, 3:50 AM, "Pedro Larroy"  wrote:

Hi

I had a brief look and I have some comments:

1 - Excessive use of Option: Having an optional map is often not necessary.
What's the semantic difference between an empty map and passing None? What
about a variable argument list of Pairs like:

(attr: Pair[String, String]*), which can then be called like ("lr" -> "0.01",
"activation" -> "relu") with a variable number of arguments.

null is definitely bad, but we can have an empty map or other default
arguments that make sense in some cases, like an empty string for name
instead of an optional string. I'm very much pro Option instead of null, but
it adds clunkiness that is not always necessary, as explained.

2 - One of the fundamental problems is that the arguments are untyped, of
type String -> String; I don't see this solved in this proposal. Is it
possible to do something better, i.e. have typed arguments? When we extend
dmlc::Parameter we actually know the C++ type, so we could do something smart
about this, for example using clang to extract the types and generate the
API with proper types.


Pedro.

On Wed, May 2, 2018 at 4:01 AM, Naveen Swamy  wrote:

> No, the implementation still comes from the MXNet backend.
> I don't think there would be any overhead, unlike C++ with virtual
> functions; an interface or trait just tells you what the signature looks
> like. Users program against interfaces and can choose a different
> implementation if they like. For example (only to explain the topic), if we
> had different implementations for CPU and GPU, they could simply choose
> the implementation they like. To visualize this, think of the bridge design
> pattern.
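
To make the bridge idea concrete, a small self-contained sketch (all names
invented for illustration): callers depend only on a trait, so the concrete
backend can be swapped at the call site:

    trait ExecutorBackend {
      def runOp(name: String): String
    }

    object CpuBackend extends ExecutorBackend {
      def runOp(name: String): String = name + " on CPU"
    }

    object GpuBackend extends ExecutorBackend {
      def runOp(name: String): String = name + " on GPU"
    }

    object BridgeSketch extends App {
      // Code written against the trait works with either implementation:
      def train(backend: ExecutorBackend): Unit =
        println(backend.runOp("Activation"))
      train(CpuBackend)
      train(GpuBackend)
    }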
>
>
>
> On Tue, May 1, 2018 at 6:49 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com
> > wrote:
>
> > Hey Naveen,
> >
> > I'm not familiar with Scala, so please correct me if I'm wrong with my
> > assumptions. This basically sounds like we'd have an interface for the
> > Scala API and a separate implementation, right?
> >
> > In general, I really like these decoupled approaches (you already
> > highlighted the advantages)! It would introduce a small overhead, since
> > we'd have to define the traits and implementation in two different
> > locations, but it definitely increases clarity and separates concerns. I
> > don't think we will have multiple implementations in the future, but this
> > might allow us to define the user-facing API in the form of traits while
> > allowing us to change the underlying "glue" as we want.
> >
> > My only slight concern is that this would introduce a bit more overhead,
> > since the documentation and API are defined somewhere else, but I have
> > no experience with Scala, so I'll leave this up to the people who are
> > more familiar with the topic.
> >
> > Best regards,
> > Marco
> >
> > On Tue, May 1, 2018 at 11:48 PM, Naveen Swamy 
> wrote:
> >
> > > I think this project is going to significantly improve the usability
> > > of the MXNet-Scala APIs.
> > >
> > > I had an offline discussion with Qing about this. I agree with Yizhi
> > > that keeping the same namespace will make it easy for existing and new
> > > users to quickly look up the APIs. Along with that I have one
> > > recommendation: first generate a trait for all the APIs and have the
> > > instance in the existing symbol. The usage would be something like below:
> > >
> > > trait SymbolAPIBase {
> > >   /**
> > >    * api doc
> > >    */
> > >   def Activation(): Unit
> > > }
> > >
> > > object SymbolAPI extends SymbolAPIBase {
> > >   override def Activation(): Unit = { /* generated implementation */ }
> > > }
> > >
> > > object Symbol {
> > >   val api: SymbolAPIBase = SymbolAPI
> > > }

segmentation fault in master using mkldnn

2018-05-02 Thread Pedro Larroy
Hi

Seems master is no longer passing; there's a segmentation fault using
MKLDNN-CPU:

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662


I see my PRs failing with a similar error.

Pedro


Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC1

2018-05-02 Thread Zheng, Da
I have to agree that the DMLC subrepos sometimes make development much more
difficult.

On 5/2/18, 3:57 AM, "Pedro Larroy"  wrote:

For me the situation with DMLC is problematic.

I often find myself having to fix things in the DMLC subrepos.

* These changes are impossible to test with the MXNet CI system without
doing shenanigans like changing the submodules to my own forks.
* Slows down development as fixes need to be merged on the DMLC subrepo
first, then MXNet updated with the new subrepo commit, effectively doubling
the turnaround time for any fix / commit.
* The DMLC subrepos are in a different organization and subject to
different set of committers and rules.

I think we should have a plan forward to have MXNet be more self contained,
so it's faster and easier to contribute.

Pedro.


On Tue, May 1, 2018 at 1:25 AM, Hen  wrote:

> Naive question - How often does this happen? It should be rare that a
> project needs to send a PR to a dependency, and much rarer that it blocks
> a release.
>
> It sounds like it’s too coupled a dependency. I suspect the DMLC code
> situation is going to be a major blocker to any chance of graduating.
>
> Hen
>
> On Mon, Apr 30, 2018 at 1:23 PM Gautam  wrote:
>
> > +1 on merging this "https://github.com/dmlc/mshadow/pull/319 "
> >
> > I am curious how we are tracking the submodules' PRs that are really
> > important for MXNet?
> > This PR has been waiting to merge for almost 4 months.
> >
> >
> >
> > On Mon, Apr 30, 2018 at 1:51 AM, Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > -1
> > >
> > > We should merge this and update mshadow before the next release:
> > > https://github.com/dmlc/mshadow/pull/319, so we compile CUDA for
> > > Volta.
> > >
> > > On Sat, Apr 28, 2018 at 12:53 AM, Steffen Rochel <
> > steffenroc...@gmail.com>
> > > wrote:
> > >
> > > > Hi Chris - I acknowledge that building the docs is not as good as it
> > > > should be and needs to be improved. Is it worse compared to the 1.1.0
> > > > release, enough to consider it a release blocker?
> > > >
> > > >
> > > > On Fri, Apr 27, 2018 at 9:53 AM Chris Olivier wrote:
> > > >
> > > > > -1
> > > > >
> > > > > Building the docs locally is an absolute nightmare. I haven't been
> > > > > able to get it to work yet.
> > > > >
> > > > > On Thu, Apr 26, 2018 at 3:36 PM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to request a pause on this vote, since I have
> > > > > > identified an issue with our CMakeLists that breaks all UNIX
> > > > > > builds with the latest version (3.11) of CMake. The issue is
> > > > > > tracked at [1]; the PR to fix it is available at [2].
> > > > > >
> > > > > > Best regards,
> > > > > > Marco
> > > > > >
> > > > > >
> > > > > > [1]: https://github.com/apache/incubator-mxnet/issues/10708
> > > > > > [2]: https://github.com/apache/incubator-mxnet/pull/10712
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 26, 2018 at 3:05 PM, Sheng Zha 
> > > wrote:
> > > > > >
> > > > > > > -1 for the following reasons:
> > > > > > >
> > > > > > > 1. Due to the addition of fp16 support, the build breaks on
> > > > > > > Windows and older versions of OS X (clang 8, for example). The
> > > > > > > fix is in https://github.com/dmlc/mshadow/pull/333
> > > > > > >
> > > > > > > 2. Due to the addition of the quantized fully connected op, the
> > > > > > > CUDA 7.5 build is broken. Jun Wu is tracking the issue.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > On Thu, Apr 26, 2018 at 3:01 PM, Anirudh <
> anirudh2...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > As part of the RC1 release, we have addressed the issue with
> > > > > > > > respect to asymmetric padding in the ONNX import module (
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/10676).
> > > > > > > > We have also added the existing silent failures for MXNet
> > > > > > > > Conv, and the incompatibility in behavior for certain use
> > > > > > > > cases between MXNet without MKLDNN and MXNet with MKLDNN
> > > > > > > > support, to the release notes. We have marked both ONNX and
> > > > > > > > MKLDNN support as "Experimental" in the release notes. The
> > > > > > > > tutorial

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC1

2018-05-02 Thread Marco de Abreu
Totally agree with Henry and Pedro.

On Wed, May 2, 2018 at 12:56 PM, Pedro Larroy 
wrote:

> For me the situation with DMLC is problematic.
>
> I often find myself having to fix things in the DMLC subrepos.
>
> * These changes are impossible to test with the MXNet CI system without
> doing shenanigans like changing the submodules to my own forks.
> * Slows down development as fixes need to be merged on the DMLC subrepo
> first, then MXNet updated with the new subrepo commit, effectively doubling
> the turnaround time for any fix / commit.
> * The DMLC subrepos are in a different organization and subject to
> different set of committers and rules.
>
> I think we should have a plan forward to have MXNet be more self contained,
> so it's faster and easier to contribute.
>
> Pedro.
>
>
> On Tue, May 1, 2018 at 1:25 AM, Hen  wrote:
>
> > Naive question - How often does this happen? It should be rare that a
> > project needs to send a PR to a dependency, and much rarer that it
> > blocks a release.
> >
> > It sounds like it’s too coupled a dependency. I suspect the DMLC code
> > situation is going to be a major blocker to any chance of graduating.
> >
> > Hen
> >
> > On Mon, Apr 30, 2018 at 1:23 PM Gautam  wrote:
> >
> > > +1 on merging this "https://github.com/dmlc/mshadow/pull/319 "
> > >
> > > I am curious how we are tracking the submodules' PRs that are really
> > > important for MXNet?
> > > This PR has been waiting to merge for almost 4 months.
> > >
> > >
> > >
> > > On Mon, Apr 30, 2018 at 1:51 AM, Pedro Larroy <
> > > pedro.larroy.li...@gmail.com>
> > > wrote:
> > >
> > > > -1
> > > >
> > > > We should merge this and update mshadow before the next release:
> > > > https://github.com/dmlc/mshadow/pull/319, so we compile CUDA for
> > > > Volta.
> > > >
> > > > On Sat, Apr 28, 2018 at 12:53 AM, Steffen Rochel <
> > > steffenroc...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Chris - I acknowledge that building the docs is not as good as
> > > > > it should be and needs to be improved. Is it worse compared to the
> > > > > 1.1.0 release, enough to consider it a release blocker?
> > > > >
> > > > >
> > > > > On Fri, Apr 27, 2018 at 9:53 AM Chris Olivier
> > > > > <cjolivie...@gmail.com> wrote:
> > > > >
> > > > > > -1
> > > > > >
> > > > > > Building the docs locally is an absolute nightmare. I haven't
> > > > > > been able to get it to work yet.
> > > > > >
> > > > > > On Thu, Apr 26, 2018 at 3:36 PM, Marco de Abreu <
> > > > > > marco.g.ab...@googlemail.com> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd like to request a pause on this vote, since I have
> > > > > > > identified an issue with our CMakeLists that breaks all UNIX
> > > > > > > builds with the latest version (3.11) of CMake. The issue is
> > > > > > > tracked at [1]; the PR to fix it is available at [2].
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Marco
> > > > > > >
> > > > > > >
> > > > > > > [1]: https://github.com/apache/incubator-mxnet/issues/10708
> > > > > > > [2]: https://github.com/apache/incubator-mxnet/pull/10712
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 26, 2018 at 3:05 PM, Sheng Zha  >
> > > > wrote:
> > > > > > >
> > > > > > > > -1 for the following reasons:
> > > > > > > >
> > > > > > > > 1. Due to the addition of fp16 support, the build breaks on
> > > > > > > > Windows and older versions of OS X (clang 8, for example).
> > > > > > > > The fix is in https://github.com/dmlc/mshadow/pull/333
> > > > > > > >
> > > > > > > > 2. Due to the addition of the quantized fully connected op,
> > > > > > > > the CUDA 7.5 build is broken. Jun Wu is tracking the issue.
> > > > > > > >
> > > > > > > > -sz
> > > > > > > >
> > > > > > > > On Thu, Apr 26, 2018 at 3:01 PM, Anirudh <
> > anirudh2...@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > As part of the RC1 release, we have addressed the issue
> > > > > > > > > with respect to asymmetric padding in the ONNX import
> > > > > > > > > module (https://github.com/apache/incubator-mxnet/pull/10676).
> > > > > > > > > We have also added the existing silent failures for MXNet
> > > > > > > > > Conv, and the incompatibility in behavior for certain use
> > > > > > > > > cases between MXNet without MKLDNN and MXNet with MKLDNN
> > > > > > > > > support, to the release notes. We have marked both ONNX and
> > > > > > > > > MKLDNN support as "Experimental" in the release notes. The
> > > > > > > > > tutorial <.../docs/tutorials/onnx/fine_tuning_gluon.md>
> > > > > > > > > which was called out still seems to be failing with CPU
> > > > > > > > > context, and we have mentioned this as a known issue in the
> > > > > > > > > release notes. Since both

Re: Slack access

2018-05-02 Thread Pedro Larroy
Hi Jesse

Welcome! Have you seen the "contribute" documentation?
https://mxnet.incubator.apache.org/community/contribute.html

Should be easy to contribute via GitHub issues, a JIRA ticket, and a PR.

Pedro.

On Wed, May 2, 2018 at 3:47 AM, jesse brizzi  wrote:

> Hey everyone,
>
> I'm looking to get Slack access and hopefully offer my help with
> contributions/development.
>
> I work for a startup that uses MXNet for some of our internal R&D and
> deep-learning-based services, specifically the Scala and Python interfaces,
> and we have run into a few bugs/issues that we could probably help with.
>
> Jesse
>


Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC1

2018-05-02 Thread Pedro Larroy
For me the situation with DMLC is problematic.

I often find myself having to fix things in the DMLC subrepos.

* These changes are impossible to test with the MXNet CI system without
doing shenanigans like changing the submodules to my own forks.
* Slows down development as fixes need to be merged on the DMLC subrepo
first, then MXNet updated with the new subrepo commit, effectively doubling
the turnaround time for any fix / commit.
* The DMLC subrepos are in a different organization and subject to
different set of committers and rules.

I think we should have a plan forward to have MXNet be more self contained,
so it's faster and easier to contribute.

Pedro.


On Tue, May 1, 2018 at 1:25 AM, Hen  wrote:

> Naive question - How often does this happen? It should be rare that a
> project needs to send a PR to a dependency, and much rarer that it blocks a
> release.
>
> It sounds like it’s too coupled a dependency. I suspect the DMLC code
> situation is going to be a major blocker to any chance of graduating.
>
> Hen
>
> On Mon, Apr 30, 2018 at 1:23 PM Gautam  wrote:
>
> > +1 on merging this "https://github.com/dmlc/mshadow/pull/319 "
> >
> > I am curious how we are tracking the submodules' PRs that are really
> > important for MXNet?
> > This PR has been waiting to merge for almost 4 months.
> >
> >
> >
> > On Mon, Apr 30, 2018 at 1:51 AM, Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > -1
> > >
> > > We should merge this and update mshadow before the next release:
> > > https://github.com/dmlc/mshadow/pull/319, so we compile CUDA for
> > > Volta.
> > >
> > > On Sat, Apr 28, 2018 at 12:53 AM, Steffen Rochel <
> > steffenroc...@gmail.com>
> > > wrote:
> > >
> > > > Hi Chris - I acknowledge that building the docs is not as good as it
> > > > should be and needs to be improved. Is it worse compared to the 1.1.0
> > > > release, enough to consider it a release blocker?
> > > >
> > > >
> > > > On Fri, Apr 27, 2018 at 9:53 AM Chris Olivier wrote:
> > > >
> > > > > -1
> > > > >
> > > > > Building the docs locally is an absolute nightmare. I haven't been
> > > > > able to get it to work yet.
> > > > >
> > > > > On Thu, Apr 26, 2018 at 3:36 PM, Marco de Abreu <
> > > > > marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to request a pause on this vote, since I have identified
> > > > > > an issue with our CMakeLists that breaks all UNIX builds with the
> > > > > > latest version (3.11) of CMake. The issue is tracked at [1]; the
> > > > > > PR to fix it is available at [2].
> > > > > >
> > > > > > Best regards,
> > > > > > Marco
> > > > > >
> > > > > >
> > > > > > [1]: https://github.com/apache/incubator-mxnet/issues/10708
> > > > > > [2]: https://github.com/apache/incubator-mxnet/pull/10712
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 26, 2018 at 3:05 PM, Sheng Zha 
> > > wrote:
> > > > > >
> > > > > > > -1 for the following reasons:
> > > > > > >
> > > > > > > 1. Due to the addition of fp16 support, the build breaks on
> > > > > > > Windows and older versions of OS X (clang 8, for example). The
> > > > > > > fix is in https://github.com/dmlc/mshadow/pull/333
> > > > > > >
> > > > > > > 2. Due to the addition of the quantized fully connected op, the
> > > > > > > CUDA 7.5 build is broken. Jun Wu is tracking the issue.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > > On Thu, Apr 26, 2018 at 3:01 PM, Anirudh <
> anirudh2...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > As part of the RC1 release, we have addressed the issue with
> > > > > > > > respect to asymmetric padding in the ONNX import module (
> > > > > > > > https://github.com/apache/incubator-mxnet/pull/10676).
> > > > > > > > We have also added the existing silent failures for MXNet
> > > > > > > > Conv, and the incompatibility in behavior for certain use
> > > > > > > > cases between MXNet without MKLDNN and MXNet with MKLDNN
> > > > > > > > support, to the release notes. We have marked both ONNX and
> > > > > > > > MKLDNN support as "Experimental" in the release notes. The
> > > > > > > > tutorial <.../docs/tutorials/onnx/fine_tuning_gluon.md>
> > > > > > > > which was called out still seems to be failing with CPU
> > > > > > > > context, and we have mentioned this as a known issue in the
> > > > > > > > release notes. Since both MKLDNN support and the ONNX import
> > > > > > > > module have been marked experimental, this should not be a
> > > > > > > > blocking issue.
> > > > > > > >
> > > > > > > > I would like to propose a vote to release Apache MXNet
> > > > > > > > (incubating) version 1.2.0.RC1. Voting will start now
> > > > > > > > (Thursday, April 26th) and end at 3:00 PM

Re: The New Scala API

2018-05-02 Thread Pedro Larroy
Hi

I had a brief look and I have some comments:

1 - Excessive use of Option: Having an optional map is often not necessary.
What's the semantic difference between an empty map and passing None? What
about a variable argument list of Pairs like:

(attr: Pair[String, String]*), which can then be called like ("lr" -> "0.01",
"activation" -> "relu") with a variable number of arguments (see the sketch
below).

null is definitely bad, but we can have an empty map or other default
arguments that make sense in some cases, like an empty string for name
instead of an optional string. I'm very much pro Option instead of null, but
it adds clunkiness that is not always necessary, as explained.
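
For instance, a stub sketch of the varargs form (withAttrs is an invented
helper, not the real API):

    object AttrSketch extends App {
      // Accept zero or more (key, value) pairs instead of Option[Map[...]]:
      def withAttrs(attrs: (String, String)*): Map[String, String] = attrs.toMap

      println(withAttrs())                                       // Map()
      println(withAttrs("lr" -> "0.01", "activation" -> "relu"))
    }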

2 - One of the fundamental problems is that the arguments are untyped, of
type String -> String; I don't see this solved in this proposal. Is it
possible to do something better, i.e. have typed arguments? When we extend
dmlc::Parameter we actually know the C++ type, so we could do something smart
about this, for example using clang to extract the types and generate the
API with proper types.
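
A small sketch of that contrast (op and parameter names invented for
illustration): stringly-typed attributes fail only at runtime, while typed
signatures generated from the C++ metadata would fail at compile time:

    sealed trait ActType
    case object Relu extends ActType
    case object Tanh extends ActType

    object TypedSketch extends App {
      // Today: String -> String, so a typo like "act_tpye" still compiles.
      def opUntyped(attrs: Map[String, String]): Unit = println(attrs)
      opUntyped(Map("act_tpye" -> "relu"))

      // Goal: typed parameters derived from dmlc::Parameter metadata.
      def opTyped(lr: Double, actType: ActType): Unit =
        println(s"lr=$lr act=$actType")
      opTyped(0.01, Relu) // opTyped("0.01", "relu") would not compile
    }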


Pedro.

On Wed, May 2, 2018 at 4:01 AM, Naveen Swamy  wrote:

> No, the implementation still comes from the MXNet backend.
> I don't think there would be any overhead, unlike C++ with virtual
> functions; an interface or trait just tells you what the signature looks
> like. Users program against interfaces and can choose a different
> implementation if they like. For example (only to explain the topic), if we
> had different implementations for CPU and GPU, they could simply choose
> the implementation they like. To visualize this, think of the bridge design
> pattern.
>
>
>
> On Tue, May 1, 2018 at 6:49 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com
> > wrote:
>
> > Hey Naveen,
> >
> > I'm not familiar with Scala, so please correct me if I'm wrong with my
> > assumptions. This basically sounds like we'd have an interface for the
> > Scala API and a separate implementation, right?
> >
> > In general, I really like these decoupled approaches (you already
> > highlighted the advantages)! It would introduce a small overhead, since
> > we'd have to define the traits and implementation in two different
> > locations, but it definitely increases clarity and separates concerns. I
> > don't think we will have multiple implementations in the future, but this
> > might allow us to define the user-facing API in the form of traits while
> > allowing us to change the underlying "glue" as we want.
> >
> > My only slight concern is that this would introduce a bit more overhead
> > since the documentation and API are defined somewhere else, but I have no
> > experience with Scala, so I'll leave this up to the people who are more
> > familiar with the topic.
> >
> > Best regards,
> > Marco
> >
> > On Tue, May 1, 2018 at 11:48 PM, Naveen Swamy 
> wrote:
> >
> > > I think this project is going to significantly improve the usability
> > > of the MXNet-Scala APIs.
> > >
> > > I had an offline discussion with Qing about this. I agree with Yizhi
> > > that keeping the same namespace will make it easy for existing and new
> > > users to quickly look up the APIs. Along with that I have one
> > > recommendation: first generate a trait for all the APIs and have the
> > > instance in the existing symbol. The usage would be something like below:
> > >
> > > trait SymbolAPIBase {
> > >   /**
> > >    * api doc
> > >    */
> > >   def Activation(): Unit
> > > }
> > >
> > > object SymbolAPI extends SymbolAPIBase {
> > >   override def Activation(): Unit = { /* generated implementation */ }
> > > }
> > >
> > > object Symbol {
> > >   val api: SymbolAPIBase = SymbolAPI
> > > }
> > >
> > > Now users would use it as
> > > Symbol.api.Activation()
> > >
> > > Creating a trait/interface has several advantages:
> > > 1) Users do not have to know which concrete class to use; it can be
> > > decided later (dependency injection or configuration-driven applications).
> > > 2) It is easier to change the implementation later without breaking user
> > > code: you can inject different implementations through configuration
> > > using frameworks such as Spring. This is standard in most JVM projects.
> > > 3) It is easier to unit test: you can create mocks easily using Mockito
> > > (see the sketch after this message).
> > > 4) It helps with multiple inheritance, since a class cannot extend
> > > multiple classes but can mix in multiple traits.
> > >
> > > Let me know what do you guys think.
> > >
> > > Thanks, Naveen
> > >
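
To make point 3 above concrete, a self-contained sketch (illustrative names;
assumes mockito-core on the classpath, which is not stated in this thread):

    import org.mockito.Mockito.{mock, when}

    trait SymbolAPIBase {
      def Activation(actType: String): String // placeholder signature
    }

    object MockSketch extends App {
      // Mockito stubs the trait directly; no real MXNet backend is needed:
      val api = mock(classOf[SymbolAPIBase])
      when(api.Activation("relu")).thenReturn("relu_symbol")

      assert(api.Activation("relu") == "relu_symbol")
    }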
> > > On Sun, Apr 22, 2018 at 9:09 PM, YiZhi Liu 
> wrote:
> > >
> > > > I personally like the design here. Since we have seen technical
> > > > difficulties with compatibility, I would like to ask people to pay
> > > > attention to the 'How to combine with existing APIs' section:
> > > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+Scala+API+Usability+Improvement#MXNetScalaAPIUsabilityImprovement-HowtocombinewithexistingAPIs
> > > >
> > > > Qing proposed three options,
> > > >
> > > > 1. Add a new Class/Object called "NewSymbol/NDArray" with full
> > > > implementation.
> > > > 2. Create a new Class and