RE: [RFC] Support for creation of Large Tensors in MXNet

Lv, Tao A Sat, 25 May 2019 03:15:32 -0700

Hi Lin,

Yes, MKL supports that. Please refer to 
https://software.intel.com/en-us/mkl-macos-developer-guide-using-the-ilp64-interface-vs-lp64-interface
 for details.
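As a rough illustration of the difference that guide describes: under the LP64 interface every integer argument to a BLAS routine (M, N, K and the leading dimensions) is a 32-bit int, while under ILP64 those arguments are 64-bit. The sketch below uses a hypothetical helper, not an MKL or MXNet API, to check whether a GEMM call's integer arguments would fit LP64.

```python
# Hypothetical helper for illustration only; it is not part of MKL or MXNet.
INT32_MAX = 2**31 - 1

def fits_lp64_gemm(m: int, n: int, k: int) -> bool:
    """Return True if a GEMM computing C (m x n) = A (m x k) * B (k x n) can
    pass all of its integer arguments as 32-bit signed ints (LP64)."""
    # Assumes row-major, non-transposed, densely packed operands, so the
    # leading dimensions equal the row lengths: lda = k, ldb = n, ldc = n.
    lda, ldb, ldc = k, n, n
    return all(0 <= v <= INT32_MAX for v in (m, n, k, lda, ldb, ldc))

print(fits_lp64_gemm(m=3, n=5, k=4))              # True
print(fits_lp64_gemm(m=3_000_000_000, n=5, k=4))  # False: m needs ILP64
```

In MKL the integer type in question is `MKL_INT`, which is 32-bit under LP64 and 64-bit under ILP64, so the same check against a 64-bit bound would pass for the second call.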


I also did some work in that direction. Please see the PRs below for MXNet and 
mshadow, respectively.
https://github.com/apache/incubator-mxnet/pull/13723
https://github.com/dmlc/mshadow/pull/365

Feel free to let me know if there is anything I can help with.

Thanks,
-tao


-----Original Message-----
From: Lin Yuan [mailto:apefor...@gmail.com] 
Sent: Saturday, May 25, 2019 1:36 AM
To: dev@mxnet.incubator.apache.org; Lv, Tao A <tao.a...@intel.com>
Cc: d...@mxnet.apache.org
Subject: Re: [RFC] Support for creation of Large Tensors in MXNet

Hi Sheng,

Thanks for the nice suggestions. To summarize the current status and future 
plan of this project:

Some operators missed by #11742 did not support large tensors. Thanks to 
Rohit's help, those missing operators have been completed and tests have been 
added to the nightly pipeline in the MXNet 1.5 release (currently GPU only; 
they will be added to CPU once issue
https://github.com/apache/incubator-mxnet/issues/14980 is resolved).

The next phases of this project are:
(1) Run operator profiling to identify the operators that have performance 
regressions after turning on the int64 compiler flag
(2) Mitigate the performance regressions in the operators collected from (1)
(3) Turn on the int64 compilation flag by default (targeted for release 
1.6)
(4) Support int64 for each dimension of the tensor. This can be carried out in 
parallel with (1) to (3). The current limitation AFAIK is the cblas_gemm 
library, which uses int32 for each dimension, and a lot of matrix operators in 
MXNet call cblas_gemm in mshadow.
@Lv, Tao A <tao.a...@intel.com> Does the Intel MKL CBLAS library support int64 
for each dimension? Thanks!
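To make phase (4) concrete: even when every dimension of a tensor fits in `int32_t`, flattening it can produce a single dimension that does not, and that is the kind of value a per-dimension int32 interface such as cblas_gemm cannot accept. A minimal pure-Python sketch; the helper names are hypothetical and not MXNet APIs.

```python
from math import prod

INT32_MAX = 2**31 - 1

def dims_fit_int32(shape):
    # True if every individual dimension (shape[x]) is representable as int32_t.
    return all(d <= INT32_MAX for d in shape)

def flatten_shape(shape):
    # Collapse all dimensions into one, as a flatten/reshape to 1-D would.
    return (prod(shape),)

shape = (5, 1_000_000_000)          # about 5 billion elements
print(dims_fit_int32(shape))        # True: each dimension fits
flat = flatten_shape(shape)
print(flat, dims_fit_int32(flat))   # (5000000000,) False
```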

Best,

Lin




On Sat, May 18, 2019 at 9:05 PM Sheng Zha <zhash...@apache.org> wrote:

> Thanks for clarifying. This seems like a duplicate of [1] (though 
> there wasn't any feedback there). I think everyone already agrees on the goal.
>
> > Currently, we assume the max size of each dimension.
>
> I agree with Tao that int64_t would be necessary given that it's 
> common to flatten and reshape ndarrays.
>
> To help avoid repeating discussion and to make this discussion more 
> productive, here is some of the relevant context that I'm aware of:
> - The first part of the proposed change was merged in #11742, which 
> caused #14496, i.e. performance degradation in transpose and imdecode. 
> The full scope is still unclear.
> - A compilation flag was added in #14570 so that people can explicitly 
> opt in to the support without impacting others using the default setting.
>
> Given the context, since the goal is to support large tensors by 
> default without performance impact, I hope more investigation could 
> accompany this proposal, covering:
> - The problem: list the parts (e.g. operators) whose performance is 
> impacted by changing the index type, and the amount of slow-down.
> - The solution for addressing the slow-down.
>
> Thanks.
>
> -sz
>
> [1]
> https://lists.apache.org/thread.html/52b784cf85f89a22355e195fc88b01992fb1993a6f08499a46fa1ff8@%3Cdev.mxnet.apache.org%3E
>
> On 2019/05/19 02:43:39, "Srivastava, Rohit Kumar" <srivastava....@buckeyemail.osu.edu> wrote:
> > Hi Tao,
> >     The existing MXNet implementation doesn't support large tensors.
> > MXNet NDArray creation for tensors with more than 2^32 elements is
> > currently only supported by enabling a build flag. The purpose of this
> > thread is to have the community provide feedback on the design cwiki for
> > *Large Tensor Support* in MXNet. The intention is to make large tensor
> > support a default feature in MXNet (in the future) without any
> > performance impact, so consumers do not have to build it from source.
> >
> > -Rohit
> >
> > On 5/18/19, 5:59 PM, "Lv, Tao A" <tao.a...@intel.com> wrote:
> >
> >     Hi Rohit,
> >
> >     The existing MKL-DNN and its integration in MXNet should already
> >     support *large tensors*, meaning the total number of elements
> >     (Prod(shape)) can exceed INT_MAX. Feel free to let me know if you find
> >     any issues when using MKL-DNN operators with large tensors.
> >
> >     For large dimension sizes (shape[x]), MKL-DNN is going to add support
> >     in its 1.0 release, which is expected in the middle of the year. But
> >     I'm not sure whether MXNet plans to support that.
> >
> >     Thanks,
> >     -tao
> >
> >     -----Original Message-----
> >     From: Srivastava, Rohit Kumar [mailto:srivastava....@buckeyemail.osu.edu]
> >     Sent: Sunday, May 19, 2019 7:23 AM
> >     To: dev@mxnet.incubator.apache.org
> >     Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> >
> >     Hi Tao,
> >         There are already a couple of operators implemented in MXNet that
> >     currently support tensors with sizes over ~4.5 billion. In the
> >     meantime, core MXNet can move ahead with providing initial support for
> >     such large tensors so MXNet customers can start using it.
> >
> >     Good to hear MKLDNN will provide support for such cases. Do you have
> >     a timeline for when this feature will be released?
> >
> >     -Rohit
> >
> >     On 4/29/19, 7:18 PM, "Lv, Tao A" <tao.a...@intel.com> wrote:
> >
> >         Thank you Lin! I would expect the current MKL-DNN implementation
> >         to already support the scenario you mentioned here. This can be
> >         verified by this issue:
> >         https://github.com/apache/incubator-mxnet/issues/13451
> >
> >         But as I said before, since we support flatten or reshape
> >         operators, it's possible for users to convert a tensor with a large
> >         number of elements into a tensor with a large dimension size. That
> >         could cause issues there.
> >
> >         To cover more cases, MKL-DNN is going to support INT64 dimension
> >         sizes in its upcoming 1.0 major release.
> >
> >         -tao
> >
> >         -----Original Message-----
> >         From: Lin Yuan [mailto:apefor...@gmail.com]
> >         Sent: Tuesday, April 30, 2019 12:56 AM
> >         To: dev@mxnet.incubator.apache.org
> >         Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> >
> >         Tao,
> >
> >         - What's the max size of dimensionality? Which data type is used
> >           to define dimensionality (ndims)?
> >         We assume the max size of dimensionality is relatively small,
> >         hence the `int` data type is used to define ndim.
> >
> >         - What's the max size of each dimension? Which data type is used
> >           to define dimension size (shape[x])?
> >         Currently, we assume the max size of each dimension is not going
> >         to exceed 2^31 in real applications, hence the data type is
> >         `int32_t`.
> >
> >         - What's the max size of total elements? Which data type is used
> >           to define element size (Prod(shape))?
> >         We assume the total number of elements in a tensor can be larger
> >         than 2^32 in some applications, such as Deep Graph Library. We use
> >         the data type `int64_t` to represent the total element size.
> >         Currently, due to performance regressions in some operators (such
> >         as transpose), we use a compiler flag to set this data type to
> >         `int32_t` by default. Once we have ways to mitigate the performance
> >         regressions, we will set the default data type to `int64_t`, which
> >         is part of the effort in this project that Rohit proposed.
> >
> >         What is the plan in MKL-DNN to support large tensors? We may want
> >         to coordinate the progress, since many operators use the MKL-DNN
> >         implementation on CPU now.
> >
> >         Many Thanks,
> >
> >         Lin
> >
> >         On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A <tao.a...@intel.com> wrote:
> >
> >         > Thank you for bringing this topic to dev, Rohit.
> >         >
> >         > Regarding large tensors, can you articulate:
> >         > - What's the max size of dimensionality? Which data type is used
> >         >   to define dimensionality (ndims)?
> >         > - What's the max size of each dimension? Which data type is used
> >         >   to define dimension size (shape[x])?
> >         > - What's the max size of total elements? Which data type is used
> >         >   to define element size (Prod(shape))?
> >         >
> >         > For me, any of these three can be *large*.
> >         >
> >         > -----Original Message-----
> >         > From: Srivastava, Rohit Kumar
> >         > [mailto:srivastava....@buckeyemail.osu.edu]
> >         > Sent: Saturday, April 27, 2019 7:33 AM
> >         > To: dev@mxnet.incubator.apache.org
> >         > Subject: [RFC] Support for creation of Large Tensors in MXNet
> >         >
> >         > Dear Community,
> >         >
> >         > Currently MXNet supports creation of tensors containing up to
> >         > 2^32 elements. However, there are cases where tensors with more
> >         > than 5 billion elements are required.
> >         >
> >         > We plan to support creation of large tensors in MXNet. A design
> >         > proposal is ready for review:
> >         > https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
> >         >
> >         > We would appreciate any help and feedback from the community.
> >         >
> >         > Thank you!
> >         >
> >         > Rohit
> >         >
> >
> >
> >
> >
> >
>
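For reference, the element-count problem at the root of this RFC (tensors with more than 2^32 elements, where Prod(shape) is held in `int64_t` only when the compile flag is enabled) can be sketched in pure Python by emulating a fixed-width signed accumulator. The helper names are hypothetical; the sketch only shows how Prod(shape) behaves when computed in a 32-bit versus a 64-bit signed integer, not how MXNet implements it.

```python
def wrap_signed(value: int, bits: int) -> int:
    # Emulate two's-complement wrap-around for a signed integer of `bits` width.
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half

def num_elements(shape, bits):
    # Accumulate Prod(shape) in a signed integer of the given width, wrapping
    # after every multiplication. In C/C++ signed overflow is undefined
    # behaviour; two's-complement wrap is a common outcome in practice and is
    # what this sketch emulates.
    total = 1
    for d in shape:
        total = wrap_signed(total * d, bits)
    return total

print(num_elements((5, 1_000_000_000), 64))  # 5000000000
print(num_elements((5, 1_000_000_000), 32))  # 705032704, silently wrong
print(num_elements((3, 1_000_000_000), 32))  # -1294967296, wraps negative
```

Every dimension here fits comfortably in 32 bits; only the product overflows, which is why the thread separates the element-count type from the per-dimension type.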
