Re: [mlpack] Out of memory error

2018-08-23 Thread Ryan Curtin
On Thu, Aug 23, 2018 at 03:06:48PM +0200, Anisa Llaveshi wrote:
> Dear Ryan,
> 
> Thank you for your quick response. I double checked and I am afraid that is
> not the case. The matrix that I am providing as input to the model is a 1xN
> matrix. I explicitly check the number of rows (1) and the number of columns
> (N) of the matrix. The model would not even allow that because the labels
> provided to the model are a row vector (rowvec) of N columns, so the
> dimensions would not match.
> 
> Please let me know in case you think I am missing something or if you have
> another idea of what might be going on.

Hi Anisa,

It turns out you aren't missing anything at all.  I took a look at the
linear regression implementation and was surprised to see that it
actually is trying to form an NxN matrix, whereas for linear regression
it should only be necessary to form a dxd matrix.
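
To make the size difference concrete, here is a rough back-of-the-envelope
sketch. It assumes a dense matrix of 64-bit doubles and nothing else; the
helper names are made up for illustration and are not part of mlpack or
Armadillo.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed for a dense rows x cols matrix of
// 64-bit doubles. Computed in floating point so that very large sizes do
// not overflow an integer type.
double denseMatrixBytes(std::size_t rows, std::size_t cols)
{
  return static_cast<double>(rows) * static_cast<double>(cols) * 8.0;
}

// Hypothetical helper: the same quantity in gigabytes (1 GB = 1e9 bytes).
double denseMatrixGB(std::size_t rows, std::size_t cols)
{
  return denseMatrixBytes(rows, cols) / 1e9;
}
```

With N = 100000, denseMatrixGB(N, N) comes to 80 GB, which is roughly in
line with the ~28% of 250GB that was reported. With N = 200000 it comes to
320 GB, more than the machine has. A dxd matrix with d = 1 needs only 8
bytes.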

I spent a little time this evening reimplementing the code and have
opened a pull request to fix it:

https://github.com/mlpack/mlpack/pull/1500

You can either work directly off that branch, wait for another release,
or copy the modified linear_regression.cpp file into your codebase.

With that fix, your problem with 200k points should be blazing fast
(around 0.005s to solve the system, not counting data loading time), so
you should be able to scale to much larger datasets.
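
As a minimal sketch of why a small system is enough, the snippet below fits
a 1-dimensional least-squares line by accumulating only the entries of a
2x2 system (one dimension plus an intercept) in a single pass over the
data. This is not the code from the pull request and does not use mlpack or
Armadillo; the names LineFit and fitLine are hypothetical, and the sketch
assumes no regularization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: y is modeled as intercept + slope * x.
struct LineFit
{
  double intercept;
  double slope;
};

// Ordinary least squares for 1-dimensional points via the normal equations.
// Only four running sums are kept, so the extra memory does not grow with
// the number of points N.
LineFit fitLine(const std::vector<double>& x, const std::vector<double>& y)
{
  assert(x.size() == y.size() && x.size() >= 2);

  const double n = static_cast<double>(x.size());
  double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumXY = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sumX += x[i];
    sumY += y[i];
    sumXX += x[i] * x[i];
    sumXY += x[i] * y[i];
  }

  // Determinant of the 2x2 system [[n, sumX], [sumX, sumXX]].
  const double det = n * sumXX - sumX * sumX;
  assert(det != 0.0); // Zero when all x values are identical.

  const double slope = (n * sumXY - sumX * sumY) / det;
  const double intercept = (sumY - slope * sumX) / n;
  return {intercept, slope};
}
```

For d dimensions the same idea gives a (d+1)x(d+1) system, so the cost of
the solve depends on d rather than on the number of points.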

Sorry for the error!  I hope that this helps out, and if not, just let
me know.

Thanks,

Ryan

-- 
Ryan Curtin    | "The enemy cannot press a button... if you have
r...@ratml.org | disabled his hand." - Sgt. Zim
___
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack


Re: [mlpack] Out of memory error

2018-08-23 Thread Anisa Llaveshi
Dear Ryan,

Thank you for your quick response. I double checked and I am afraid that is
not the case. The matrix that I am providing as input to the model is a 1xN
matrix. I explicitly check the number of rows (1) and the number of columns
(N) of the matrix. The model would not even allow that because the labels
provided to the model are a row vector (rowvec) of N columns, so the
dimensions would not match.

Please let me know in case you think I am missing something or if you have
another idea of what might be going on.

Best regards,
Anisa Llaveshi

On Tue, Aug 21, 2018 at 10:39 PM Ryan Curtin wrote:

> On Mon, Aug 20, 2018 at 10:52:15AM +0200, Anisa Llaveshi wrote:
> > Greetings,
> >
> > I have recently started using mlpack for a C++ application and I came
> > across a problem that I haven't been able to solve. I am using Linear
> > Regression to learn the parameters of a linear model. My training data is
> > a vector of 1-dimensional points. It consists of a vector of type double
> > (64-bit). I initialize the data points matrix from a std::vector structure
> > (where I have the data) using this constructor: arma::mat(std::vector).
> > Depending on the size of the dataset that I use to create the Linear
> > Regression model, I get this error:
> >
> >
> > > error: arma::memory::acquire(): out of memory
> > > terminate called after throwing an instance of 'std::bad_alloc'
> > >   what():  std::bad_alloc
> > >
> >
> > I am running the application on a machine which has 250GB of memory. When
> > I use 100k points (less than 1MB of data) I observe that ~28% of the memory
> > is being used to build the model. When I increase this number to 160k
> > points I observe that ~50% of the memory is being used and then the process
> > is killed. When I increase it a bit more the above error is immediately
> > thrown when trying to build the model.
> > I was wondering whether it is normal for the model to consume this much
> > memory for a small amount of data and if this is the case then what can one
> > do to use a larger dataset?
>
> Hi Anisa,
>
> The memory used by a linear regression should be dxd, where d is the
> number of dimensions in your data.  Remember that with mlpack, each
> observation (or point) in your data should correspond to one
> column---not one row---since Armadillo is column major. Here's some more
> information:
>
> https://www.mlpack.org/docs/mlpack-3.0.3/doxygen/matrices.html
>
> So, my guess is that your matrix currently has N rows and 1 column,
> causing mlpack to try and invert an NxN matrix (very large), but what
> you actually want is a matrix with 1 row and N columns.
>
> I hope this helps! Let me know if I can clarify anything.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Plot is a primitive vulgarity in literature."
> r...@ratml.org |   - Balph Eubank
>