Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Matthias Boehm
Indeed, some of these operations do allocate additional data structures.
Other problems were (1) that our memory estimates do not account for the
explicit copy into Commons Math data structures (e.g.,
Array2DRowRealMatrix), and (2) unnecessarily raised exceptions due to
unknowns. However, both issues can be addressed, and since we're talking
about warnings, false positives/negatives are probably OK.


Regards,
Matthias
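The following is a minimal sketch of point (1), assuming a dense double-precision input. The estimate counts the input, the explicit copy into a double[][]-backed Commons Math structure such as Array2DRowRealMatrix, and the outputs. The function names and the encoding of unknown dimensions as -1 are hypothetical; this is not SystemML's actual estimator. As Berthold points out, intermediates allocated inside the library are still not covered.

```python
# Hypothetical sketch, not SystemML's memory estimator.
BYTES_PER_DOUBLE = 8

def dense_bytes(rows, cols):
    """Size of a dense rows x cols matrix of doubles, ignoring object headers."""
    return rows * cols * BYTES_PER_DOUBLE

def estimate_local_op_bytes(rows, cols, output_shapes):
    """Estimate memory for a local (CP-only) decomposition.

    Counts the input block, the explicit copy into a double[][]-backed
    structure (e.g. Array2DRowRealMatrix), and every output matrix.
    Unknown dimensions (encoded here as -1) yield None, so a caller can
    skip the check instead of raising an exception.
    """
    if rows < 0 or cols < 0:
        return None
    inp = dense_bytes(rows, cols)
    copy = dense_bytes(rows, cols)  # the explicit library copy from point (1)
    out = sum(dense_bytes(r, c) for r, c in output_shapes)
    return inp + copy + out
```

For example, a 1000 x 1000 decomposition with two 1000 x 1000 outputs is estimated at 4 x 8,000,000 = 32,000,000 bytes. Without the copy term, it would be underestimated by a quarter.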


Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Berthold Reinwald
If I remember correctly, it is not trivial to accurately estimate the
memory footprint of these Commons Math functions at compile time,
depending on what intermediates they produce. This means you may still end
up with a Java heap space OOM at runtime.

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com




Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Matthias Boehm
Well, we still compute memory estimates for these operations. So I
guess a good compromise would be to raise a warning whenever the memory
estimate is known to exceed the local memory budget.


Regards,
Matthias
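The compromise described above can be sketched as a check that only ever warns. It reports a problem only when the estimate is actually known and exceeds the budget. When dimensions are unknown at compile time, it stays silent instead of failing. The logger name and function signature are hypothetical, and the byte values would come from whatever estimator the compiler already uses.

```python
import logging

# Hypothetical logger name; not an actual SystemML logger.
logger = logging.getLogger("local_ops")

def check_local_budget(op_name, estimate_bytes, budget_bytes):
    """Warn (never fail) when a known memory estimate exceeds the local budget.

    estimate_bytes is None when dimensions are unknown at compile time;
    in that case nothing is reported, so unknowns never raise.
    Returns True if a warning was emitted.
    """
    if estimate_bytes is None:
        return False
    if estimate_bytes > budget_bytes:
        logger.warning(
            "%s: estimated %d bytes exceeds local memory budget of %d bytes",
            op_name, estimate_bytes, budget_bytes)
        return True
    return False
```

Because the check only warns, an inaccurate estimate produces at worst a spurious or missing warning. It never produces a failed compilation.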

Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Deron Eriksson
Would it be acceptable for a user to receive a log warning if the user uses
an operation that is currently only implemented for single node? My concern
is that there is an expectation for operations to be distributed with
SystemML, and if an operation is not currently distributed, the user needs
to be made aware of this.

Thoughts?

Deron
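The warning proposed above could be sketched as a lookup against the set of operations that currently run only in CP (single-node) mode. The operation list comes from this thread: eigen, lu, qr and cholesky, plus the proposed svd. The set name, the logger name and both functions are hypothetical illustrations, not SystemML code.

```python
import logging

# Hypothetical logger name; not an actual SystemML logger.
logger = logging.getLogger("local_ops")

# Operations that, per this thread, currently run only in CP (single-node)
# mode; "svd" is the proposed addition. The set itself is an assumption.
CP_ONLY_OPS = frozenset({"eigen", "lu", "qr", "cholesky", "svd"})

def warn_if_single_node(op_name):
    """Log a warning when a script uses an operation with no distributed version."""
    if op_name in CP_ONLY_OPS:
        logger.warning("%s() currently runs only on a single node "
                       "and is not distributed", op_name)
        return True
    return False

def collect_single_node_warnings(op_names):
    """Return the CP-only operations a script uses, warning once per operation."""
    flagged = []
    for name in op_names:
        if name not in flagged and warn_if_single_node(name):
            flagged.append(name)
    return flagged
```

The warning is emitted once per operation rather than once per call site, so scripts that call `eigen` inside a loop are not flooded with log lines.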


Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Nakul Jindal
Hi,

There is an initial implementation and PR. 
https://github.com/apache/incubator-systemml/pull/273

-Nakul


Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Berthold Reinwald
Thanks, Imran. I think it is a good idea to start off with the DML-bodied
function implementation. This will hold until we can have a built-in
implementation.

We prototyped an implementation of distributed Cholesky as a DML-bodied
function as well. For performance optimization, once the matrix became
"small" enough, we switched over and exploited a single-node implementation.

Adding a new svd() built-in function that initially routes to a local
library is fine. I don't know whether Apache Commons Math has an
implementation that can be reused.

I object to renaming the functions or changing the externals. Eventually,
distributed instructions need to be added to these implementations, and
there are open JIRAs for it.

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com
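The hybrid strategy Berthold describes (a recursive block algorithm that switches to a single-node routine once the matrix is small) can be illustrated with a standard 2 x 2 block Cholesky. This single-process Python sketch shows only the recursion structure. In a DML-bodied version, the block operations would be distributed matrix ops. The `threshold` parameter and all function names are hypothetical, and this is not the prototype mentioned above.

```python
import math

def local_cholesky(a):
    """Cholesky-Banachiewicz: lower-triangular L with L L^T = A (A must be SPD)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def solve_right_lower_t(l11, a21):
    """Solve X L11^T = A21 for X, row by row (forward substitution)."""
    k = len(l11)
    x = []
    for row in a21:
        xr = [0.0] * k
        for j in range(k):
            s = sum(xr[m] * l11[j][m] for m in range(j))
            xr[j] = (row[j] - s) / l11[j][j]
        x.append(xr)
    return x

def blocked_cholesky(a, threshold=2):
    """Recursive block Cholesky that falls back to local_cholesky when small.

    With A = [[A11, A21^T], [A21, A22]]:
      L11 = chol(A11), L21 = A21 L11^{-T}, L22 = chol(A22 - L21 L21^T)
    """
    n = len(a)
    if n <= threshold:
        return local_cholesky(a)  # "small enough": single-node path
    h = n // 2
    a11 = [row[:h] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]
    l11 = blocked_cholesky(a11, threshold)
    l21 = solve_right_lower_t(l11, a21)
    # Schur complement S = A22 - L21 L21^T
    s = [[a22[i][j] - sum(l21[i][k] * l21[j][k] for k in range(h))
          for j in range(n - h)] for i in range(n - h)]
    l22 = blocked_cholesky(s, threshold)
    # Assemble L = [[L11, 0], [L21, L22]]
    top = [l11[i] + [0.0] * (n - h) for i in range(h)]
    bottom = [l21[i] + l22[i] for i in range(n - h)]
    return top + bottom
```

For the classic example A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]], every threshold yields L = [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]. The threshold only decides where the recursion hands off to the local routine.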



Re: Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Niketan Pansare

I am also comfortable with option (2) ... "with a plan to implement its
distributed version"

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml.incubator.apache.org
Date:   10/21/2016 01:00 PM
Subject:    Re: Local versions of Linear Algebra Operators in DML



Thanks, Nakul, for reaching out before starting work on this. Actually,
the introduction of these CP-only builtin functions was a big mistake
because (as you already mentioned) they mistakenly suggest that we
provide distributed operations for them too. The intent was to support
them in later versions with our own local and distributed
implementations. So far, this had low priority, though, because these
O(n^3) operations are seldom used over large data. However, a while
back, we lost potential users who were specifically interested in
distributed eigen, so there are still use cases.
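To make "seldom used over large data" concrete, here is a back-of-the-envelope sketch for a dense n x n double matrix. The memory is exactly 8n^2 bytes per matrix. The operation count is the O(n^3) order with the constant omitted; Cholesky, for instance, needs roughly n^3/3 flops. The helper names are illustrative only.

```python
def dense_square_footprint(n):
    """Bytes for one dense n x n matrix of doubles (8 bytes per entry)."""
    return 8 * n * n

def cubic_flops(n):
    """Order-of-magnitude operation count of an O(n^3) decomposition (constant omitted)."""
    return n ** 3

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}: {dense_square_footprint(n) / 1e9:8.3f} GB per matrix, "
          f"~{cubic_flops(n):.0e} flops")
```

At n = 100,000, a single dense input already needs 80 GB, before counting copies or outputs. That is where a purely local implementation stops being an option.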

Despite the good intentions behind the renaming, I would strongly argue
against it. First, it would unnecessarily lose compatibility with R
syntax. Second, it would defeat our clean abstraction by exposing
explicit local operations.

This leaves us with two options here: (1) you could use an external
(Java-implemented) function, which gives you virtually the same runtime
behavior but a clear separation via an explicit registration, or (2) add
it to the list of CP-only operations (with a plan to implement its
distributed version) but name it 'svd' as in R.


Regards,
Matthias
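The two options differ mainly in how locality becomes visible. The sketch below contrasts them, assuming a simple name-to-implementation registry. Option (1) requires an explicit registration step, so single-node execution is visible in the script. Option (2) keeps the R name `svd` as a builtin, flagged internally as not yet distributed. The class, fields and methods are all hypothetical and do not model SystemML's actual external-function mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class FunctionEntry:
    impl: Callable
    distributed: bool  # does a distributed (e.g. Spark) version exist?
    external: bool     # was it registered explicitly by the user/script?

class FunctionRegistry:
    """Hypothetical registry contrasting option (1) and option (2)."""

    def __init__(self):
        self._entries: Dict[str, FunctionEntry] = {}

    def add_builtin(self, name, impl, distributed):
        # Option (2): builtin named as in R (e.g. 'svd'), possibly CP-only for now.
        self._entries[name] = FunctionEntry(impl, distributed, external=False)

    def register_external(self, name, impl):
        # Option (1): explicit registration makes the local execution visible.
        self._entries[name] = FunctionEntry(impl, distributed=False, external=True)

    def is_local_only(self, name):
        return not self._entries[name].distributed

    def is_external(self, name):
        return self._entries[name].external

    def call(self, name, *args):
        return self._entries[name].impl(*args)
```

With option (2), the `distributed` flag can later flip to True once a distributed version exists, without any change to user scripts. That is the R-compatibility argument made above.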


Re: Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Niketan Pansare

Hi Nakul,

I really don't like the fact that eigen, lu, qr, and cholesky only have local
implementations and yet we have elevated them to the status of builtin
functions. We should definitely consider implementing Spark
instructions for them (as you mentioned in the email) before we officially
mark them as "local_only". In fact, instead of marking them as
"local_only", I would much rather support them as external
builtin functions.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Deron Eriksson <deroneriks...@gmail.com>
To: dev@systemml.incubator.apache.org
Date:   10/21/2016 12:52 PM
Subject:Re: Local versions of Linear Algebra Operators in DML



Hi Nakul,

+1
I think having some clear characteristic to distinguish operations that
only operate locally is a great idea. Otherwise, how would a user know that
these operations are only local and not distributed? Adding this naming
convention for local operations sounds reasonable to me so that we don't
anger users who expect an operation to be distributed when in actuality it
only currently runs locally.

Deron







Re: Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Matthias Boehm
Thanks Nakul for reaching out before starting work on this. Actually,
the introduction of these CP-only builtin functions was a big mistake
because (as you already mentioned) they mistakenly suggest that we
provide distributed operations for them too. The intent was to support
them in later versions with our own local and distributed
implementations. So far, this has had low priority, though, because these
O(n^3) operations are seldom used over large data. However, a while
back, we lost potential users who were specifically interested in
distributed eigen, so there are still use cases.


Despite the good intentions behind the renaming, I would strongly argue 
against it. First, it would unnecessarily lose compatibility with R 
syntax. Second, it would defeat our clean abstraction by exposing 
explicit local operations.


This leaves us with two options here: (1) you could use an external
(Java-implemented) function, which gives you virtually the same runtime
behavior but a clear separation via an explicit registration, or (2) add
it to the list of CP-only operations (with a plan to implement its
distributed version) but name it 'svd' as in R.



Regards,
Matthias





Re: Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Deron Eriksson
Hi Nakul,

+1
I think having some clear characteristic to distinguish operations that
only operate locally is a great idea. Otherwise, how would a user know that
these operations are only local and not distributed? Adding this naming
convention for local operations sounds reasonable to me so that we don't
anger users who expect an operation to be distributed when in actuality it
only currently runs locally.

Deron



On Fri, Oct 21, 2016 at 12:34 PM, Nakul Jindal <naku...@gmail.com> wrote:

> Hi,
>
> Imran was planning on implementing a distributed SVD as a DML-bodied
> function.
> The algorithm is described in the paper titled "A Distributed and
> Incremental SVD Algorithm for Agglomerative Data Analysis on Large
> Networks" available at https://arxiv.org/abs/1601.07010.
>
> This algorithm requires the availability of a local SVD function, which we
> currently do not have in SystemML.
> Seeing as how there are other linear algebra functions (eigen, lu, qr,
> cholesky) in DML that reroute to Apache Commons Math and only operate in
> standalone/CP mode, would it be ok to add "svd" to this set?
>
> Also, since these operations are local and not distributed and the
> documentation doesn't make it clear that these operations won't operate in
> distributed mode, would it make sense to rename them to "local_eigen",
> "local_qr", "local_cholesky", etc?
> Obviously, this change would go into the version after 0.11.
>
> I understand that the ideal solution to this problem is to have a
> distributed version of the aforementioned linear algebra routines, but for
> the time being, would it be ok to go ahead and do the rename, while also
> introducing a "local_svd"?
>
>
> Niketan, Berthold, Matthias, Sasha - Any thoughts?
>
> Thanks,
> Nakul Jindal
>


Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Nakul Jindal
Hi,

Imran was planning on implementing a distributed SVD as a DML-bodied
function.
The algorithm is described in the paper titled "A Distributed and
Incremental SVD Algorithm for Agglomerative Data Analysis on Large
Networks" available at https://arxiv.org/abs/1601.07010.

This algorithm requires the availability of a local SVD function, which we
currently do not have in SystemML.
Seeing as how there are other linear algebra functions (eigen, lu, qr,
cholesky) in DML that reroute to Apache Commons Math and only operate in
standalone/CP mode, would it be ok to add "svd" to this set?

Also, since these operations are local and not distributed and the
documentation doesn't make it clear that these operations won't operate in
distributed mode, would it make sense to rename them to "local_eigen",
"local_qr", "local_cholesky", etc?
Obviously, this change would go into the version after 0.11.

I understand that the ideal solution to this problem is to have a
distributed version of the aforementioned linear algebra routines, but for
the time being, would it be ok to go ahead and do the rename, while also
introducing a "local_svd"?


Niketan, Berthold, Matthias, Sasha - Any thoughts?

Thanks,
Nakul Jindal