[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-07-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395018#comment-15395018
 ] 

Apache Spark commented on SPARK-15194:
--

User 'praveendareddy21' has created a pull request for this issue:
https://github.com/apache/spark/pull/14375

> Add Python ML API for MultivariateGaussian
> --
>
> Key: SPARK-15194
> URL: https://issues.apache.org/jira/browse/SPARK-15194
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: holdenk
>Priority: Minor
>
> We have a PySpark API for the MLlib version but not the ML version. This 
> would allow Python's `GaussianMixture` to more closely match the Scala API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-21 Thread praveen dareddy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295413#comment-15295413
 ] 

praveen dareddy commented on SPARK-15194:
-

[~josephkb] [~holdenk]
I have sent a PR to resolve this issue.
Kindly review it.

Thanks,
praveen




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295410#comment-15295410
 ] 

Apache Spark commented on SPARK-15194:
--

User 'praveendareddy21' has created a pull request for this issue:
https://github.com/apache/spark/pull/13248




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-18 Thread Gayathri Murali (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290007#comment-15290007
 ] 

Gayathri Murali commented on SPARK-15194:
-

[~holdenk] I see that mllib/stat/distribution.py has the Python class for the 
mllib version of MultivariateGaussian. Are you looking to create a similar 
stat/distribution.py under pyspark/ml for the mllib-local version of 
MultivariateGaussian?




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-16 Thread praveen dareddy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285618#comment-15285618
 ] 

praveen dareddy commented on SPARK-15194:
-

[~josephkb] Thanks for clarifying this.
I will continue working on this issue once the blocker, SPARK-14906, is merged 
into master.

Thanks,
praveen




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-16 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285595#comment-15285595
 ] 

Joseph K. Bradley commented on SPARK-15194:
---

This should be implemented using numpy, within mllib-local, as [~holdenk] said. 
 But you'll need to wait until the blocker JIRA is done.
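
A minimal sketch of what a numpy-only implementation could look like. The class name `MultivariateGaussianSketch` and its methods are hypothetical, not the API that was eventually proposed in the pull requests, and the sketch assumes a positive definite covariance matrix (it does not handle singular covariances).

```python
import numpy as np


class MultivariateGaussianSketch(object):
    """Hypothetical numpy-only Gaussian; not the class added to PySpark."""

    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)
        self.cov = np.asarray(cov, dtype=float)
        k = self.mean.shape[0]
        if self.cov.shape != (k, k):
            raise ValueError("cov must be a %d x %d matrix" % (k, k))
        # Assumption: cov is symmetric positive definite. A non-positive
        # determinant rules that out, so reject it early.
        sign, logdet = np.linalg.slogdet(self.cov)
        if sign <= 0:
            raise ValueError("cov must have a positive determinant")
        # Log of the normalizing constant (2*pi)^(-k/2) * det(cov)^(-1/2).
        self._log_norm = -0.5 * (k * np.log(2.0 * np.pi) + logdet)

    def logpdf(self, x):
        delta = np.asarray(x, dtype=float) - self.mean
        # Solve cov * y = delta instead of forming the inverse explicitly.
        return self._log_norm - 0.5 * delta.dot(np.linalg.solve(self.cov, delta))

    def pdf(self, x):
        return np.exp(self.logpdf(x))


# Standard 2-D Gaussian: zero mean, identity covariance.
standard = MultivariateGaussianSketch([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
density_at_mean = standard.pdf([0.0, 0.0])
```

For the standard 2-D case the density at the mean is 1 / (2 * pi), which gives a quick sanity check.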




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-11 Thread praveen dareddy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280563#comment-15280563
 ] 

praveen dareddy commented on SPARK-15194:
-

Hi all,
After going through the ml and mllib APIs, it seems that MultivariateGaussian in 
Scala uses the Breeze library for linear algebra.
So, are we implementing the same thing in Python using numpy, or using a wrapper 
around the Scala MultivariateGaussian?

I have tried using the JavaWrapper class in 
https://github.com/apache/spark/blob/master/python/pyspark/ml/wrapper.py
as a wrapper solution, but I am getting constructor errors (I need to pass a 
Vector and a DenseMatrix to MultivariateGaussian).

Are there any other wrapper APIs I am missing?
Kindly help me out.

Thanks,
Praveen

Here is my code:

from pyspark.ml.wrapper import JavaWrapper
__all__ = ['MultivariateGaussian']

class MultivariateGaussian(JavaWrapper):

    #@keyword_only
    def __init__(self, mu, sigma):
        super(MultivariateGaussian, self).__init__()
        self._java_obj = self._new_java_obj(
            "org.apache.spark.ml.stat.distribution.MultivariateGaussian", (mu, sigma))
        self.mu = mu
        self.sigma = sigma
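
One possible contributor to the constructor errors, assuming `_new_java_obj` takes the Java constructor arguments as separate positional arguments (`*args`): wrapping `mu` and `sigma` in a tuple hands the constructor a single argument instead of two. The sketch below illustrates only that argument-passing difference with a plain-Python stand-in; `new_java_obj_stub` is hypothetical and does not talk to a JVM.

```python
CLASS_NAME = "org.apache.spark.ml.stat.distribution.MultivariateGaussian"


def new_java_obj_stub(java_class, *args):
    """Hypothetical stand-in: returns the arguments a varargs helper would forward."""
    return java_class, list(args)


mu = [0.0, 0.0]
sigma = [[1.0, 0.0], [0.0, 1.0]]

# Passing a tuple: the helper forwards ONE argument, the tuple itself.
_, packed_args = new_java_obj_stub(CLASS_NAME, (mu, sigma))

# Passing the values separately: the helper forwards TWO arguments.
_, unpacked_args = new_java_obj_stub(CLASS_NAME, mu, sigma)
```

Whether unpacking alone resolves the error also depends on how the Python vector and matrix values are converted to their JVM counterparts, which this stand-in does not model.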





[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-06 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275077#comment-15275077
 ] 

holdenk commented on SPARK-15194:
-

This is for the ml API, not the mllib API. ml's `MultivariateGaussian` moved 
into mllib-local ( 
https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/stat/distribution/MultivariateGaussian.scala
 ).




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-06 Thread praveen dareddy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275076#comment-15275076
 ] 

praveen dareddy commented on SPARK-15194:
-


Hi,

It seems the PySpark version of GaussianMixture is currently implemented in 
clustering.py as the GaussianMixtureModel class.
https://github.com/apache/spark/blob/302a18686998b8b96546526bfccec9cf5b667386/python/pyspark/mllib/clustering.py

Can anyone point me in the right direction here?

Thanks,
Praveen




[jira] [Commented] (SPARK-15194) Add Python ML API for MultivariateGaussian

2016-05-06 Thread praveen dareddy (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275074#comment-15275074
 ] 

praveen dareddy commented on SPARK-15194:
-

Can I contribute to this issue?
From what I have understood so far, we need to mirror 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala
in the PySpark API.

Am I understanding it right?

Thanks,
Praveen


