[jira] [Commented] (SPARK-4038) Outlier Detection Algorithm for MLlib

2015-06-22 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596794#comment-14596794
 ] 

Anant Daksh Asthana commented on SPARK-4038:


I do agree that a general wrapper might be quite involved. It may be wiser to
create a toolkit of algorithms and just document them well, following the
established patterns so they are all compatible with the ML pipeline API.
What do you think of that?

On Mon, Jun 22, 2015, 4:14 PM Joseph K. Bradley (JIRA) j...@apache.org



 Outlier Detection Algorithm for MLlib
 -

 Key: SPARK-4038
 URL: https://issues.apache.org/jira/browse/SPARK-4038
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ashutosh Trivedi
Priority: Minor

 The aim of this JIRA is to discuss which parallel outlier detection 
 algorithms can be included in MLlib. 
 The one I am familiar with is Attribute Value Frequency (AVF). It 
 scales linearly with the number of data points and attributes, and relies on 
 a single data scan. It is not distance based and is well suited for categorical 
 data. The original paper also gives a parallel version, which is not 
 complicated to implement. I am working on the implementation and will soon 
 submit the initial code for review.
 Here is the link to the paper:
 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4410382
 As pointed out by Xiangrui in the discussion at 
 http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-td8880.html
 there are other algorithms as well. Let's discuss which will be the most 
 general and the most easily parallelized.
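
As a rough PySpark sketch of the AVF idea described above (an illustration of the algorithm only, not the proposed MLlib implementation; the toy data and names are made up): count how often each (attribute index, value) pair occurs in a single pass, then score every record by the mean frequency of its values, with the lowest-scoring records being the outlier candidates.

{code:python}
from pyspark import SparkContext

sc = SparkContext(appName="avf-sketch")

# Toy categorical records: each row is a tuple of attribute values.
data = sc.parallelize([("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")])

# One data scan: count how often each (attribute index, value) pair occurs.
freqs = (data.flatMap(lambda row: [((i, v), 1) for i, v in enumerate(row)])
             .reduceByKey(lambda a, b: a + b)
             .collectAsMap())
freqs_bc = sc.broadcast(freqs)

# AVF score = mean frequency of a record's attribute values; low score = outlier.
scores = data.map(lambda row: (row, sum(freqs_bc.value[(i, v)]
                                        for i, v in enumerate(row)) / float(len(row))))
print(scores.takeOrdered(2, key=lambda kv: kv[1]))  # the two most outlying records
{code}

Because the whole computation is one frequency count plus one map over the data, it parallelizes naturally, which is the property highlighted above.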




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4038) Outlier Detection Algorithm for MLlib

2015-06-22 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596830#comment-14596830
 ] 

Anant Daksh Asthana commented on SPARK-4038:


So what are some good first algorithms, in your opinion? AVF, k-means,
k-nearest-neighbor-based algorithms, or maybe LOF?
I think AVF and k-means might be a good starting point.

On Mon, Jun 22, 2015, 5:09 PM Joseph K. Bradley (JIRA) j...@apache.org



 Outlier Detection Algorithm for MLlib
 -

 Key: SPARK-4038
 URL: https://issues.apache.org/jira/browse/SPARK-4038
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ashutosh Trivedi
Priority: Minor

 The aim of this JIRA is to discuss which parallel outlier detection 
 algorithms can be included in MLlib. 
 The one I am familiar with is Attribute Value Frequency (AVF). It 
 scales linearly with the number of data points and attributes, and relies on 
 a single data scan. It is not distance based and is well suited for categorical 
 data. The original paper also gives a parallel version, which is not 
 complicated to implement. I am working on the implementation and will soon 
 submit the initial code for review.
 Here is the link to the paper:
 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4410382
 As pointed out by Xiangrui in the discussion at 
 http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-td8880.html
 there are other algorithms as well. Let's discuss which will be the most 
 general and the most easily parallelized.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4649) Add method unionAll to PySpark's SchemaRDD

2014-11-28 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228640#comment-14228640
 ] 

Anant Daksh Asthana commented on SPARK-4649:


I would like to take on this task.

 Add method unionAll to PySpark's SchemaRDD 
 ---

 Key: SPARK-4649
 URL: https://issues.apache.org/jira/browse/SPARK-4649
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 1.1.0
Reporter: Luca Foschini
Priority: Minor

 PySpark has no equivalent of Scala's SchemaRDD.unionAll.
 The standard SchemaRDD.union method downcasts the result to UnionRDD, which 
 makes it not amenable to chaining.  
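
A minimal sketch of what such a binding could look like, assuming the 1.x-era PySpark SchemaRDD wraps a Java SchemaRDD; it is written as a method meant to live on SchemaRDD (hence self), and the _jschema_rdd and sql_ctx attribute names are assumptions about those internals rather than something stated in this issue:

{code:python}
# Hypothetical method on pyspark.sql.SchemaRDD (attribute names are assumptions).
def unionAll(self, other):
    """Union two SchemaRDDs while preserving the SchemaRDD type,
    so that further SchemaRDD operations can still be chained."""
    java_union = self._jschema_rdd.unionAll(other._jschema_rdd)
    return SchemaRDD(java_union, self.sql_ctx)
{code}

Delegating to the Scala-side unionAll keeps the schema and avoids the downcast to a plain UnionRDD described above.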



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4038) Outlier Detection Algorithm for MLlib

2014-11-20 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220232#comment-14220232
 ] 

Anant Daksh Asthana commented on SPARK-4038:


So AVF is based on k-modes for detecting outliers, which is similar in
spirit to k-means. We could add the k-modes algorithm and have the AVF
outlier detection as an add-on or extension to it. We could do a similar
thing for detecting outliers with k-means, etc., too.



 Outlier Detection Algorithm for MLlib
 -

 Key: SPARK-4038
 URL: https://issues.apache.org/jira/browse/SPARK-4038
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Ashutosh Trivedi
Priority: Minor

 The aim of this JIRA is to discuss which parallel outlier detection 
 algorithms can be included in MLlib. 
 The one I am familiar with is Attribute Value Frequency (AVF). It 
 scales linearly with the number of data points and attributes, and relies on 
 a single data scan. It is not distance based and is well suited for categorical 
 data. The original paper also gives a parallel version, which is not 
 complicated to implement. I am working on the implementation and will soon 
 submit the initial code for review.
 Here is the link to the paper:
 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4410382
 As pointed out by Xiangrui in the discussion at 
 http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-Contributing-Algorithm-for-Outlier-Detection-td8880.html
 there are other algorithms as well. Let's discuss which will be the most 
 general and the most easily parallelized.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4127) Streaming Linear Regression- Python bindings

2014-11-01 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192996#comment-14192996
 ] 

Anant Daksh Asthana commented on SPARK-4127:


[~mengxr] [~freeman-lab] I am running into some issues and was wondering if you 
could help.
I have pushed some changes to my branch 
https://github.com/anantasty/spark/tree/SPARK-4127
I added functions to PythonMLLibAPI.scala and to 
python/pyspark/mllib/regression.py.

I added an example similar to the Scala one.

When I run it I get java.lang.ClassCastException: [B cannot be cast to 
org.apache.spark.mllib.linalg.Vector, which I am not sure how to work with.
There are plenty of examples where Python SparseVectors and DenseVectors are 
passed over in RDDs and work just fine. Also, the training data is sent as a 
pair of (Double, Vector) and works fine.
But on the test data (model.predictOn) it throws the exception.


 Streaming Linear Regression- Python bindings
 

 Key: SPARK-4127
 URL: https://issues.apache.org/jira/browse/SPARK-4127
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor

 Create Python bindings for Streaming Linear Regression (MLlib).
 The MLlib file relevant to this issue can be found at: 
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala
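
For reference, a hedged sketch of how such bindings end up being used in later Spark releases (pyspark.mllib.regression.StreamingLinearRegressionWithSGD); the class did not exist in PySpark when this issue was filed, and the directory names are placeholders, so treat this as an illustration of the target API rather than a description of the patch:

{code:python}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint, StreamingLinearRegressionWithSGD

sc = SparkContext(appName="streaming-lr-sketch")
ssc = StreamingContext(sc, batchDuration=1)

def parse(line):
    # "label,feature1,feature2,feature3" -> LabeledPoint
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], Vectors.dense(values[1:]))

# Placeholder directories: each new file dropped there becomes one batch.
train_stream = ssc.textFileStream("training_dir/").map(parse)
test_stream = ssc.textFileStream("test_dir/").map(parse)

model = StreamingLinearRegressionWithSGD(stepSize=0.1, numIterations=50)
model.setInitialWeights(Vectors.dense([0.0, 0.0, 0.0]))

model.trainOn(train_stream)  # weights are updated on every batch
model.predictOnValues(
    test_stream.map(lambda lp: (lp.label, lp.features))).pprint()

ssc.start()
ssc.awaitTermination()
{code}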



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4127) Streaming Linear Regression- Python bindings

2014-10-31 Thread Anant Daksh Asthana (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anant Daksh Asthana updated SPARK-4127:
---
Summary: Streaming Linear Regression- Python bindings  (was: Streaming 
Linear Regression)

 Streaming Linear Regression- Python bindings
 

 Key: SPARK-4127
 URL: https://issues.apache.org/jira/browse/SPARK-4127
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor

 Create python bindings for Streaming Linear Regression (MLlib).
 The Mllib file relevant to this issue can be found at : 
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4108) Fix uses of @deprecated in catalyst dataTypes

2014-10-28 Thread Anant Daksh Asthana (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anant Daksh Asthana updated SPARK-4108:
---
Component/s: SQL

 Fix uses of @deprecated in catalyst dataTypes
 -

 Key: SPARK-4108
 URL: https://issues.apache.org/jira/browse/SPARK-4108
 Project: Spark
  Issue Type: Task
  Components: SQL
Reporter: Anant Daksh Asthana
Priority: Trivial

 @deprecated takes two parameters, message and version. 
 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala
 has a usage of @deprecated with just one parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4118) Create python bindings for Streaming KMeans

2014-10-28 Thread Anant Daksh Asthana (JIRA)
Anant Daksh Asthana created SPARK-4118:
--

 Summary: Create python bindings for Streaming KMeans
 Key: SPARK-4118
 URL: https://issues.apache.org/jira/browse/SPARK-4118
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor


Create Python bindings for Streaming K-means.
This is in reference to https://issues.apache.org/jira/browse/SPARK-3254,
which adds Streaming K-means functionality to MLlib.
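
As with the streaming linear regression issue, here is a hedged usage sketch based on the class that later Spark releases expose as pyspark.mllib.clustering.StreamingKMeans; it did not exist in PySpark when this issue was filed, and the input directory is a placeholder:

{code:python}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.clustering import StreamingKMeans

sc = SparkContext(appName="streaming-kmeans-sketch")
ssc = StreamingContext(sc, batchDuration=1)

# Placeholder directory: each new file becomes one batch of dense vectors.
vectors = ssc.textFileStream("kmeans_training_dir/") \
             .map(lambda line: Vectors.dense([float(x) for x in line.split(",")]))

model = StreamingKMeans(k=2, decayFactor=1.0) \
    .setInitialCenters(centers=[[0.0, 0.0], [1.0, 1.0]], weights=[1.0, 1.0])

model.trainOn(vectors)             # centers are updated as batches arrive
model.predictOn(vectors).pprint()  # cluster index for each incoming vector

ssc.start()
ssc.awaitTermination()
{code}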




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4127) Streaming Linear Regression

2014-10-28 Thread Anant Daksh Asthana (JIRA)
Anant Daksh Asthana created SPARK-4127:
--

 Summary: Streaming Linear Regression
 Key: SPARK-4127
 URL: https://issues.apache.org/jira/browse/SPARK-4127
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor


Create python bindings for Streaming Linear Regression (MLlib).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4127) Streaming Linear Regression

2014-10-28 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187906#comment-14187906
 ] 

Anant Daksh Asthana commented on SPARK-4127:


[~mengxr] [~freeman-lab] I just added this issue. Could you please assign it to 
me?
Thanks.

 Streaming Linear Regression
 ---

 Key: SPARK-4127
 URL: https://issues.apache.org/jira/browse/SPARK-4127
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor

 Create python bindings for Streaming Linear Regression (MLlib).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4127) Streaming Linear Regression

2014-10-28 Thread Anant Daksh Asthana (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anant Daksh Asthana updated SPARK-4127:
---
Description: 
Create python bindings for Streaming Linear Regression (MLlib).
The Mllib file relevant to this issue can be found 
(here)[https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala]

  was:Create python bindings for Streaming Linear Regression (MLlib).


 Streaming Linear Regression
 ---

 Key: SPARK-4127
 URL: https://issues.apache.org/jira/browse/SPARK-4127
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor

 Create python bindings for Streaming Linear Regression (MLlib).
 The Mllib file relevant to this issue can be found 
 (here)[https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4127) Streaming Linear Regression

2014-10-28 Thread Anant Daksh Asthana (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anant Daksh Asthana updated SPARK-4127:
---
Description: 
Create python bindings for Streaming Linear Regression (MLlib).
The Mllib file relevant to this issue can be found at : 
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala

  was:
Create python bindings for Streaming Linear Regression (MLlib).
The Mllib file relevant to this issue can be found 
(here)[https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala]


 Streaming Linear Regression
 ---

 Key: SPARK-4127
 URL: https://issues.apache.org/jira/browse/SPARK-4127
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, PySpark
Reporter: Anant Daksh Asthana
Priority: Minor

 Create python bindings for Streaming Linear Regression (MLlib).
 The Mllib file relevant to this issue can be found at : 
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegression.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4108) Fix uses od @deprecated in catalyst dataTypes

2014-10-27 Thread Anant Daksh Asthana (JIRA)
Anant Daksh Asthana created SPARK-4108:
--

 Summary: Fix uses od @deprecated in catalyst dataTypes
 Key: SPARK-4108
 URL: https://issues.apache.org/jira/browse/SPARK-4108
 Project: Spark
  Issue Type: Task
Reporter: Anant Daksh Asthana
Priority: Trivial


@deprecated takes two parameters, message and version. 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala
has a usage of @deprecated with just one parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4108) Fix uses of @deprecated in catalyst dataTypes

2014-10-27 Thread Anant Daksh Asthana (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anant Daksh Asthana updated SPARK-4108:
---
Summary: Fix uses of @deprecated in catalyst dataTypes  (was: Fix uses od 
@deprecated in catalyst dataTypes)

 Fix uses of @deprecated in catalyst dataTypes
 -

 Key: SPARK-4108
 URL: https://issues.apache.org/jira/browse/SPARK-4108
 Project: Spark
  Issue Type: Task
Reporter: Anant Daksh Asthana
Priority: Trivial

 @deprecated takes two parameters, message and version. 
 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala
 has a usage of @deprecated with just one parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2335) k-Nearest Neighbor classification and regression for MLLib

2014-10-27 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186347#comment-14186347
 ] 

Anant Daksh Asthana commented on SPARK-2335:


[~Rusty][~bgawalt] I would be willing to help in this implementation as well.
Thanks

 k-Nearest Neighbor classification and regression for MLLib
 --

 Key: SPARK-2335
 URL: https://issues.apache.org/jira/browse/SPARK-2335
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Brian Gawalt
Priority: Minor
  Labels: features, newbie

 The k-Nearest Neighbor model for classification and regression problems is a 
 simple and intuitive approach, offering a straightforward path to creating 
 non-linear decision/estimation contours. Its downsides -- high variance 
 (sensitivity to the known training data set) and computational intensity for 
 estimating new point labels -- both play to Spark's big data strengths: lots 
 of data mitigates data concerns; lots of workers mitigate computational 
 latency. 
 We should include kNN models as options in MLlib.
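
To make the computational-intensity point concrete, here is a naive brute-force PySpark sketch of kNN classification for a single query point (an illustration of the idea only, not a proposal for the MLlib implementation): every prediction is a full scan over the training data, which is exactly the part Spark can parallelize.

{code:python}
from collections import Counter
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="knn-sketch")

# Toy (label, feature-vector) training pairs.
train = sc.parallelize([(0, np.array([0.0, 0.0])),
                        (0, np.array([0.1, 0.2])),
                        (1, np.array([1.0, 1.1])),
                        (1, np.array([0.9, 1.0]))])

def predict(query, k=3):
    # Distance to every training point: a full scan per query, done in parallel.
    neighbors = (train.map(lambda lf: (float(np.linalg.norm(lf[1] - query)), lf[0]))
                      .takeOrdered(k))
    # Majority vote over the k nearest labels.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

print(predict(np.array([0.95, 1.05])))  # expected: 1
{code}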



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3838) Python code example for Word2Vec in user guide

2014-10-26 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184750#comment-14184750
 ] 

Anant Daksh Asthana commented on SPARK-3838:


The pull request for the resolution can be found at 
https://github.com/apache/spark/pull/2952
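
For context, a minimal PySpark Word2Vec example of the kind the user guide needed looks roughly like the sketch below; the linked pull request is authoritative for what actually landed, and the input path and query word here are placeholders.

{code:python}
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext(appName="word2vec-example")

# Placeholder corpus: one sentence per line, whitespace-separated tokens.
sentences = sc.textFile("text8_lines").map(lambda line: line.split(" "))

model = Word2Vec().setVectorSize(100).setSeed(42).fit(sentences)

# Nearest neighbours of a word in the learned embedding space.
for word, similarity in model.findSynonyms("spark", 5):
    print(word, similarity)
{code}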

 Python code example for Word2Vec in user guide
 --

 Key: SPARK-3838
 URL: https://issues.apache.org/jira/browse/SPARK-3838
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Assignee: Anant Daksh Asthana
Priority: Trivial





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2396) Spark EC2 scripts fail when trying to log in to EC2 instances

2014-10-26 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184803#comment-14184803
 ] 

Anant Daksh Asthana commented on SPARK-2396:


Seems like a Python issue on your system; you are missing the subprocess module.

 Spark EC2 scripts fail when trying to log in to EC2 instances
 -

 Key: SPARK-2396
 URL: https://issues.apache.org/jira/browse/SPARK-2396
 Project: Spark
  Issue Type: Bug
  Components: EC2
Affects Versions: 1.0.0
 Environment: Windows 8, Cygwin and command prompt, Python 2.7
Reporter: Stephen M. Hopper
  Labels: aws, ec2, ssh

 I cannot seem to successfully start up a Spark EC2 cluster using the 
 spark-ec2 script.
 I'm using variations on the following command:
 ./spark-ec2 --instance-type=m1.small --region=us-west-1 --spot-price=0.05 
 --spark-version=1.0.0 -k my-key-name -i my-key-name.pem -s 1 launch 
 spark-test-cluster
 The script always allocates the EC2 instances without much trouble, but can 
 never seem to complete the SSH step to install Spark on the cluster.  It 
 always complains about my SSH key.  If I try to log in with my ssh key doing 
 something like this:
 ssh -i my-key-name.pem root@<insert ip of my instance here>
 it fails.  However, if I log in to the AWS console, click on my instance and 
 select connect, it displays the instructions for SSHing into my instance 
 (which are no different from the ssh command from above).  So, if I rerun the 
 SSH command from above, I'm able to log in.
 Next, if I try to rerun the spark-ec2 command from above (replacing launch 
 with start), the script logs in and starts installing Spark.  However, it 
 eventually errors out with the following output:
 Cloning into 'spark-ec2'...
 remote: Counting objects: 1465, done.
 remote: Compressing objects: 100% (697/697), done.
 remote: Total 1465 (delta 485), reused 1465 (delta 485)
 Receiving objects: 100% (1465/1465), 228.51 KiB | 287 KiB/s, done.
 Resolving deltas: 100% (485/485), done.
 Connection to ec2-my-clusters-ip.us-west-1.compute.amazonaws.com closed.
 Searching for existing cluster spark-test-cluster...
 Found 1 master(s), 1 slaves
 Starting slaves...
 Starting master...
 Waiting for instances to start up...
 Waiting 120 more seconds...
 Deploying files to master...
 Traceback (most recent call last):
   File "./spark_ec2.py", line 823, in <module>
     main()
   File "./spark_ec2.py", line 815, in main
     real_main()
   File "./spark_ec2.py", line 806, in real_main
     setup_cluster(conn, master_nodes, slave_nodes, opts, False)
   File "./spark_ec2.py", line 450, in setup_cluster
     deploy_files(conn, "deploy.generic", opts, master_nodes, slave_nodes, modules)
   File "./spark_ec2.py", line 593, in deploy_files
     subprocess.check_call(command)
   File "E:\windows_programs\Python27\lib\subprocess.py", line 535, in check_call
     retcode = call(*popenargs, **kwargs)
   File "E:\windows_programs\Python27\lib\subprocess.py", line 522, in call
     return Popen(*popenargs, **kwargs).wait()
   File "E:\windows_programs\Python27\lib\subprocess.py", line 710, in __init__
     errread, errwrite)
   File "E:\windows_programs\Python27\lib\subprocess.py", line 958, in _execute_child
     startupinfo)
 WindowsError: [Error 2] The system cannot find the file specified
 So, in short, am I missing something or is this a bug?  Any help would be 
 appreciated.
 Other notes:
 -I've tried both us-west-1 and us-east-1 regions.
 -I've tried several different instance types.
 -I've tried playing with the permissions on the ssh key (600, 400, etc.), but 
 to no avail



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3838) Python code example for Word2Vec in user guide

2014-10-13 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168989#comment-14168989
 ] 

Anant Daksh Asthana edited comment on SPARK-3838 at 10/13/14 6:22 AM:
--

Thanks [~mengxr], I will follow the instructions. I did also mention that the 
coding guides are centered around Java/Scala.


was (Author: slcclimber):
Thanks [~mengxr], I will follow the instructions. I did also mention that the 
coding guides are centered around Java/Scala. It would be nice to create one for 
PySpark which closely follows PEP 8.

 Python code example for Word2Vec in user guide
 --

 Key: SPARK-3838
 URL: https://issues.apache.org/jira/browse/SPARK-3838
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Assignee: Anant Daksh Asthana
Priority: Trivial





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3838) Python code example for Word2Vec in user guide

2014-10-11 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168492#comment-14168492
 ] 

Anant Daksh Asthana commented on SPARK-3838:


I would like to contribute this example if no one has objections.

 Python code example for Word2Vec in user guide
 --

 Key: SPARK-3838
 URL: https://issues.apache.org/jira/browse/SPARK-3838
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Assignee: Liquan Pei
Priority: Trivial





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3730) Anyone else having trouble building Spark recently

2014-10-01 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156082#comment-14156082
 ] 

Anant Daksh Asthana commented on SPARK-3730:


Thanks Patrick.

On Wed, Oct 1, 2014 at 6:09 PM, Patrick Wendell (JIRA) j...@apache.org



 Anyone else having trouble building Spark recently
 ---

 Key: SPARK-3730
 URL: https://issues.apache.org/jira/browse/SPARK-3730
 Project: Spark
  Issue Type: Question
Reporter: Anant Daksh Asthana
Priority: Minor

 I get an assertion error in 
 spark/core/src/main/scala/org/apache/spark/HttpServer.scala while trying to 
 build.
 I am building using:
 mvn -Pyarn -PHadoop-2.3 -DskipTests -Phive clean package
 Here is the error I get: http://pastebin.com/Shi43r53



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3725) Link to building spark returns a 404

2014-09-29 Thread Anant Daksh Asthana (JIRA)
Anant Daksh Asthana created SPARK-3725:
--

 Summary: Link to building spark returns a 404
 Key: SPARK-3725
 URL: https://issues.apache.org/jira/browse/SPARK-3725
 Project: Spark
  Issue Type: Documentation
Reporter: Anant Daksh Asthana
Priority: Minor


The README.md link to Building Spark returns a 404



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3725) Link to building spark returns a 404

2014-09-29 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152047#comment-14152047
 ] 

Anant Daksh Asthana commented on SPARK-3725:


Would it make sense to add a building-Spark document to the repo? That would 
make the documentation easier to find, and anyone who has the source would have 
the docs as well.

 Link to building spark returns a 404
 

 Key: SPARK-3725
 URL: https://issues.apache.org/jira/browse/SPARK-3725
 Project: Spark
  Issue Type: Documentation
Reporter: Anant Daksh Asthana
Priority: Minor
   Original Estimate: 1m
  Remaining Estimate: 1m

 The README.md link to Building Spark returns a 404



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3730) Anyone else having trouble building Spark recently

2014-09-29 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152269#comment-14152269
 ] 

Anant Daksh Asthana commented on SPARK-3730:


Definitely not a Spark issue. I just thought someone on here might know a solution.




 Anyone else having trouble building Spark recently
 ---

 Key: SPARK-3730
 URL: https://issues.apache.org/jira/browse/SPARK-3730
 Project: Spark
  Issue Type: Question
Reporter: Anant Daksh Asthana
Priority: Minor

 I get an assertion error in 
 spark/core/src/main/scala/org/apache/spark/HttpServer.scala while trying to 
 build.
 I am building using:
 mvn -Pyarn -PHadoop-2.3 -DskipTests -Phive clean package
 Here is the error I get: http://pastebin.com/Shi43r53



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-951) Gaussian Mixture Model

2014-09-19 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141124#comment-14141124
 ] 

Anant Daksh Asthana commented on SPARK-951:
---

caizhua, could you please elaborate a little more on the issue? Right now, 'This 
code' and the 'input file named Gmm_spark.tbl' are unknown to me at the time of 
reading this.

 Gaussian Mixture Model
 --

 Key: SPARK-951
 URL: https://issues.apache.org/jira/browse/SPARK-951
 Project: Spark
  Issue Type: Story
  Components: Examples
Affects Versions: 0.7.3
Reporter: caizhua
Priority: Critical
  Labels: Learning, Machine, Model

 This code includes the code for Gaussian Mixture Model. The input file named 
 Gmm_spark.tbl is the input for this program.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1486) Support multi-model training in MLlib

2014-09-16 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136696#comment-14136696
 ] 

Anant Daksh Asthana commented on SPARK-1486:


That sounds very true and relevant. I am completely with you on this one.

On Tue, Sep 16, 2014 at 5:50 PM, Xiangrui Meng (JIRA) j...@apache.org



 Support multi-model training in MLlib
 -

 Key: SPARK-1486
 URL: https://issues.apache.org/jira/browse/SPARK-1486
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Xiangrui Meng
Assignee: Burak Yavuz
Priority: Critical

 It is rare in practice to train just one model with a given set of 
 parameters. Usually, this is done by training multiple models with different 
 sets of parameters and then selecting the best based on their performance on the 
 validation set. MLlib should provide native support for multi-model 
 training/scoring. It requires decoupling of concepts like problem, 
 formulation, algorithm, parameter set, and model, which are missing in MLlib 
 now. MLI implements similar concepts, which we can borrow. There are 
 different approaches for multi-model training:
 0) Keep one copy of the data, and train models one after another (or maybe in 
 parallel, depending on the scheduler).
 1) Keep one copy of the data, and train multiple models at the same time 
 (similar to `runs` in KMeans).
 2) Make multiple copies of the data (still stored distributively), and use 
 more cores to distribute the work.
 3) Collect the data, make the entire dataset available on workers, and train 
 one or more models on each worker.
 Users should be able to choose which execution mode they want to use. Note 
 that 3) could cover many use cases in practice when the training data is not 
 huge, e.g., 1GB.
 This task will be divided into sub-tasks and this JIRA is created to discuss 
 the design and track the overall progress.
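
As a purely illustrative rendering of approach 0 above in PySpark: keep one cached copy of the data and train models one after another over a small parameter grid, picking the best on a validation split. LinearRegressionWithSGD is used only as a stand-in estimator; none of the names below come from the proposal itself.

{code:python}
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

sc = SparkContext(appName="multi-model-sketch")

# Synthetic data: y = 2 * x + 1 on a single scaled feature.
points = sc.parallelize([LabeledPoint(2 * (x / 100.0) + 1, [x / 100.0])
                         for x in range(100)])
train, valid = points.randomSplit([0.8, 0.2], seed=42)
train.cache()  # one copy of the data, reused by every training run

def mse(model, data):
    # Extract the parameters so only plain Python objects go to the executors.
    w, b = model.weights, model.intercept
    return data.map(lambda p: (p.label - (w.dot(p.features) + b)) ** 2).mean()

# Approach 0: models trained sequentially over a small grid of step sizes.
results = [(step, LinearRegressionWithSGD.train(train, iterations=100, step=step))
           for step in (0.01, 0.1, 1.0)]
best_step, best_model = min(results, key=lambda sm: mse(sm[1], valid))
print("best step size:", best_step)
{code}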



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1945) Add full Java examples in MLlib docs

2014-06-28 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047045#comment-14047045
 ] 

Anant Daksh Asthana commented on SPARK-1945:


Just looked at the code and I agree.

 Add full Java examples in MLlib docs
 

 Key: SPARK-1945
 URL: https://issues.apache.org/jira/browse/SPARK-1945
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Matei Zaharia
  Labels: Starter
 Fix For: 1.0.0


 Right now some of the Java tabs only say the following:
 All of MLlib’s methods use Java-friendly types, so you can import and call 
 them there the same way you do in Scala. The only caveat is that the methods 
 take Scala RDD objects, while the Spark Java API uses a separate JavaRDD 
 class. You can convert a Java RDD to a Scala one by calling .rdd() on your 
 JavaRDD object.
 Would be nice to translate the Scala code into Java instead.
 Also, a few pages (most notably the Matrix one) don't have Java examples at 
 all.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1945) Add full Java examples in MLlib docs

2014-06-23 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040796#comment-14040796
 ] 

Anant Daksh Asthana commented on SPARK-1945:


Michael,
This issue refers to the examples provided for using MLlib in Scala and Java. 
There are a lot more examples for Scala 
(https://github.com/apache/spark/tree/master/examples/src/main/scala/org/apache/spark/examples)
than for 
Java (https://github.com/apache/spark/tree/master/examples/src/main/scala/org/apache/spark/examples).
I have started tackling a few of them, and it would be great if we could team up 
and work on creating examples in Java as well.

 Add full Java examples in MLlib docs
 

 Key: SPARK-1945
 URL: https://issues.apache.org/jira/browse/SPARK-1945
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Matei Zaharia
  Labels: Starter
 Fix For: 1.0.0


 Right now some of the Java tabs only say the following:
 All of MLlib’s methods use Java-friendly types, so you can import and call 
 them there the same way you do in Scala. The only caveat is that the methods 
 take Scala RDD objects, while the Spark Java API uses a separate JavaRDD 
 class. You can convert a Java RDD to a Scala one by calling .rdd() on your 
 JavaRDD object.
 Would be nice to translate the Scala code into Java instead.
 Also, a few pages (most notably the Matrix one) don't have Java examples at 
 all.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2198) Partition the scala build file so that it is easier to maintain

2014-06-20 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039355#comment-14039355
 ] 

Anant Daksh Asthana commented on SPARK-2198:


I am in agreement with Helena.

 Partition the scala build file so that it is easier to maintain
 ---

 Key: SPARK-2198
 URL: https://issues.apache.org/jira/browse/SPARK-2198
 Project: Spark
  Issue Type: Task
  Components: Build
Reporter: Helena Edelson
Priority: Minor
   Original Estimate: 3h
  Remaining Estimate: 3h

 Partition into the standard Dependencies, Version, Settings, Publish.scala files, 
 keeping SparkBuild clean to describe the modules and their deps, so that changes 
 in versions, for example, need only be made in Version.scala, settings 
 changes such as for scalac in Settings.scala, etc.
 I'd be happy to do this ([~helena_e])



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-1945) Add full Java examples in MLlib docs

2014-06-15 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032121#comment-14032121
 ] 

Anant Daksh Asthana commented on SPARK-1945:


I have started writing some of these examples in Java. I will make pull requests 
on GitHub as I test them.

 Add full Java examples in MLlib docs
 

 Key: SPARK-1945
 URL: https://issues.apache.org/jira/browse/SPARK-1945
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Matei Zaharia
  Labels: Starter
 Fix For: 1.0.0


 Right now some of the Java tabs only say the following:
 All of MLlib’s methods use Java-friendly types, so you can import and call 
 them there the same way you do in Scala. The only caveat is that the methods 
 take Scala RDD objects, while the Spark Java API uses a separate JavaRDD 
 class. You can convert a Java RDD to a Scala one by calling .rdd() on your 
 JavaRDD object.
 Would be nice to translate the Scala code into Java instead.
 Also, a few pages (most notably the Matrix one) don't have Java examples at 
 all.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2061) Deprecate `splits` in JavaRDDLike and add `partitions`

2014-06-12 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028874#comment-14028874
 ] 

Anant Daksh Asthana commented on SPARK-2061:


Proposed fixed can be found at : https://github.com/apache/spark/pull/1062

 Deprecate `splits` in JavaRDDLike and add `partitions`
 --

 Key: SPARK-2061
 URL: https://issues.apache.org/jira/browse/SPARK-2061
 Project: Spark
  Issue Type: Bug
  Components: Java API
Reporter: Patrick Wendell
Assignee: Anant Daksh Asthana
Priority: Minor
  Labels: starter

 Most of Spark has moved over to consistently using `partitions` instead of 
 `splits`. We should do likewise and add a `partitions` method to JavaRDDLike 
 and have `splits` just call that. We should also go through all cases where 
 other APIs (e.g. Python) call `splits` and change those to use the 
 newer API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2061) Deprecate `splits` in JavaRDDLike and add `partitions`

2014-06-11 Thread Anant Daksh Asthana (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027978#comment-14027978
 ] 

Anant Daksh Asthana commented on SPARK-2061:


Could I be added as the Assignee to this task? I am currently working on a fix.

 Deprecate `splits` in JavaRDDLike and add `partitions`
 --

 Key: SPARK-2061
 URL: https://issues.apache.org/jira/browse/SPARK-2061
 Project: Spark
  Issue Type: Bug
  Components: Java API
Reporter: Patrick Wendell
Priority: Minor
  Labels: starter

 Most of Spark has moved over to consistently using `partitions` instead of 
 `splits`. We should do likewise and add a `partitions` method to JavaRDDLike 
 and have `splits` just call that. We should also go through all cases where 
 other APIs (e.g. Python) call `splits` and change those to use the 
 newer API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)