canopy clustering

2014-11-09 Thread aminn_524
I want to run MLlib's k-means on a big dataset. It seems that for big datasets we need a pre-clustering method such as canopy clustering: by starting with an initial clustering, the number of more expensive distance measurements can be significantly reduced by ignoring points outside of the
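A minimal single-machine Python sketch of the canopy idea described above (the Euclidean distance, the thresholds T1 > T2, and the toy data are illustrative assumptions, not part of MLlib's API):

```python
import random

def canopy_clustering(points, t1, t2, distance):
    """Greedy canopy clustering with loose threshold t1 > tight threshold t2.

    Each pass picks a random remaining point as a canopy center; points
    within t1 join the canopy, and points within t2 are removed from the
    candidate pool so they can never become centers themselves.
    """
    assert t1 > t2, "loose threshold t1 must exceed tight threshold t2"
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(random.randrange(len(remaining)))
        members = [center]
        survivors = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)   # inside the loose radius: joins this canopy
            if d >= t2:
                survivors.append(p)  # outside the tight radius: stays a candidate
        remaining = survivors
        canopies.append((center, members))
    return canopies

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Two tight groups far apart: with t1=1.0, t2=0.5 this yields two canopies.
data = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
canopies = canopy_clustering(data, t1=1.0, t2=0.5, distance=euclidean)
```

A subsequent k-means run would then only compute exact distances between points that share a canopy.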

how to host the driver node

2014-07-09 Thread aminn_524
I have one master and two slave nodes, and I did not set any IP for the Spark driver. My question is: should I set an IP for the Spark driver, and can I host the driver inside the cluster on the master node? If so, how do I host it? Will it be hosted automatically on the node where we submit the application by
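In standalone mode, the driver can be launched inside the cluster by submitting with `--deploy-mode cluster`, in which case the master picks a worker to host it; no driver IP needs to be set by hand. A hedged sketch (the hostname, class name, and jar path are placeholders, and note that standalone cluster mode has historically supported only JVM applications, so a Python driver runs on the submitting machine instead):

```shell
# Ask the standalone master to launch the driver on one of the workers.
# master-host, com.example.WordCount and the jar path are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.WordCount \
  /path/to/app.jar
```

Without `--deploy-mode cluster` (the default, client mode), the driver runs in the spark-submit process on whichever machine you submitted from.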

Re: Why doesn't the driver node do any work?

2014-07-09 Thread aminn_524

Spark stdout and stderr

2014-07-04 Thread aminn_524
I am running spark-1.0.0, connecting to a standalone Spark cluster which has one master and two slaves. I ran wordcount.py via spark-submit; it reads data from HDFS and also writes the results to HDFS. So far everything is fine and the results will correctly be
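In standalone mode, stdout and stderr from the executors end up on the worker machines, not on the machine running spark-submit. A sketch of where to look (the application and executor IDs below are placeholders; the layout assumes the default work directory under SPARK_HOME):

```shell
# On a worker node: each executor gets a directory under the worker's
# work dir, containing its stdout and stderr files.
# (app-...-0000 and the executor id 0 are placeholders)
ls $SPARK_HOME/work/app-20140704120000-0000/0/
# stderr  stdout

# print() output from tasks lands in the executor's stdout file
cat $SPARK_HOME/work/app-20140704120000-0000/0/stdout
```

The same files are reachable through the worker web UI (port 8081 by default), linked from each executor's entry on the master UI.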

difference between worker and slave nodes

2014-07-01 Thread aminn_524
Can anyone explain to me the difference between a worker and a slave? I have one master and two slaves which are connected to each other. Using the jps command I can see Master on the master node and Worker on the slave nodes, but I don't have any Worker on my master by using this command
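"Worker" is the name of the standalone daemon process, while "slave" refers to the machine it runs on, so the two terms describe the same role. start-all.sh launches a Master on the master machine and a Worker on each host listed in conf/slaves, which is why jps shows no Worker on the master unless one is started there. For illustration (the PIDs are placeholders):

```shell
# On the master machine
jps
# 4321 Master

# On each slave machine
jps
# 8765 Worker
```

To also run executors on the master machine, add the master's hostname to conf/slaves and restart the cluster; jps on the master will then show both Master and Worker.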