Hi,
Previously, we applied the SVM algorithm in MLlib to 5 million records (600
MB), and it took more than 25 minutes to finish.
We are using Spark 1.0, and we ran this program on a
4-node cluster. Each node has 4 CPU cores and 11 GB of RAM.
The 5 million records only have two
Watch the app manager; it should tell you what's running and what's taking a while. My
guess is that it's a distinct function on the data.
J
Sent from my iPhone
On Oct 30, 2014, at 8:22 AM, peng xia toxiap...@gmail.com wrote:
Did you cache the data and check the load balancing? How many
features? Which API are you using: Scala, Java, or Python? -Xiangrui
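One way to check the load balancing Xiangrui asks about is to count how many records land in each partition. This is a minimal sketch, not code from the thread: the object name `PartitionBalance`, the helper `partitionSizes`, and passing the input path as the first program argument are all hypothetical, and it assumes the data is read with `sc.textFile`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionBalance {
  // Hypothetical helper: returns (partitionIndex, recordCount) for every
  // partition of `rdd`. Each partition emits exactly one pair.
  def partitionSizes[T](rdd: RDD[T]): Array[(Int, Int)] =
    rdd.mapPartitionsWithIndex { (idx, it) => Iterator((idx, it.size)) }.collect()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PartitionBalance"))

    // Assumed: args(0) is the path to the training data.
    val data = sc.textFile(args(0))
    val sizes = partitionSizes(data)

    println(s"partitions: ${sizes.length}")
    sizes.foreach { case (idx, n) => println(s"partition $idx: $n records") }

    sc.stop()
  }
}
```

If the counts are very uneven, or there are fewer partitions than the 16 cores in the cluster (4 nodes with 4 cores each), passing a larger minimum partition count as the second argument of `sc.textFile`, or calling `repartition` on the RDD, may spread the work more evenly.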
On Thu, Oct 30, 2014 at 9:13 AM, Jimmy ji...@sellpoints.com wrote:
Thanks for all your help.
I think I didn't cache the data. My previous cluster has expired, and I
didn't get a chance to check the load balancing or the app manager.
Below is my code.
There are 18 features for each record and I am using the Scala API.
import org.apache.spark.SparkConf
import
Then caching should solve the problem. Otherwise, it is just loading
and parsing data from disk for each iteration. -Xiangrui
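A minimal sketch of where caching fits in an MLlib SVM job. It assumes, hypothetically, that each input line is a comma-separated label (0 or 1) followed by the 18 features; the object name `CachedSVM`, the `parse` helper, and the iteration count of 100 are illustrative assumptions, not taken from Peng's code. The parsed RDD is cached once, so the SGD iterations read the points from memory instead of re-reading and re-parsing the file each time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object CachedSVM {
  // Hypothetical line format: "label,f1,f2,...,f18" with label 0.0 or 1.0.
  def parse(line: String): LabeledPoint = {
    val values = line.split(',').map(_.toDouble)
    LabeledPoint(values.head, Vectors.dense(values.tail))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CachedSVM"))

    // Parse once, then keep the parsed points in memory. Without cache(),
    // every SGD iteration re-reads and re-parses the text file from disk.
    val training = sc.textFile(args(0)).map(parse).cache()

    // Optional: run an action up front so the cache is filled before training.
    println(s"training points: ${training.count()}")

    val numIterations = 100 // assumed value; tune for your data
    val model = SVMWithSGD.train(training, numIterations)
    println(s"weights: ${model.weights}")

    sc.stop()
  }
}
```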
On Thu, Oct 30, 2014 at 11:44 AM, peng xia toxiap...@gmail.com wrote:
Hi Xiangrui,
Could you give me a code example of caching? I am new to Spark.
Thanks,
Best,
Peng
On Thu, Oct 30, 2014 at 6:57 PM, Xiangrui Meng men...@gmail.com wrote:
sampleRDD.cache()
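`cache()` is lazy: it only marks the RDD, and the data is stored the first time an action computes it. It is shorthand for `persist(StorageLevel.MEMORY_ONLY)`. The sketch below illustrates this on a small local RDD; `sampleRDD` here is a stand-in built with `parallelize`, not the RDD from the thread, and `local[2]` and the helper `markForCaching` are assumptions for trying it on one machine.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CacheDemo {
  // Hypothetical helper: marks `rdd` for in-memory storage and reports the
  // level it was given. cache() is lazy; nothing is computed or stored yet.
  def markForCaching[T](rdd: RDD[T]): StorageLevel = {
    rdd.cache() // same as rdd.persist(StorageLevel.MEMORY_ONLY)
    rdd.getStorageLevel
  }

  def main(args: Array[String]): Unit = {
    // Assumed local master for experimentation on a single machine.
    val sc = new SparkContext("local[2]", "CacheDemo")

    // Stand-in for the parsed training data.
    val sampleRDD = sc.parallelize(1 to 1000000).map(_.toLong * 2)

    println(markForCaching(sampleRDD) == StorageLevel.MEMORY_ONLY) // true

    // The first action computes the partitions and keeps them in memory;
    // the second reuses them instead of re-running the map.
    println(sampleRDD.count())
    println(sampleRDD.reduce(_ + _))

    sampleRDD.unpersist() // free the memory once the data is no longer needed
    sc.stop()
  }
}
```

The call should go on the RDD that is reused across iterations (the parsed training data), not on an intermediate RDD that is read only once.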
On Oct 30, 2014, at 5:01 PM, peng xia toxiap...@gmail.com wrote:
Thanks Jimmy.
I will have a try.
Thank you all very much for your help.
Best,
Peng
On Thu, Oct 30, 2014 at 8:19 PM, Jimmy ji...@sellpoints.com wrote: