issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
Hi, Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB), and it took more than 25 minutes to finish. The Spark version we are using is 1.0, and we were running this program on a 4-node cluster. Each node has 4 CPU cores and 11 GB RAM. The 5 million records only have two

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Jimmy
Watch the app manager; it should tell you what's running and taking a while... My guess is that it's a distinct function on the data. J

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Xiangrui Meng
Did you cache the data and check the load balancing? How many features? Which API are you using: Scala, Java, or Python? -Xiangrui

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
Thanks for all your help. I think I didn't cache the data. My previous cluster has expired, and I didn't have a chance to check the load balance or app manager. Below is my code. There are 18 features for each record, and I am using the Scala API. import org.apache.spark.SparkConf import

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Xiangrui Meng
Then caching should solve the problem. Without it, the job is just loading and parsing the data from disk again for each iteration. -Xiangrui

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
Hi Xiangrui, Can you give me a code example of caching, as I am new to Spark? Thanks, Best, Peng

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Jimmy
sampleRDD.cache()
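
For context, here is a minimal sketch of where that cache() call might sit in a Scala MLlib job. The original code is cut off in this archive, so everything specific below is an assumption for illustration only: the object name CachedSvmSketch, the input path, the comma-separated "label,f1,...,f18" line layout, and the iteration count are all hypothetical. The point is only that the parsed RDD is marked as cached before it is handed to the iterative trainer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical driver program; not the poster's actual code.
object CachedSvmSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("CachedSvmSketch")
    val sc = new SparkContext(conf)

    // Assumed input: one record per line, "label,f1,f2,...,f18",
    // with labels of 0 or 1 as SVMWithSGD expects.
    val raw = sc.textFile("hdfs:///path/to/records.csv") // hypothetical path

    // Parse once and mark the parsed RDD as cached. cache() is lazy:
    // the data is materialized in memory by the first action that
    // touches it, and later passes reuse the in-memory copy.
    val training = raw.map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts.head, Vectors.dense(parts.tail))
    }.cache()

    // Each SGD iteration makes a pass over `training`; without the
    // cache() above, every pass would re-read and re-parse the file.
    val numIterations = 100 // assumed value
    val model = SVMWithSGD.train(training, numIterations)

    println(s"weights: ${model.weights}")
    sc.stop()
  }
}
```

Caching the parsed LabeledPoint RDD rather than the raw text lines means the string splitting and number parsing are also skipped on later iterations, not just the disk read.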

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
Thanks, Jimmy. I will give it a try. Thanks very much for your help, guys. Best, Peng