Hi Sameer,

You can also use repartition to create a higher number of tasks.
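A common rule of thumb is to aim for roughly 2-3 tasks per available core, so with 10 executors the 104 tasks you see may be leaving cores idle. A rough sizing sketch (the core counts and the tasks-per-core factor here are illustrative assumptions, not values measured from your job):

```scala
// Sketch: choosing a repartition() target for a cluster like the one
// described below. All numbers are illustrative assumptions.
object PartitionSizing {
  // numExecutors * coresPerExecutor * tasksPerCore is a common rule of
  // thumb for spark.default.parallelism / repartition().
  def targetPartitions(numExecutors: Int,
                       coresPerExecutor: Int,
                       tasksPerCore: Int = 3): Int =
    numExecutors * coresPerExecutor * tasksPerCore

  def main(args: Array[String]): Unit = {
    // e.g. 10 executors with (hypothetically) 8 cores each:
    val n = targetPartitions(numExecutors = 10, coresPerExecutor = 8)
    println(n) // 240
    // In the job itself (needs a SparkContext, so shown as comments):
    //   val data = MLUtils.loadLibSVMFile(sc, path).repartition(n).cache()
  }
}
```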
-Jayant

On Fri, Nov 21, 2014 at 12:02 PM, Jayant Shekhar <jay...@cloudera.com> wrote:

> Hi Sameer,
>
> You can try increasing the number of executor-cores.
>
> -Jayant
>
> On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak <ssti...@live.com> wrote:
>
>> Hi All,
>>
>> I have been using MLlib's linear regression and I have some questions
>> regarding its performance. We have a cluster of 10 nodes; each node has
>> 24 cores and 148 GB of memory. I am running my app as follows:
>>
>> time spark-submit --class medslogistic.MedsLogistic --master yarn-client
>> --executor-memory 6G --num-executors 10 /pathtomyapp/myapp.jar
>>
>> I am also going to experiment with the number of executors (perhaps
>> reducing it); that may give different results.
>>
>> The input is an 800 MB sparse file in LibSVM format with 150K features
>> in total. The regression takes approximately 70 minutes to finish, yet
>> the job imposes very little load on CPU, memory, network, and disk. The
>> total number of tasks is 104, and the time is divided fairly uniformly
>> across them. I was wondering: is it possible to reduce the execution
>> time further?
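Putting both suggestions together, the original spark-submit command could be extended along these lines; the specific core count and parallelism value are illustrative guesses, not tuned recommendations:

```shell
# Same job as above, but with explicit executor cores and a higher
# default parallelism so more tasks can run concurrently.
# --executor-cores 8 and parallelism 240 are illustrative values.
time spark-submit \
  --class medslogistic.MedsLogistic \
  --master yarn-client \
  --executor-memory 6G \
  --num-executors 10 \
  --executor-cores 8 \
  --conf spark.default.parallelism=240 \
  /pathtomyapp/myapp.jar
```

Whether this helps depends on where the time actually goes; with low CPU, network, and disk load throughout, too few concurrent tasks is a plausible bottleneck.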