[scikit-learn] Scikit Learn in a Cray computer

Mauricio Reis Wed, 19 Jun 2019 13:38:58 -0700

I'd like to understand how parallelism works in the DBSCAN routine in scikit-learn when running on a Cray computer, and what I should do to improve the results I'm seeing.

I have adapted the existing example at [https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py] to run with 100,000 points, so that the processing time is long enough to allow a reasonable comparison of the timings. I varied the parameter "n_jobs = x", with "x" ranging from 1 to 6. I repeated the same experiments several times and calculated the average processing time.
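For reference, a minimal sketch of the kind of timing experiment described above. It is not the exact script that was run: the blob centers, "cluster_std", "eps" and "min_samples" values are assumed to match the linked example, and the helper name "time_dbscan" and the repeat count are hypothetical.

```python
import time
from statistics import mean

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def time_dbscan(n_samples=100_000, n_jobs_values=range(1, 7), repeats=3):
    """Return {n_jobs: mean wall-clock seconds} for fitting DBSCAN.

    Hypothetical helper; data and DBSCAN settings are assumed to follow
    the plot_dbscan example, only the number of points is changed.
    """
    centers = [[1, 1], [-1, -1], [1, -1]]
    X, _ = make_blobs(n_samples=n_samples, centers=centers,
                      cluster_std=0.4, random_state=0)
    X = StandardScaler().fit_transform(X)

    results = {}
    for n_jobs in n_jobs_values:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            DBSCAN(eps=0.3, min_samples=10, n_jobs=n_jobs).fit(X)
            timings.append(time.perf_counter() - start)
        # Average over the repeated runs, as described in the text.
        results[n_jobs] = mean(timings)
    return results


if __name__ == "__main__":
    # Full-size run; this can take a while and use a lot of memory.
    for n_jobs, seconds in time_dbscan().items():
        print(f"n_jobs={n_jobs}  mean time={seconds:.1f} s")
```

Calling "time_dbscan(n_samples=1000, n_jobs_values=[1, 2])" first is a quick way to check that the script works before launching the full 100,000-point run.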


n_jobs  mean time
1       21.3
2       15.1
3       14.8
4       15.2
5       15.5
6       15.0

The resulting times appear in the table above and in the attached image. As can be seen, there was an effective gain only at "n_jobs = 2", with no further improvement for larger values. Even then, the gain was less than 30%!
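To make the size of the gain explicit, here is a small sketch that computes the speedup and the percentage reduction relative to "n_jobs = 1" from the averages in the table. The variable names are only illustrative, and the times are in whatever unit the table uses.

```python
# Mean times from the table above, keyed by n_jobs.
times = {1: 21.3, 2: 15.1, 3: 14.8, 4: 15.2, 5: 15.5, 6: 15.0}

baseline = times[1]

# For each n_jobs: (speedup factor, percentage reduction vs. n_jobs = 1).
summary = {
    n_jobs: (baseline / t, 100.0 * (baseline - t) / baseline)
    for n_jobs, t in times.items()
}

for n_jobs, (speedup, reduction) in summary.items():
    print(f"n_jobs={n_jobs}  speedup={speedup:.2f}x  reduction={reduction:.1f}%")
```

With perfect scaling the speedup at "n_jobs = 6" would be close to 6x, which is why the values computed here look so small.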

Why were the gains so small? Why was there no further gain for larger values of the "n_jobs" parameter? Is it possible to improve the results I have obtained?


--
Regards,
Mauricio Reis

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
