Awesome! Thanks!
*Sebastián Ramírez*
Head of Software Development
http://www.senseta.com
Tel: (+571) 795 7950 ext: 1012
Cel: (+57) 300 370 77 10
Calle 73 No 7 - 06 Piso 4
Linkedin: co.linkedin.com/in/tiangolo/
Twitter: @tiangolo https://twitter.com/tiangolo
Email
a *simple script that helps install Anaconda Python on the
machines of a cluster* more easily.
I wanted to share it here, in case it can help someone who wants to use
PySpark.
https://github.com/tiangolo/anaconda_cluster_install
*Sebastián Ramírez*
Great to know, thanks Xiangrui.
*Sebastián Ramírez*
Algorithm Designer
a terminal
Ctrl+Alt+F1
# Shut down the GUI
sudo stop lightdm
(for reference: http://askubuntu.com/questions/148321/how-do-i-stop-gui)
*Sebastián Ramírez*
in pseudo-code that you can save to a file. Then,
you can parse that pseudo code to write a proper script that runs the
Decision Tree. Actually, that's what I did for a Random Forest (an ensemble
of Decision Trees).
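
A minimal sketch of that parsing step, assuming the saved pseudo-code uses a nested If / Else / Predict layout like the sample below, with only numeric threshold tests. The sample text and the names `tree_to_python` and `load_tree` are hypothetical, for illustration only; your saved file may look different.

```python
import re

# Hypothetical sample of the saved pseudo-code; the real file may differ.
SAMPLE = """\
DecisionTreeModel classifier of depth 2 with 5 nodes
  If (feature 0 <= 0.5)
   Predict: 0.0
  Else (feature 0 > 0.5)
   If (feature 1 <= 1.0)
    Predict: 1.0
   Else (feature 1 > 1.0)
    Predict: 0.0
"""

# Matches numeric threshold tests only, e.g. "If (feature 3 <= 1.5)".
IF_RE = re.compile(r"^If \(feature (\d+) (<=|>=|<|>) (\S+)\)$")


def tree_to_python(pseudo_code, func_name="predict"):
    """Translate If/Else/Predict pseudo-code into Python source text."""
    out = ["def %s(features):" % func_name]
    for raw in pseudo_code.splitlines():
        line = raw.strip()
        # Reuse the original indentation (plus one space) as the Python indentation.
        pad = " " * (len(raw) - len(raw.lstrip()) + 1)
        if line.startswith("If "):
            m = IF_RE.match(line)
            if m is None:
                raise ValueError("unsupported test: " + line)
            out.append("%sif features[%d] %s %r:"
                       % (pad, int(m.group(1)), m.group(2), float(m.group(3))))
        elif line.startswith("Else"):
            # The Else line repeats the opposite test, so a plain else is enough.
            out.append(pad + "else:")
        elif line.startswith("Predict:"):
            out.append("%sreturn %r" % (pad, float(line.split(":", 1)[1])))
        # Any other line (blank lines, the header) is skipped.
    return "\n".join(out) + "\n"


def load_tree(pseudo_code):
    """Compile the generated source and return the prediction function."""
    namespace = {}
    exec(tree_to_python(pseudo_code), namespace)
    return namespace["predict"]
```

With this, `load_tree(SAMPLE)` returns a plain Python function that takes a list of feature values. For a Random Forest, one possible approach is to build one such function per tree and combine their outputs, for example by majority vote.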
Hope that helps,
*Sebastián Ramírez*
/Anaconda-2.1.0-Linux-x86_64.sh
# Or use whichever link is current when you do this:
https://store.continuum.io/cshop/anaconda/
bash Anaconda*.sh
# When asked whether to set it as the default Python, or to add Anaconda to the
# PATH (I don't remember the exact wording), choose yes
I hope that helps,
*Sebastián
that helps.
Best,
*Sebastián Ramírez*
, and aren't applied until they are needed by an
action (it happened to me with reads too, some time ago).
You can try calling .first() on your RDD once in a while to force it
to load the RDD onto your cluster (though it might not be the cleanest way to do
it).
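
A minimal sketch of that trick, assuming PySpark is installed. The helper name `force_evaluation` and the `demo` function are hypothetical, and the empty-RDD handling assumes `first()` raises `ValueError`, which is what the PySpark versions I know of do.

```python
def force_evaluation(rdd):
    """Trigger a Spark job on `rdd` by calling an action on it.

    Returns the first element, or None when the RDD is empty.
    """
    try:
        return rdd.first()
    except ValueError:
        # Assumption: first() raises ValueError on an empty RDD.
        return None


def demo():
    # Assumes PySpark is installed and that a local master is acceptable.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "force-evaluation-demo")
    try:
        # map() is a transformation: nothing is computed at this point.
        squares = sc.parallelize(range(10)).map(lambda x: x * x)
        # first() is an action: Spark now runs the map, at least on
        # the partition needed to produce one element.
        print(force_evaluation(squares))
    finally:
        sc.stop()
```

Note that `first()` only computes as much as it needs to return one element. If the goal is to materialize the whole RDD, an action that touches every partition, such as `count()`, possibly after `cache()`, may be a better fit.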
*Sebastián Ramírez*