Re: Python Dependencies Issue on EMR

2018-09-20 Thread Jonas Shomorony
Thanks Patrick. Using a conda virtual environment did help with libraries that required the extra C stuff.

Jonas

On Fri, Sep 14, 2018 at 8:02 AM Patrick McCarthy wrote:
> You didn't say how you're zipping the dependencies, but I'm guessing you
> either include .egg files or zipped up a

Re: Python Dependencies Issue on EMR

2018-09-14 Thread Patrick McCarthy
You didn't say how you're zipping the dependencies, but I'm guessing you either include .egg files or zipped up a virtualenv. In either case, the extra C stuff that scipy and pandas rely upon doesn't get included. An approach like this solved the last problem I had that seemed like this -

Python Dependencies Issue on EMR

2018-09-13 Thread Jonas Shomorony
Hey everyone, I am currently trying to run a Python Spark job (using YARN client mode) that uses multiple libraries, on a Spark cluster on Amazon EMR. To do that, I create a dependencies.zip file that contains all of the dependencies/libraries (installed through pip) for the job to run
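
For readers following along, here is a rough sketch of how one might build a dependencies.zip like the one described and then check it for the compiled pieces ("extra C stuff") raised in the reply. It is not taken from the thread: the helper names `build_zip` and `find_native_files`, the suffix list, and the idea of zipping a `pip install --target` directory are all assumptions made for illustration. The check is relevant because Python's zip import mechanism does not load compiled extension modules, so any files it reports are likely candidates for trouble on the executors.

```python
import zipfile
from pathlib import Path

# Hypothetical helper names and suffix list, chosen for this illustration only.
# These suffixes cover compiled extension modules and shared libraries.
NATIVE_SUFFIXES = (".so", ".pyd", ".dylib", ".dll")


def build_zip(src_dir, zip_path):
    """Zip the contents of src_dir (assumed to be e.g. a `pip install --target` directory)."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so packages sit at the zip root.
                zf.write(path, path.relative_to(src).as_posix())


def find_native_files(zip_path):
    """Return archive entries that look like compiled code (by file name only)."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            name
            for name in zf.namelist()
            # ".so." also catches versioned shared libraries such as libfoo.so.1
            if name.lower().endswith(NATIVE_SUFFIXES) or ".so." in name
        ]
```

If `find_native_files("dependencies.zip")` returns a non-empty list, the packages that own those files probably need to be provided some other way than a plain zip, for example through an environment installed on the nodes, as the conda route in the follow-up suggests.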