PySpark pickling behavior

2017-10-11 Thread Naveen Swamy
Hello fellow users, 1) I am wondering if there is documentation or guidelines explaining in which situations PySpark decides to pickle the functions I use in the map method. 2) Are there best practices to avoid pickling and sharing variables, etc.? I have a situation where I want to pass to

Loading objects only once

2017-09-27 Thread Naveen Swamy
Hello all, I am a new user of Spark, so please bear with me if this has been discussed earlier. I am trying to run batch inference using DL frameworks' pre-trained models and Spark. Basically, I want to download a model (which is usually ~500 MB) onto the workers and load the model and run inference