Hi Dev, User,

I want to store Spark ML models in a database so that I can reuse them later, but I am unable to pickle them in PySpark. Using Scala, however, I am able to serialize a model into a byte array.
For example, in Scala I can do the following, but not the equivalent in Python:

    val modelToByteArray = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(modelToByteArray)
    oos.writeObject(model)
    oos.flush()
    oos.close()
    spark.sparkContext
      .parallelize(Seq((model.uid, "my-neural-network-model", modelToByteArray.toByteArray)))
      .saveToCassandra("dfsdfs", "models", SomeColumns("uid", "name", "model"))

But pickle.dumps(model) in PySpark throws:

    TypeError: cannot pickle '_thread.RLock' object

Please help on the same.

Regards,
Pralabh
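For reference, a minimal sketch of what I am attempting on the PySpark side (the LogisticRegression model and the tiny training set here are only placeholders to reproduce the error; my real model is a neural network):

    import pickle
    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.getOrCreate()

    # tiny placeholder training set, just to obtain a fitted model
    df = spark.createDataFrame(
        [(0.0, Vectors.dense([0.0, 1.0])),
         (1.0, Vectors.dense([1.0, 0.0]))],
        ["label", "features"])

    model = LogisticRegression(maxIter=5).fit(df)

    # this is the call that fails
    payload = pickle.dumps(model)  # TypeError: cannot pickle '_thread.RLock' object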