Re: Pickling Keras models for use in UDFs

2018-05-04 Thread Khaled Zaouk
Why don't you try to encapsulate your Keras model within a wrapper class
(an estimator, say), and implement the two methods __getstate__ and
__setstate__ inside that wrapper class?
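
A minimal sketch of the idea (untested; the class name and the HDF5
round-trip are illustrative assumptions, not a definitive implementation):

    import tempfile

    import keras.models

    class PicklableKerasModel(object):
        """Wraps a Keras model so it can be pickled (sketch only)."""

        def __init__(self, model):
            self.model = model

        def __getstate__(self):
            # Serialize the wrapped model to HDF5 bytes on pickling.
            with tempfile.NamedTemporaryFile(suffix='.h5') as fd:
                keras.models.save_model(self.model, fd.name, overwrite=True)
                model_bytes = fd.read()
            return {'model_bytes': model_bytes}

        def __setstate__(self, state):
            # Rebuild the wrapped model from the HDF5 bytes on un-pickling.
            with tempfile.NamedTemporaryFile(suffix='.h5') as fd:
                fd.write(state['model_bytes'])
                fd.flush()
                self.model = keras.models.load_model(fd.name)

Because the special methods live on the wrapper (which the workers can
import normally), no monkey patching of Keras itself should be needed on
the workers.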

On Thu, May 3, 2018 at 5:27 PM erp12  wrote:

> I would like to create a Spark UDF that returns a prediction made with a
> trained Keras model. Keras models are not typically pickle-able; however, I
> have used the monkey-patch approach to making Keras models pickle-able, as
> described here: http://zachmoshe.com/2017/04/03/pickling-keras-models.html
>
> This allows models to be sent from the PySpark driver to the workers;
> however, the worker Python processes do not have the monkey-patched Model
> class, and thus cannot properly un-pickle the models. To fix this, I know I
> must call the monkey-patching function (make_keras_picklable()) once on
> each worker, but I have been unable to figure out how to do this.
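>
> For illustration, the kind of approach I have in mind (a naive, untested
> sketch; I doubt it reliably reaches every Python worker process, since
> Spark may spawn new ones later):
>
>     # make_keras_picklable() is the patch function from the blog post
>     # above. Spreading the call across many partitions hopefully reaches
>     # each executor, but freshly spawned Python workers would miss it.
>     sc.parallelize(range(1000), 1000).foreachPartition(
>         lambda it: make_keras_picklable())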
>
> I am curious to hear if anyone has a fix for this issue, or would like to
> offer an alternative way to make predictions with a Keras model within a
> Spark UDF.
>
> Here is a Stack Overflow question with more details:
>
> https://stackoverflow.com/questions/50007126/pickling-monkey-patched-keras-model-for-use-in-pyspark
>
> Thank you!
>


[Spark Streaming]: Does DStream workload run over Spark SQL engine?

2018-05-02 Thread Khaled Zaouk
Hi,

I have a question regarding the execution engine of Spark Streaming
(the DStream API): do Spark Streaming jobs run on the Spark SQL engine?

For example, if I change a configuration parameter related to Spark SQL
(such as spark.sql.streaming.minBatchesToRetain or
spark.sql.objectHashAggregate.sortBased.fallbackThreshold), does this
make any difference when I run a Spark Streaming job (using the DStream API)?
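
For concreteness, a minimal sketch of the kind of job I mean (the source,
batch interval, and parameter value are arbitrary placeholders):

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = (SparkConf()
            .setAppName("dstream-sql-conf-question")
            # A Spark SQL setting: does it affect the DStream path at all?
            .set("spark.sql.streaming.minBatchesToRetain", "50"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 5)  # 5-second batch interval

    lines = ssc.socketTextStream("localhost", 9999)
    lines.count().pprint()

    ssc.start()
    ssc.awaitTermination()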

Thank you!

Khaled