[ https://issues.apache.org/jira/browse/SPARK-32534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178558#comment-17178558 ]

Sean R. Owen commented on SPARK-32534:
--------------------------------------

Generally speaking, it's not going to work to stop and start a SparkContext. If 
there's some easy way to fix it, sure, but lots of things can go wrong if you 
are doing that. A SparkContext lives exactly as long as the app.
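
(A minimal sketch of the lifecycle described above, assuming a long-lived 
service process; the app name is a placeholder. The context is created once 
at startup, reused across runs, and stopped only at shutdown.)

    from pyspark.sql import SparkSession

    # Created once when the service starts; later getOrCreate() calls
    # return this same session instead of building a new one.
    spark = (SparkSession.builder
             .appName("nlp-service")  # placeholder name
             .getOrCreate())

    # ... serve many runs/requests against the same context ...

    # Stop only when the service itself shuts down, not per run.
    spark.stop()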

> Cannot load a Pipeline Model on a stopped Spark Context
> -------------------------------------------------------
>
>                 Key: SPARK-32534
>                 URL: https://issues.apache.org/jira/browse/SPARK-32534
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Kubernetes
>    Affects Versions: 2.4.6
>            Reporter: Kevin Van Lieshout
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I am running Spark in a Kubernetes cluster that runs Spark NLP, using the 
> PySpark ML PipelineModel class to load the model and then transform a Spark 
> dataframe. We run this within a Docker container that starts up a 
> SparkContext, mounts volumes, spins up executors, and so on, then runs its 
> transformations and UDFs, and finally shuts the SparkContext down.
>
> The first time I load the model after my service has just started, 
> everything is fine. If I run my application a second time without 
> restarting my service, even though the context from the previous run is 
> entirely stopped and a new one has been started, the PipelineModel has some 
> attribute in one of its base classes that still refers to the closed 
> context, so I get a "cannot call a function on a stopped SparkContext" 
> error when I try to load the model in my service again. I have to restart 
> my service each time if I want consecutive runs through my Spark pipeline, 
> which is not ideal, so I was wondering whether this is a common issue among 
> fellow PySpark users of PipelineModel, whether there is a common workaround 
> to reset all Spark contexts, or whether the PipelineModel caches a 
> SparkContext of some sort. Any help would be very useful.
>  
>  
> cls.pipeline = PipelineModel.read().load(NLP_MODEL)
>  
> is how I load the model. Our SparkContext setup is very similar to a 
> typical Kubernetes/Spark setup; nothing special there.
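>
> A minimal sketch of the failure sequence (the local master URL and model 
> path are placeholders; the second load fails even though the first context 
> was stopped cleanly and a new one created):
>
>     from pyspark.sql import SparkSession
>     from pyspark.ml import PipelineModel
>
>     NLP_MODEL = "/models/nlp_pipeline"  # placeholder path
>
>     spark = SparkSession.builder.master("local[*]").getOrCreate()
>     model = PipelineModel.read().load(NLP_MODEL)  # first load works
>     spark.stop()
>
>     spark = SparkSession.builder.master("local[*]").getOrCreate()
>     # Second load raises "cannot call a function on a stopped
>     # SparkContext"; apparently something in the model/reader machinery
>     # still holds a reference to the old, stopped context.
>     model = PipelineModel.read().load(NLP_MODEL)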



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
