[ 
https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23841:
------------------------------------

    Assignee: Apache Spark

> NodeIdCache should unpersist the last cached nodeIdsForInstances
> ----------------------------------------------------------------
>
>                 Key: SPARK-23841
>                 URL: https://issues.apache.org/jira/browse/SPARK-23841
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: zhengruifeng
>            Assignee: Apache Spark
>            Priority: Minor
>
> {{NodeIdCache}} forgets to unpersist the last cached intermediate dataset ({{nodeIdsForInstances}}):
>  
> {code:java}
> scala> import org.apache.spark.ml.classification._
> import org.apache.spark.ml.classification._
> scala> val df = spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/spark/data/mllib/sample_libsvm_data.txt")
> 2018-04-02 11:48:25 WARN  LibSVMFileFormat:66 - 'numFeatures' option not specified, determining the number of features by going though the input. If you know the number in advance, please specify it via 'numFeatures' option to avoid the extra scan.
> 2018-04-02 11:48:31 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> df: org.apache.spark.sql.DataFrame = [label: double, features: vector]
> scala> val rf = new RandomForestClassifier().setCacheNodeIds(true)
> rf: org.apache.spark.ml.classification.RandomForestClassifier = rfc_aab2b672546b
> scala> val rfm = rf.fit(df)
> rfm: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_aab2b672546b) with 20 trees
> scala> sc.getPersistentRDDs
> res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(56 -> MapPartitionsRDD[56] at map at NodeIdCache.scala:102)
> {code}
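Until the leak is fixed in {{NodeIdCache}} itself, the stale RDD can be dropped from the user side. A minimal workaround sketch, assuming a live {{SparkSession}} named {{spark}} and relying on the RDD call-site string ({{NodeIdCache.scala}}) visible in the reproduction above to identify the leaked RDD:

{code:java}
import org.apache.spark.ml.classification.RandomForestClassifier

val df = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val rfm = new RandomForestClassifier().setCacheNodeIds(true).fit(df)

// After fit() returns, the last intermediate nodeIdsForInstances RDD is
// still cached. Locate it via its call-site string and unpersist it.
spark.sparkContext.getPersistentRDDs.values
  .filter(_.toString.contains("NodeIdCache"))
  .foreach(_.unpersist(blocking = false))
{code}

This only works around the symptom; the proper fix would be for {{NodeIdCache}} to unpersist {{nodeIdsForInstances}} during its own cleanup.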



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
