[jira] [Assigned] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances
[ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-23841:
---------------------------------

    Assignee: zhengruifeng

> NodeIdCache should unpersist the last cached nodeIdsForInstances
> ----------------------------------------------------------------
>
>                 Key: SPARK-23841
>                 URL: https://issues.apache.org/jira/browse/SPARK-23841
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Minor
>
> NodeIdCache forgets to unpersist the last cached intermediate dataset:
>
> {code:java}
> scala> import org.apache.spark.ml.classification._
> import org.apache.spark.ml.classification._
>
> scala> val df = spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/spark/data/mllib/sample_libsvm_data.txt")
> 2018-04-02 11:48:25 WARN LibSVMFileFormat:66 - 'numFeatures' option not specified, determining the number of features by going though the input. If you know the number in advance, please specify it via 'numFeatures' option to avoid the extra scan.
> 2018-04-02 11:48:31 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> df: org.apache.spark.sql.DataFrame = [label: double, features: vector]
>
> scala> val rf = new RandomForestClassifier().setCacheNodeIds(true)
> rf: org.apache.spark.ml.classification.RandomForestClassifier = rfc_aab2b672546b
>
> scala> val rfm = rf.fit(df)
> rfm: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_aab2b672546b) with 20 trees
>
> scala> sc.getPersistentRDDs
> res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(56 -> MapPartitionsRDD[56] at map at NodeIdCache.scala:102)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
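Until the fix lands, the leftover RDD can be released from the caller's side. A minimal sketch for the spark-shell, following the reproduction above (the libsvm path is illustrative; `getPersistentRDDs` and `unpersist` are public SparkContext/RDD APIs, but blanket-unpersisting everything is only safe when nothing else in the session is intentionally cached):

{code:java}
// Workaround sketch: after fit() with cacheNodeIds enabled, drop any
// intermediate node-id RDD that NodeIdCache left persisted.
import org.apache.spark.ml.classification.RandomForestClassifier

val df = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val rfm = new RandomForestClassifier().setCacheNodeIds(true).fit(df)

// getPersistentRDDs returns Map[Int, RDD[_]] of everything still cached;
// unpersist each entry (non-blocking, so the shell is not stalled).
sc.getPersistentRDDs.values.foreach(_.unpersist(blocking = false))
{code}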
[jira] [Assigned] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances
[ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23841:
------------------------------------

    Assignee: Apache Spark

> NodeIdCache should unpersist the last cached nodeIdsForInstances
> ----------------------------------------------------------------
>
>                 Key: SPARK-23841
>                 URL: https://issues.apache.org/jira/browse/SPARK-23841
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: zhengruifeng
>            Assignee: Apache Spark
>            Priority: Minor
[jira] [Assigned] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances
[ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23841:
------------------------------------

    Assignee: (was: Apache Spark)

> NodeIdCache should unpersist the last cached nodeIdsForInstances
> ----------------------------------------------------------------
>
>                 Key: SPARK-23841
>                 URL: https://issues.apache.org/jira/browse/SPARK-23841
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: zhengruifeng
>            Priority: Minor