Re: Saving a mllib model in Spark SQL
Hey, Thanks Xiangrui Meng and Cheng Lian for your valuable suggestions. It works!

Divyansh Jain

On Tue, January 20, 2015 2:49 pm, Xiangrui Meng wrote:
> You can save the cluster centers as a SchemaRDD of two columns (id: Int,
> center: Array[Double]). When you load it back, you can construct the
> k-means model from its cluster centers.
>
> -Xiangrui
Re: Saving a mllib model in Spark SQL
Yeah, as Michael said, I forgot that UDT is not a public API. Xiangrui's suggestion makes more sense.

Cheng

On 1/20/15 12:49 PM, Xiangrui Meng wrote:
> You can save the cluster centers as a SchemaRDD of two columns (id: Int,
> center: Array[Double]). When you load it back, you can construct the
> k-means model from its cluster centers.
>
> -Xiangrui
Re: Saving a mllib model in Spark SQL
You can save the cluster centers as a SchemaRDD of two columns (id: Int, center: Array[Double]). When you load it back, you can construct the k-means model from its cluster centers.

-Xiangrui

On Tue, Jan 20, 2015 at 11:55 AM, Cheng Lian wrote:
> This is because KMeansModel is neither a built-in type nor a user-defined
> type recognized by Spark SQL. I think you can write your own UDT version of
> KMeansModel in this case. You may refer to o.a.s.mllib.linalg.Vector and
> o.a.s.mllib.linalg.VectorUDT as an example.
>
> Cheng
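[Editor's note] For a concrete starting point, below is a rough sketch of this save/load approach (not from the thread). It assumes Spark 1.2-era APIs (createSchemaRDD, saveAsParquetFile, parquetFile), that `model` is the trained KMeansModel and `sqlContext` is in scope, and that the KMeansModel constructor taking Array[Vector] is accessible in your Spark version; the Center case class and the Parquet path are made up for illustration.

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical two-column layout: one row per cluster center.
case class Center(id: Int, center: Array[Double])

import sqlContext.createSchemaRDD

// Save: turn the model's centers into a SchemaRDD and write it out as Parquet.
val centersRdd = sc.parallelize(
  model.clusterCenters.zipWithIndex.map { case (v, i) => Center(i, v.toArray) })
centersRdd.saveAsParquetFile("kmeans-centers.parquet")

// Load: read the rows back, restore the original ordering by id, and
// rebuild the model from its centers. The runtime type of the array column
// is assumed to come back as a Seq[Double]; this may vary by Spark version.
val rows = sqlContext.parquetFile("kmeans-centers.parquet")
val centers = rows
  .map(r => (r(0).asInstanceOf[Int], r(1).asInstanceOf[Seq[Double]].toArray))
  .collect()
  .sortBy(_._1)
  .map { case (_, arr) => Vectors.dense(arr) }
val restored = new KMeansModel(centers)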
Re: Saving a mllib model in Spark SQL
This is because KMeansModel is neither a built-in type nor a user-defined type recognized by Spark SQL. I think you can write your own UDT version of KMeansModel in this case. You may refer to o.a.s.mllib.linalg.Vector and o.a.s.mllib.linalg.VectorUDT as an example.

Cheng

On 1/20/15 5:34 AM, Divyansh Jain wrote:
> Hey people,
>
> I have run into some issues regarding saving the k-means MLlib model in
> Spark SQL by converting it to a SchemaRDD.
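[Editor's note] For completeness, here is a very rough illustration (not from the thread) of the VectorUDT-style pattern being suggested. As Cheng notes in his follow-up, UserDefinedType was not a public API at the time; its package (assumed below to be org.apache.spark.sql.types, as in Spark 1.3) and the internal representation expected by serialize/deserialize have changed across Spark versions, and Spark would still need to be told to use the UDT (normally via the SQLUserDefinedType annotation on the user class, which you cannot add to MLlib's KMeansModel). Treat this purely as a sketch of the shape of a UDT.

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.types._

// Illustrative UDT that stores a KMeansModel as an array of centers,
// each center being an array of doubles.
class KMeansModelUDT extends UserDefinedType[KMeansModel] {

  // The SQL-level schema used to store the model.
  override def sqlType: DataType =
    ArrayType(ArrayType(DoubleType, containsNull = false), containsNull = false)

  // Convert a KMeansModel into the representation matching sqlType
  // (a Seq of Seqs in the 1.3-era internal format assumed here).
  override def serialize(obj: Any): Any = obj match {
    case m: KMeansModel => m.clusterCenters.toSeq.map(_.toArray.toSeq)
  }

  // Rebuild the model from the stored centers (assumes the
  // KMeansModel(Array[Vector]) constructor is accessible).
  override def deserialize(datum: Any): KMeansModel = datum match {
    case centers: Seq[_] =>
      new KMeansModel(
        centers.map(c => Vectors.dense(c.asInstanceOf[Seq[Double]].toArray)).toArray)
  }

  override def userClass: Class[KMeansModel] = classOf[KMeansModel]
}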
Saving a mllib model in Spark SQL
Hey people,

I have run into some issues saving the k-means MLlib model in Spark SQL by converting it to a SchemaRDD. This is what I am doing:

case class Model(id: String, model: org.apache.spark.mllib.clustering.KMeansModel)
import sqlContext.createSchemaRDD
val rowRdd = sc.makeRDD(Seq("id", model)).map(p => Model("id", model))

This is the error that I get:

scala.MatchError: org.apache.spark.mllib.classification.ClassificationModel (of class scala.reflect.internal.Types$TypeRef$$anon$6)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:53)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:64)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:62)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:62)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:50)
  at org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:44)
  at org.apache.spark.sql.execution.ExistingRdd$.fromProductRdd(basicOperators.scala:229)
  at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:94)

Any help would be appreciated. Thanks!