Re: Saving a mllib model in Spark SQL

2015-01-21 Thread Divyansh Jain
Hey,

Thanks, Xiangrui Meng and Cheng Lian, for your valuable suggestions. It works!

Divyansh Jain

On Tue, January 20, 2015 2:49 pm, Xiangrui Meng wrote:
> You can save the cluster centers as a SchemaRDD of two columns (id:
> Int, center: Array[Double]). When you load it back, you can construct
> the k-means model from its cluster centers. -Xiangrui
>
> On Tue, Jan 20, 2015 at 11:55 AM, Cheng Lian wrote:
>
>> This is because KMeansModel is neither a built-in type nor a user-defined
>> type recognized by Spark SQL. I think you can write your own UDT
>> version of KMeansModel in this case. You may refer to
>> o.a.s.mllib.linalg.Vector and o.a.s.mllib.linalg.VectorUDT as an
>> example.
>>
>> Cheng



Re: Saving a mllib model in Spark SQL

2015-01-20 Thread Cheng Lian
Yeah, as Michael said, I forgot that UDT is not a public API. Xiangrui's 
suggestion makes more sense.


Cheng

On 1/20/15 12:49 PM, Xiangrui Meng wrote:

You can save the cluster centers as a SchemaRDD of two columns (id:
Int, center: Array[Double]). When you load it back, you can construct
the k-means model from its cluster centers. -Xiangrui




Re: Saving a mllib model in Spark SQL

2015-01-20 Thread Xiangrui Meng
You can save the cluster centers as a SchemaRDD of two columns (id:
Int, center: Array[Double]). When you load it back, you can construct
the k-means model from its cluster centers. -Xiangrui
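
In code, that approach might look roughly like this (untested; written
against the 1.2-era SchemaRDD and Parquet APIs, and the Center case class,
the output path, and the variable names are made up for illustration):

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

// One row per cluster center, keyed by index so the original order survives.
case class Center(id: Int, center: Array[Double])

// Save: turn the model's centers into a SchemaRDD and write it out as Parquet.
import sqlContext.createSchemaRDD
val centerRdd = sc.parallelize(
  model.clusterCenters.zipWithIndex.map { case (v, i) => Center(i, v.toArray) })
centerRdd.saveAsParquetFile("kmeans-centers.parquet")

// Load: read the rows back, restore the order, and rebuild the model
// from its cluster centers.
val rows = sqlContext.parquetFile("kmeans-centers.parquet")
val restored = new KMeansModel(
  rows.map(r => (r.getInt(0), r(1).asInstanceOf[Seq[Double]]))
    .collect()
    .sortBy(_._1)
    .map { case (_, values) => Vectors.dense(values.toArray) })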




Re: Saving a mllib model in Spark SQL

2015-01-20 Thread Cheng Lian
This is because KMeansModel is neither a built-in type nor a user-defined
type recognized by Spark SQL. I think you can write your own UDT version of
KMeansModel in this case. You may refer to o.a.s.mllib.linalg.Vector and
o.a.s.mllib.linalg.VectorUDT as an example.


Cheng
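
Purely as a sketch of what the UDT route would involve (untested, and as
noted elsewhere in this thread the UDT hooks were not public API at the
time; this assumes the 1.3-era org.apache.spark.sql.types location, where
array values travel as plain Scala Seqs). Since the annotation has to sit
on the class that appears in the schema, one would wrap KMeansModel in a
small holder class:

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.types._

// The annotation must go on the class being stored, which we control.
@SQLUserDefinedType(udt = classOf[KMeansModelUDT])
class KMeansModelHolder(val model: KMeansModel) extends Serializable

class KMeansModelUDT extends UserDefinedType[KMeansModelHolder] {

  // Store the model as its cluster centers: an array of arrays of doubles.
  override def sqlType: DataType =
    ArrayType(ArrayType(DoubleType, containsNull = false), containsNull = false)

  override def serialize(obj: Any): Any = obj match {
    case h: KMeansModelHolder => h.model.clusterCenters.map(_.toArray.toSeq).toSeq
  }

  override def deserialize(datum: Any): KMeansModelHolder = datum match {
    case centers: Seq[_] =>
      val vectors =
        centers.map(c => Vectors.dense(c.asInstanceOf[Seq[Double]].toArray)).toArray
      new KMeansModelHolder(new KMeansModel(vectors))
  }

  override def userClass: Class[KMeansModelHolder] = classOf[KMeansModelHolder]
}

In practice, the SchemaRDD-of-centers approach above is simpler and avoids
depending on those internals.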



Saving a mllib model in Spark SQL

2015-01-20 Thread Divyansh Jain
Hey people,

I have run into some issues saving a k-means MLlib model in Spark SQL by
converting it to a SchemaRDD. This is what I am doing:

case class Model(id: String, model: org.apache.spark.mllib.clustering.KMeansModel)
import sqlContext.createSchemaRDD
val rowRdd = sc.makeRDD(Seq("id", model)).map(p => Model("id", model))

This is the error that I get:

scala.MatchError: org.apache.spark.mllib.classification.ClassificationModel
(of class scala.reflect.internal.Types$TypeRef$$anon$6)
  at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:53)
  at
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:64)
  at
org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:62)
  at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:62)
  at
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:50)
  at
org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:44)
  at
org.apache.spark.sql.execution.ExistingRdd$.fromProductRdd(basicOperators.scala:229)
  at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:94)

Any help would be appreciated. Thanks!






