[ https://issues.apache.org/jira/browse/SPARK-26326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-26326:
------------------------------
    Priority: Minor  (was: Major)

Yeah, this means you have a model with about 265M parameters (48685 features × 5453 labels) which, when serialized as an array of bytes, (barely) exceeds 2GB. I think reimplementing this under the hood is possible, but it may call into question whether this is a realistic use case for naive Bayes?

> Cannot save a NaiveBayesModel with 48685 features and 5453 labels
> -----------------------------------------------------------------
>
>                 Key: SPARK-26326
>                 URL: https://issues.apache.org/jira/browse/SPARK-26326
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Markus Paaso
>            Priority: Minor
>
> When executing
> {code:java}
> model.write().overwrite().save("/tmp/mymodel")
> {code}
> the following error occurs:
> {code:java}
> java.lang.UnsupportedOperationException: Cannot convert this array to unsafe format as it's too big.
> 	at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(UnsafeArrayData.java:457)
> 	at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(UnsafeArrayData.java:524)
> 	at org.apache.spark.ml.linalg.MatrixUDT.serialize(MatrixUDT.scala:66)
> 	at org.apache.spark.ml.linalg.MatrixUDT.serialize(MatrixUDT.scala:28)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$UDTConverter.toCatalystImpl(CatalystTypeConverters.scala:143)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:258)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$.$anonfun$createToCatalystConverter$2(CatalystTypeConverters.scala:396)
> 	at org.apache.spark.sql.catalyst.plans.logical.LocalRelation$.$anonfun$fromProduct$1(LocalRelation.scala:43)
> 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
> 	at scala.collection.immutable.List.foreach(List.scala:388)
> 	at scala.collection.TraversableLike.map(TraversableLike.scala:233)
> 	at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
> 	at scala.collection.immutable.List.map(List.scala:294)
> 	at org.apache.spark.sql.catalyst.plans.logical.LocalRelation$.fromProduct(LocalRelation.scala:43)
> 	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:315)
> 	at org.apache.spark.ml.classification.NaiveBayesModel$NaiveBayesModelWriter.saveImpl(NaiveBayes.scala:393)
> 	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:180)
> {code}
> Data file to reproduce the problem:
> [https://github.com/make/spark-26326-files/raw/master/data.libsvm]
> Code to reproduce the problem:
> {code:java}
> import org.apache.spark.ml.classification.NaiveBayes
>
> // Load the data stored in LIBSVM format as a DataFrame.
> val data = spark.read.format("libsvm").load("/tmp/data.libsvm")
>
> // Train a NaiveBayes model.
> val model = new NaiveBayes().fit(data)
>
> model.write().overwrite().save("/tmp/mymodel")
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
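The size figures in the comment above can be sanity-checked with simple arithmetic. This is a sketch, not Spark code; the header formula is an assumption based on the `UnsafeArrayData` layout in Spark 2.4 (an 8-byte element count plus a word-aligned null bitmap):

```scala
// Sanity check: the NaiveBayesModel's theta matrix holds one Double
// per (label, feature) pair, serialized as a single flat array.
val numFeatures = 48685L
val numLabels   = 5453L

val numParams = numFeatures * numLabels  // 265479305 ~= 265M parameters
val dataBytes = numParams * 8L           // 2123834440 bytes of Doubles

// Assumed UnsafeArrayData header: 8-byte length field plus a null
// bitmap of one bit per element, padded to 8-byte words.
val headerBytes = 8L + ((numParams + 63L) / 64L) * 8L  // 33184928 bytes

// The serialized array must fit in an Int-indexed buffer, so exceeding
// Int.MaxValue (~2GB) triggers the UnsupportedOperationException above.
println(dataBytes + headerBytes > Int.MaxValue)  // true
```

Note that the raw Double payload alone (about 2.12e9 bytes) is just under `Int.MaxValue`; it is the extra header that pushes the total (barely) over the limit, which matches the "(barely) exceeding 2GB" observation.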