Hello, I am trying to train a neural net using a dataframe constructed from an RDD of LabeledPoints. The data frame's schema is:
[label: double, features: vector]

The actual features values are SparseVectors. The runtime error I get when I call

    val labeledPoints: RDD[LabeledPoint] = <generated earlier>
    val fields: Seq[StructField] = List[StructField](
      StructField("label", DoubleType, nullable = false),
      StructField("features", VectorType, nullable = false))
    val schema: StructType = StructType(fields)
    val labeledPointsAsRowRDD = labeledPoints.map(point => Row(point.label, point.features))
    val trainingDataFrame = spark.createDataFrame(labeledPointsAsRowRDD, schema)
    trainer.fit(trainingDataFrame)

is:

    org.apache.spark.mllib.linalg.SparseVector is not a valid external type for schema of vector

I'm not able to figure out whether the DataFrame doesn't conform to the schema, or the schema doesn't conform to what the ml lib expects, or what. Any suggestions would be very helpful.

Also, I'm confused about why the MultilayerPerceptronClassifier doesn't work directly with an RDD[LabeledPoint] as DecisionTree, RandomForest, etc. do.
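One possible reading of the error: the message names org.apache.spark.mllib.linalg.SparseVector, while the trace shows the features column being serialized by org.apache.spark.ml.linalg.VectorUDT. So the rows may be carrying the older mllib vector type under a schema that expects the newer ml one. Below is a minimal sketch of what could be tried, assuming Spark 2.x, that `VectorType` is the one from `org.apache.spark.ml.linalg.SQLDataTypes`, and that the only change needed is calling `asML` on each feature vector. The object and method names (`VectorConversionSketch`, `toTrainingDataFrame`) are made up for illustration.

```scala
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical helper, not part of Spark: builds the training DataFrame
// from mllib LabeledPoints while storing ml-package vectors in the rows.
object VectorConversionSketch {

  def toTrainingDataFrame(spark: SparkSession,
                          labeledPoints: RDD[LabeledPoint]): DataFrame = {
    // Same two-column schema as in the question.
    val schema = StructType(Seq(
      StructField("label", DoubleType, nullable = false),
      StructField("features", VectorType, nullable = false)))

    // asML converts an org.apache.spark.mllib.linalg.Vector into the
    // corresponding org.apache.spark.ml.linalg.Vector (sparse stays sparse).
    val rows = labeledPoints.map(point => Row(point.label, point.features.asML))

    spark.createDataFrame(rows, schema)
  }
}
```

With this, the call in the question would become `trainer.fit(VectorConversionSketch.toTrainingDataFrame(spark, labeledPoints))`. If the DataFrame already exists with mllib vector columns, `org.apache.spark.mllib.util.MLUtils.convertVectorColumnsToML` might be an alternative worth checking.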
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: org.apache.spark.mllib.linalg.SparseVector is not a valid external type for schema of vector

    validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, label), DoubleType) AS label#0
    +- validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, label), DoubleType)
       +- getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, label)
          +- assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object)
             +- input[0, org.apache.spark.sql.Row, true]
    newInstance(class org.apache.spark.ml.linalg.VectorUDT).serialize AS features#1
    +- newInstance(class org.apache.spark.ml.linalg.VectorUDT).serialize
       :- newInstance(class org.apache.spark.ml.linalg.VectorUDT)
       +- validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 1, features), org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7)
          +- getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 1, features)
             +- assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object)
                +- input[0, org.apache.spark.sql.Row, true]

-- 
*Pete Prokopowicz*
Sr. Engineer - BEMOD! Behavioral Modeling
600 W. Chicago Ave, Chicago, IL 60654
Mobile: 708-654-8137
Groupon <http://www.google.com/url?q=http%3A%2F%2Fwww.groupon.com%2F&sa=D&sntz=1&usg=AFrqEzcC80FkwsjyolWTKAH1sZ9yU2t0xg> II Grouponworks <http://www.google.com/url?q=http%3A%2F%2Fwww.grouponworks.com%2F&sa=D&sntz=1&usg=AFrqEzdLBm3Dql75wz1BTY0mA30ov3RnWg>