Hi, I create a DataFrame using a schema, but when I try to fit a model, I receive this error:
requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually ArrayType(StringType,true)

#### piece of code ####

SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());

StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("id", DataTypes.StringType, false),
    DataTypes.createStructField("date", DataTypes.StringType, false),
    DataTypes.createStructField("temperature", DataTypes.StringType, true) });

// I receive data from another application as lines of the form: id,date,temperature
JavaRDD<Row> rowsRdd = rdd.map(e -> RowFactory.create(e.split(",")));

DataFrame df = sqlContext.createDataFrame(rowsRdd, schema);

LinearRegression lr = new LinearRegression()
    .setMaxIter(10)
    .setRegParam(0.3)
    .setElasticNetParam(0.8);

Tokenizer tokenizer = new Tokenizer()
    .setInputCol("temperature")
    .setOutputCol("features");

// the problem is here:
DataFrame result = tokenizer.transform(df);
LinearRegressionModel lrModel = lr.fit(result);

######################

I don't know how to transform the "temperature" column into a proper "features" vector, and why do I need a "label" field at all if I am only trying to turn the features column into a vector?

Thank you in advance.

Regards,
Zakaria
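For context, the records I receive are plain CSV lines, so the "temperature" field arrives as a String. Below is a small standalone sketch (no Spark dependency; the sample values are hypothetical) of how one record is parsed, and where the string-to-number conversion would have to happen before any numeric model could use it:

```java
// Standalone sketch of parsing one incoming record.
// The sample line is hypothetical; real data arrives as "id,date,temperature".
public class ParseRecord {
    public static void main(String[] args) {
        String line = "sensor-1,2016-07-01,23.5"; // hypothetical sample record
        String[] fields = line.split(",");

        String id = fields[0];
        String date = fields[1];
        // "temperature" is a String in the incoming data; converting it to a
        // double is the step that is missing before building a feature vector.
        double temperature = Double.parseDouble(fields[2]);

        System.out.println(id + " " + date + " " + temperature);
    }
}
```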