Hi,
I create a DataFrame using a schema, but when I try to fit the model I
receive this error:

requirement failed: Column features must be of type
org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually
ArrayType(StringType,true)



#### piece of code ####

SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());

        StructType schema = DataTypes
            .createStructType(new StructField[] {
                DataTypes.createStructField("id", DataTypes.StringType, false),
                DataTypes.createStructField("date", DataTypes.StringType, false),
                DataTypes.createStructField("temperature", DataTypes.StringType, true) });


// I receive data from another application as lines like: id,date,temperature
        JavaRDD<Row> rowsRdd = rdd.map(e -> RowFactory.create(e.split(",")));

        DataFrame df = sqlContext.createDataFrame(rowsRdd, schema);


        LinearRegression lr = new LinearRegression()
        .setMaxIter(10)
        .setRegParam(0.3)
        .setElasticNetParam(0.8);

        Tokenizer tokenizer = new Tokenizer()
        .setInputCol("temperature")
        .setOutputCol("features");
// the problem is here :
        DataFrame result = tokenizer.transform(df);

        LinearRegressionModel lrModel = lr.fit(result);
######################
I don't know how to do this conversion, and why do I need a label field if I
am only trying to transform the features column into a vector?
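
In case it helps, here is roughly what I imagine is needed, based on my
(possibly wrong) understanding of VectorAssembler; the cast of temperature to
double and reusing temperature as a placeholder label are just assumptions on
my part:

#### sketch of what I think I need ####

import org.apache.spark.ml.feature.VectorAssembler;

        // cast the string column to double so it can go into a feature vector
        DataFrame numeric = df.withColumn("temperature",
            df.col("temperature").cast("double"));

        // assemble the numeric column(s) into a single "features" Vector column
        VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[] { "temperature" })
            .setOutputCol("features");
        DataFrame assembled = assembler.transform(numeric);

        // LinearRegression also expects a numeric "label" column;
        // reusing temperature here is only a placeholder, probably not what I want
        DataFrame withLabel = assembled.withColumn("label",
            assembled.col("temperature"));

        LinearRegressionModel lrModel = lr.fit(withLabel);
######################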

Thank you in advance.
Regards,
Zakaria
