Why not just return the struct you defined, instead of an array?

            @Override
            public Row call(Double x, Double y) throws Exception {
                Row row = RowFactory.create(x, y);
                return row;
            }
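As a rough illustration of why this change matters: a varargs method that takes Object... treats two boxed Double arguments as two values, but treats a single double[] argument as one value that holds the whole array. The sketch below uses a hypothetical stand-in method, fields(Object... values), rather than Spark's RowFactory, so it runs without Spark on the classpath; it assumes the real factory is a plain Object... varargs method.

```java
class VarargsDemo {
    // Hypothetical stand-in for a varargs factory; it is NOT Spark's API.
    // It simply returns the argument array the compiler builds for the call.
    static Object[] fields(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        Double x = 1.5, y = 2.5;

        // Two boxed Doubles become two separate fields.
        Object[] two = fields(x, y);

        // A double[] is not an Object[], so it is wrapped as ONE field
        // whose value is the array itself.
        Object[] one = fields(new double[] { x, y });

        System.out.println(two.length);                 // 2
        System.out.println(one.length);                 // 1
        System.out.println(one[0] instanceof double[]); // true
    }
}
```

With the array form, the declared two-field struct is backed by a single array value, which does not match the schema.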


________________________________
From: Richard Xin <richardxin...@yahoo.com>
Sent: Saturday, December 17, 2016 8:53 PM
To: Yong Zhang; zjp_j...@163.com; user
Subject: Re: Java to show struct field from a Dataframe

I tried to transform
root
 |-- latitude: double (nullable = false)
 |-- longitude: double (nullable = false)
 |-- name: string (nullable = true)

to:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: double (nullable = true)

The code snippet is as follows:

        sqlContext.udf().register("toLocation", new UDF2<Double, Double, Row>() {
            @Override
            public Row call(Double x, Double y) throws Exception {
                Row row = RowFactory.create(new double[] { x, y });
                return row;
            }
        }, DataTypes.createStructType(new StructField[] {
                new StructField("longitude", DataTypes.DoubleType, true, Metadata.empty()),
                new StructField("latitude", DataTypes.DoubleType, true, Metadata.empty())
            }));

        DataFrame transformedDf1 = citiesDF.withColumn("location",
                callUDF("toLocation", col("longitude"), col("latitude")));
        
        // prints schema tree OK as expected
        transformedDf1.drop("latitude").drop("longitude").schema().printTreeString();

        transformedDf.show();  // java.lang.ClassCastException: [D cannot be cast to java.lang.Double


It seems to me that the return type of the UDF2 might be the root cause, but I am
not sure how to correct it.

Thanks,
Richard




On Sunday, December 18, 2016 7:15 AM, Yong Zhang <java8...@hotmail.com> wrote:


"[D" is the JVM's name for the double array type. So this error simply means you
have double[] data, but Spark needs to cast it to Double, as your schema defines.

The error message clearly indicates that the data doesn't match the type
specified in the schema.
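A minimal sketch of that mismatch in plain Java, with no Spark involved: the runtime class of a double[] is reported as "[D", and casting such a value to Double fails with a ClassCastException. The class and method names here are made up for the illustration, and the exact wording of the exception message varies between JDK versions.

```java
class ArrayNameDemo {
    // Returns the JVM's name for the runtime class of a value.
    static String runtimeName(Object value) {
        return value.getClass().getName();
    }

    // Returns true if casting the value to Double throws ClassCastException.
    static boolean castToDoubleFails(Object value) {
        try {
            Double d = (Double) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object array = new double[] { 1.0, 2.0 };
        Object boxed = Double.valueOf(1.0);

        System.out.println(runtimeName(array));       // [D
        System.out.println(runtimeName(boxed));       // java.lang.Double
        System.out.println(castToDoubleFails(array)); // true
        System.out.println(castToDoubleFails(boxed)); // false
    }
}
```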

I wonder how you can be so sure about your data. Did you check it with another tool?

Yong


________________________________
From: Richard Xin <richardxin...@yahoo.com.INVALID>
Sent: Saturday, December 17, 2016 10:56 AM
To: zjp_j...@163.com; user
Subject: Re: Java to show struct field from a Dataframe

data is good


On Saturday, December 17, 2016 11:50 PM, "zjp_j...@163.com" <zjp_j...@163.com> 
wrote:


I think the cause is invalid Double data. Have you checked your data?

________________________________
zjp_j...@163.com

From: Richard Xin<mailto:richardxin...@yahoo.com.INVALID>
Date: 2016-12-17 23:28
To: User<mailto:user@spark.apache.org>
Subject: Java to show struct field from a Dataframe
Let's say I have a DataFrame with the following schema:
root
 |-- name: string (nullable = true)
 |-- location: struct (nullable = true)
 |    |-- longitude: double (nullable = true)
 |    |-- latitude: double (nullable = true)

df.show(); throws the following exception:

java.lang.ClassCastException: [D cannot be cast to java.lang.Double
    at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:119)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getDouble(rows.scala:44)
    at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getDouble(rows.scala:221)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
....

Any advice?
Thanks in advance.
Richard



