Hi everybody!
 
I have the following code:
 
DataFrame df = sqlContext.read().json(FILE_NAME);
 
DataFrame profiles = df.select(
        column("_id"),
        struct(
                column("name.first").as("first_name"),
                column("name.last").as("last_name"),
                column("friends")
        ).as("profile")
).limit(1);
 
// #1: collect the rows to the driver and print the fields of the "profile" struct
profiles.select(column("_id"), column("profile"))
        .toJavaRDD()
        .collect()
        .forEach(r -> printRowFields(r.getStruct(1)));

// #2: print the fields of the same struct from inside a UDF
sqlContext.udf().register("schema", (UDF1<Row, Void>) r -> printRowFields(r), DataTypes.NullType);
profiles.select(column("_id"), callUDF("schema", column("profile"))).show();
 
Output:
 
#1:

StructField(first_name,StringType,true)
StructField(last_name,StringType,true)
StructField(friends,ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)

#2:

StructField(col1,StringType,true)
StructField(col2,StringType,true)
StructField(i[2],ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)
 
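Why are the field names lost inside the UDF? In #1 the struct keeps first_name, last_name and friends, but in #2 the same fields show up as col1, col2 and i[2]. What am I doing wrong?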
 
Best regards, Alex Chermenin.

 
