Hi everybody!
This code:
DataFrame df = sqlContext.read().json(FILE_NAME);

DataFrame profiles = df.select(
        column("_id"),
        struct(
            column("name.first").as("first_name"),
            column("name.last").as("last_name"),
            column("friends")
        ).as("profile")
    ).limit(1);

profiles.select(column("_id"), column("profile"))
    .toJavaRDD()
    .collect()
    .forEach(r -> printRowFields(r.getStruct(1))); // #1

sqlContext.udf().register("schema", (UDF1<Row, Void>) r -> printRowFields(r), DataTypes.NullType); // #2
profiles.select(column("_id"), callUDF("schema", column("profile"))).show();
Output:
#1:
StructField(first_name,StringType,true)
StructField(last_name,StringType,true)
StructField(friends,ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)
#2:
StructField(col1,StringType,true)
StructField(col2,StringType,true)
StructField(i[2],ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)
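Why are the field names lost inside the UDF? In #1 the struct keeps first_name, last_name and friends, but in #2 the same struct arrives as col1, col2 and i[2]. What's wrong?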
Best regards, Alex Chermenin.