Re: Lost names of struct fields in UDF

2016-05-16 Thread Alexander Chermenin
Hi.

Surprisingly, this code solves my problem:

private static Column namedStruct(Column... cols) {
    List<Expression> exprs = Arrays.stream(cols)
            .flatMap(c -> Stream.of(
                    new Literal(UTF8String.fromString(((NamedExpression) c.expr()).name()), DataTypes.StringType),
                    c.expr()))
            .collect(Collectors.toList());
    return new Column(new CreateNamedStruct(JavaConversions.asScalaBuffer(exprs).toSeq()));
}

...

DataFrame profiles = df.select(
        column("_id"),
        namedStruct(
                column("name.first").as("first_name"),
                column("name.last").as("last_name"),
                column("friends")
        ).as("profile"))
...

I didn't dig deeper or look for the cause of the problem.

Best regards, Alexander Chermenin.
Web: http://chermenin.ru
Mail: a...@chermenin.ru
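A note on why the namedStruct helper has the shape it does: CreateNamedStruct takes its children as one flat sequence of alternating name and value expressions, which is why the flatMap emits a string literal followed by the column expression for every column. The sketch below shows only that interleaving step in plain Java, without Spark. The class name InterleaveSketch, the method interleave, and the sample values are hypothetical and exist only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, Spark-free illustration of the flatMap step in namedStruct.
public class InterleaveSketch {

    // Flattens (name, value) pairs into [name1, value1, name2, value2, ...],
    // the same alternating layout that namedStruct passes to CreateNamedStruct.
    public static List<Object> interleave(List<Map.Entry<String, Object>> fields) {
        return fields.stream()
                .flatMap(f -> Stream.<Object>of(f.getKey(), f.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        List<Map.Entry<String, Object>> fields = Arrays.asList(
                new SimpleEntry<String, Object>("first_name", "Ann"),
                new SimpleEntry<String, Object>("last_name", "Lee"));
        System.out.println(interleave(fields)); // prints [first_name, Ann, last_name, Lee]
    }
}
```

In the real helper the names are wrapped in Literal expressions and the values are the columns' own expressions, so each field name is stated explicitly in the expression tree instead of being derived from an alias.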

Lost names of struct fields in UDF

2016-05-06 Thread Alexander Chermenin
Hi everybody!

This code:

DataFrame df = sqlContext.read().json(FILE_NAME);

DataFrame profiles = df.select(
        column("_id"),
        struct(
                column("name.first").as("first_name"),
                column("name.last").as("last_name"),
                column("friends")
        ).as("profile")).limit(1);

profiles.select(column("_id"), column("profile")).toJavaRDD().collect()
        .forEach(r -> printRowFields(r.getStruct(1))); // #1

sqlContext.udf().register("schema", (UDF1) r -> printRowFields(r), DataTypes.NullType); // #2
profiles.select(column("_id"), callUDF("schema", column("profile"))).show();

prints:

#1:
StructField(first_name,StringType,true)
StructField(last_name,StringType,true)
StructField(friends,ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)

#2:
StructField(col1,StringType,true)
StructField(col2,StringType,true)
StructField(i[2],ArrayType(StructType(StructField(id,LongType,true), StructField(name,StringType,true)),true),true)

Why are the field names lost inside the UDF? What's wrong?

Best regards, Alex Chermenin.