Sorry Devender, I hit the send button too soon by mistake. I meant to add more info.
So what I was trying to say was that you can use withColumn with
when/otherwise clauses to add a column conditionally. See an example here:
https://stackoverflow.com/questions/34908448/spark-add-column-to-dataframe-conditionally

On Tue, Dec 18, 2018 at 9:55 AM Shahab Yunus <shahab.yu...@gmail.com> wrote:

> Have you tried using withColumn? You can add a boolean column based on
> whether the age exists or not and then drop the older age column. You
> wouldn't need union of dataframes then
>
> On Tue, Dec 18, 2018 at 8:58 AM Devender Yadav <
> devender.ya...@impetus.co.in> wrote:
>
>> Hi All,
>>
>>
>> useful code:
>>
>> public class EmployeeBean implements Serializable {
>>
>>     private Long id;
>>
>>     private String name;
>>
>>     private Long salary;
>>
>>     private Integer age;
>>
>>     // getters and setters
>>
>> }
>>
>>
>> Relevant spark code:
>>
>> SparkSession spark = SparkSession.builder().master("local[2]").appName("play-with-spark").getOrCreate();
>> List<EmployeeBean> employees1 = populateEmployees(1, 10);
>>
>> Dataset<EmployeeBean> ds1 = spark.createDataset(employees1, Encoders.bean(EmployeeBean.class));
>> ds1.show();
>> ds1.printSchema();
>>
>> Dataset<Row> ds2 = ds1.where("age is null").withColumn("is_age_null", lit(true));
>> Dataset<Row> ds3 = ds1.where("age is not null").withColumn("is_age_null", lit(false));
>>
>> Dataset<Row> ds4 = ds2.union(ds3);
>> ds4.show();
>>
>>
>> Relevant Output:
>>
>>
>> ds1
>>
>> +----+---+----+------+
>> | age| id|name|salary|
>> +----+---+----+------+
>> |null|  1|dev1| 11000|
>> |   2|  2|dev2| 12000|
>> |null|  3|dev3| 13000|
>> |   4|  4|dev4| 14000|
>> |null|  5|dev5| 15000|
>> +----+---+----+------+
>>
>>
>> ds4
>>
>> +----+---+----+------+-----------+
>> | age| id|name|salary|is_age_null|
>> +----+---+----+------+-----------+
>> |null|  1|dev1| 11000|       true|
>> |null|  3|dev3| 13000|       true|
>> |null|  5|dev5| 15000|       true|
>> |   2|  2|dev2| 12000|      false|
>> |   4|  4|dev4| 14000|      false|
>> +----+---+----+------+-----------+
>>
>> Is there any better solution to add this column in the dataset rather
>> than creating two datasets and performing union?
>>
>> <https://stackoverflow.com/questions/53834286/add-column-value-in-spark-dataset-on-the-basis-of-the-condition>
>>
>> Regards,
>> Devender
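
To make the withColumn with when/otherwise suggestion concrete, here is a minimal sketch in Java. It is only an illustration, not tested against your project: the class name IsAgeNullExample, the helper addIsAgeNull, and the hand-built rows and schema are my own stand-ins for your EmployeeBean data (I could not see populateEmployees), so adjust them to your setup.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical example class; the names here are not from the original thread.
public class IsAgeNullExample {

    // Adds a boolean is_age_null column in one pass over the data,
    // so no filtering into two datasets and no union is needed.
    public static Dataset<Row> addIsAgeNull(Dataset<Row> df) {
        return df.withColumn("is_age_null",
                when(col("age").isNull(), true).otherwise(false));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("play-with-spark")
                .getOrCreate();

        // Assumed schema mirroring the EmployeeBean fields; only age is nullable.
        StructType schema = new StructType()
                .add("id", DataTypes.LongType, false)
                .add("name", DataTypes.StringType, false)
                .add("salary", DataTypes.LongType, false)
                .add("age", DataTypes.IntegerType, true);

        // Sample rows standing in for populateEmployees(...).
        List<Row> rows = Arrays.asList(
                RowFactory.create(1L, "dev1", 11000L, null),
                RowFactory.create(2L, "dev2", 12000L, 2),
                RowFactory.create(3L, "dev3", 13000L, null));

        Dataset<Row> ds1 = spark.createDataFrame(rows, schema);
        addIsAgeNull(ds1).show();

        spark.stop();
    }
}
```

The same withColumn call should also work directly on your Dataset<EmployeeBean>, since withColumn returns a Dataset<Row>. And because isNull() already produces a boolean column, ds1.withColumn("is_age_null", col("age").isNull()) should give the same result without when/otherwise; the when/otherwise form is mainly useful when the two branches are not simply true and false.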