Re: How to explode array columns of a dataframe having the same length
sql: select inline(arrays_zip(col1, col2, col3)) as (c1, c2, c3) from t1 Replied Message | From | Enrico Minack | | Date | 02/16/2023 16:06 | | To | , sam smith | | Subject | Re: How to explode array columns of a dataframe having the same length | You have to take each row and zip the lists, each element of the result becomes one new row. So turn write a method that turns Row(List("A","B","null"), List("C","D","null"), List("E","null","null")) into List(List("A","C","E"), List("B","D","null"), List("null","null","null")) and use flatmap with that method. In Scala, this would read: df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), row.getSeq[String](2)).zipped.toIterable }.show() Enrico Am 14.02.23 um 22:54 schrieb sam smith: Hello guys, I have the following dataframe: | | col1 | col2 | col3 | | ["A","B","null"] | ["C","D","null"] | ["E","null","null"] | | | | I want to explode it to the following dataframe: | col1 | col2 | col3 | | "A" | "C" | "E" | | "B" | "D" | "null" | | "null" | "null" | "null" | How to do that (preferably in Java) using the explode() method ? knowing that something like the following won't yield correct output: for (String colName: dataset.columns()) dataset=dataset.withColumn(colName,explode(dataset.col(colName)));
Re: How to explode array columns of a dataframe having the same length
I think these 4 steps should help: Use zip Explode Withcolumn (getelement of array) Drop the array column Thanks On Thu, Feb 16, 2023, 2:18 PM sam smith wrote: > @Enrico Minack I used arrays_zip to merge values > into one row, and then used toJSON() to export the data. > @Bjørn explode_outer didn't yield the expected results. > > Thanks anyway. > > Le jeu. 16 févr. 2023 à 09:06, Enrico Minack a > écrit : > >> You have to take each row and zip the lists, each element of the result >> becomes one new row. >> >> So turn write a method that turns >> Row(List("A","B","null"), List("C","D","null"), List("E","null","null")) >> into >> List(List("A","C","E"), List("B","D","null"), >> List("null","null","null")) >> and use flatmap with that method. >> >> In Scala, this would read: >> >> df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), >> row.getSeq[String](2)).zipped.toIterable }.show() >> >> Enrico >> >> >> Am 14.02.23 um 22:54 schrieb sam smith: >> >> Hello guys, >> >> I have the following dataframe: >> >> *col1* >> >> *col2* >> >> *col3* >> >> ["A","B","null"] >> >> ["C","D","null"] >> >> ["E","null","null"] >> >> >> I want to explode it to the following dataframe: >> >> *col1* >> >> *col2* >> >> *col3* >> >> "A" >> >> "C" >> >> "E" >> >> "B" >> >> "D" >> >> "null" >> >> "null" >> >> "null" >> >> "null" >> >> How to do that (preferably in Java) using the explode() method ? knowing >> that something like the following won't yield correct output: >> >> for (String colName: dataset.columns()) >> dataset=dataset.withColumn(colName,explode(dataset.col(colName))); >> >> >> >>
Re: How to explode array columns of a dataframe having the same length
@Enrico Minack I used arrays_zip to merge values into one row, and then used toJSON() to export the data. @Bjørn explode_outer didn't yield the expected results. Thanks anyway. Le jeu. 16 févr. 2023 à 09:06, Enrico Minack a écrit : > You have to take each row and zip the lists, each element of the result > becomes one new row. > > So turn write a method that turns > Row(List("A","B","null"), List("C","D","null"), List("E","null","null")) > into > List(List("A","C","E"), List("B","D","null"), List("null","null","null")) > and use flatmap with that method. > > In Scala, this would read: > > df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), > row.getSeq[String](2)).zipped.toIterable }.show() > > Enrico > > > Am 14.02.23 um 22:54 schrieb sam smith: > > Hello guys, > > I have the following dataframe: > > *col1* > > *col2* > > *col3* > > ["A","B","null"] > > ["C","D","null"] > > ["E","null","null"] > > > I want to explode it to the following dataframe: > > *col1* > > *col2* > > *col3* > > "A" > > "C" > > "E" > > "B" > > "D" > > "null" > > "null" > > "null" > > "null" > > How to do that (preferably in Java) using the explode() method ? knowing > that something like the following won't yield correct output: > > for (String colName: dataset.columns()) > dataset=dataset.withColumn(colName,explode(dataset.col(colName))); > > > >
Re: How to explode array columns of a dataframe having the same length
Use explode_outer() when rows have null values. tor. 16. feb. 2023 kl. 16:48 skrev Navneet : > I am not expert, may be try if this works: > In order to achieve the desired output using the explode() method in > Java, you can create a User-Defined Function (UDF) that zips the lists > in each row and returns the resulting list. Here's an example > implementation: > > typescript > Copy code > import org.apache.spark.sql.Row; > import org.apache.spark.sql.api.java.UDF1; > import org.apache.spark.sql.types.DataTypes; > > public class ZipRows implements UDF1 { > @Override > public Row call(Row row) { > List list1 = row.getList(0); > List list2 = row.getList(1); > List list3 = row.getList(2); > List> zipped = new ArrayList<>(); > for (int i = 0; i < list1.size(); i++) { > List sublist = new ArrayList<>(); > sublist.add(list1.get(i)); > sublist.add(list2.get(i)); > sublist.add(list3.get(i)); > zipped.add(sublist); > } > return RowFactory.create(zipped); > } > } > This UDF takes a Row as input, which contains the three lists in each > row of the original DataFrame. It then zips these lists using a loop > that creates a new sublist for each element in the lists. Finally, it > returns a new Row that contains the zipped list. > > You can then use this UDF in combination with explode() to achieve the > desired output: > > javascript > Copy code > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Row; > import static org.apache.spark.sql.functions.*; > > // assuming you have a Dataset called "df" > df.withColumn("zipped", callUDF(new ZipRows(), > > DataTypes.createArrayType(DataTypes.createArrayType(DataTypes.StringType))), > "col1", "col2", "col3") > .selectExpr("explode(zipped) as zipped") > .selectExpr("zipped[0] as col1", "zipped[1] as col2", "zipped[2] as col3") > .show(); > This code first adds a new column called "zipped" to the DataFrame > using the callUDF() function, which applies the ZipRows UDF to the > "col1", "col2", and "col3" columns. It then uses explode() to explode > the "zipped" column, and finally selects the three sub-elements of the > zipped list as separate columns using selectExpr(). The output should > be the desired DataFrame. > > > > Regards, > Navneet Kr > > > On Thu, 16 Feb 2023 at 00:07, Enrico Minack > wrote: > > > > You have to take each row and zip the lists, each element of the result > becomes one new row. > > > > So turn write a method that turns > > Row(List("A","B","null"), List("C","D","null"), > List("E","null","null")) > > into > > List(List("A","C","E"), List("B","D","null"), > List("null","null","null")) > > and use flatmap with that method. > > > > In Scala, this would read: > > > > df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), > row.getSeq[String](2)).zipped.toIterable }.show() > > > > Enrico > > > > > > Am 14.02.23 um 22:54 schrieb sam smith: > > > > Hello guys, > > > > I have the following dataframe: > > > > col1 > > > > col2 > > > > col3 > > > > ["A","B","null"] > > > > ["C","D","null"] > > > > ["E","null","null"] > > > > > > > > I want to explode it to the following dataframe: > > > > col1 > > > > col2 > > > > col3 > > > > "A" > > > > "C" > > > > "E" > > > > "B" > > > > "D" > > > > "null" > > > > "null" > > > > "null" > > > > "null" > > > > > > How to do that (preferably in Java) using the explode() method ? knowing > that something like the following won't yield correct output: > > > > for (String colName: dataset.columns()) > > dataset=dataset.withColumn(colName,explode(dataset.col(colName))); > > > > > > > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297
Re: How to explode array columns of a dataframe having the same length
I am not expert, may be try if this works: In order to achieve the desired output using the explode() method in Java, you can create a User-Defined Function (UDF) that zips the lists in each row and returns the resulting list. Here's an example implementation: typescript Copy code import org.apache.spark.sql.Row; import org.apache.spark.sql.api.java.UDF1; import org.apache.spark.sql.types.DataTypes; public class ZipRows implements UDF1 { @Override public Row call(Row row) { List list1 = row.getList(0); List list2 = row.getList(1); List list3 = row.getList(2); List> zipped = new ArrayList<>(); for (int i = 0; i < list1.size(); i++) { List sublist = new ArrayList<>(); sublist.add(list1.get(i)); sublist.add(list2.get(i)); sublist.add(list3.get(i)); zipped.add(sublist); } return RowFactory.create(zipped); } } This UDF takes a Row as input, which contains the three lists in each row of the original DataFrame. It then zips these lists using a loop that creates a new sublist for each element in the lists. Finally, it returns a new Row that contains the zipped list. You can then use this UDF in combination with explode() to achieve the desired output: javascript Copy code import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import static org.apache.spark.sql.functions.*; // assuming you have a Dataset called "df" df.withColumn("zipped", callUDF(new ZipRows(), DataTypes.createArrayType(DataTypes.createArrayType(DataTypes.StringType))), "col1", "col2", "col3") .selectExpr("explode(zipped) as zipped") .selectExpr("zipped[0] as col1", "zipped[1] as col2", "zipped[2] as col3") .show(); This code first adds a new column called "zipped" to the DataFrame using the callUDF() function, which applies the ZipRows UDF to the "col1", "col2", and "col3" columns. It then uses explode() to explode the "zipped" column, and finally selects the three sub-elements of the zipped list as separate columns using selectExpr(). The output should be the desired DataFrame. Regards, Navneet Kr On Thu, 16 Feb 2023 at 00:07, Enrico Minack wrote: > > You have to take each row and zip the lists, each element of the result > becomes one new row. > > So turn write a method that turns > Row(List("A","B","null"), List("C","D","null"), List("E","null","null")) > into > List(List("A","C","E"), List("B","D","null"), List("null","null","null")) > and use flatmap with that method. > > In Scala, this would read: > > df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), > row.getSeq[String](2)).zipped.toIterable }.show() > > Enrico > > > Am 14.02.23 um 22:54 schrieb sam smith: > > Hello guys, > > I have the following dataframe: > > col1 > > col2 > > col3 > > ["A","B","null"] > > ["C","D","null"] > > ["E","null","null"] > > > > I want to explode it to the following dataframe: > > col1 > > col2 > > col3 > > "A" > > "C" > > "E" > > "B" > > "D" > > "null" > > "null" > > "null" > > "null" > > > How to do that (preferably in Java) using the explode() method ? knowing that > something like the following won't yield correct output: > > for (String colName: dataset.columns()) > dataset=dataset.withColumn(colName,explode(dataset.col(colName))); > > > - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: How to explode array columns of a dataframe having the same length
You have to take each row and zip the lists, each element of the result becomes one new row. So turn write a method that turns Row(List("A","B","null"), List("C","D","null"), List("E","null","null")) into List(List("A","C","E"), List("B","D","null"), List("null","null","null")) and use flatmap with that method. In Scala, this would read: df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), row.getSeq[String](2)).zipped.toIterable }.show() Enrico Am 14.02.23 um 22:54 schrieb sam smith: Hello guys, I have the following dataframe: *col1* *col2* *col3* ["A","B","null"] ["C","D","null"] ["E","null","null"] I want to explode it to the following dataframe: *col1* *col2* *col3* "A" "C" "E" "B" "D" "null" "null" "null" "null" How to do that (preferably in Java) using the explode() method ? knowing that something like the following won't yield correct output: for (String colName: dataset.columns()) dataset=dataset.withColumn(colName,explode(dataset.col(colName)));