Re: How to explode array columns of a dataframe having the same length

2023-02-16 Thread Navneet
I am no expert, but maybe try this and see if it works:
In order to achieve the desired output using the explode() method in
Java, you can create a User-Defined Function (UDF) that zips the lists
in each row and returns the resulting list. Here's an example
implementation:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;

public class ZipRows implements UDF1<Row, List<List<String>>> {
    @Override
    public List<List<String>> call(Row row) {
        // The input Row is a struct of the three array columns.
        List<String> list1 = row.getList(0);
        List<String> list2 = row.getList(1);
        List<String> list3 = row.getList(2);
        // Zip element-wise: one sublist per index across the three lists.
        List<List<String>> zipped = new ArrayList<>();
        for (int i = 0; i < list1.size(); i++) {
            List<String> sublist = new ArrayList<>();
            sublist.add(list1.get(i));
            sublist.add(list2.get(i));
            sublist.add(list3.get(i));
            zipped.add(sublist);
        }
        return zipped;
    }
}
This UDF takes a Row as input, a struct containing the three lists from each
row of the original DataFrame. It zips these lists with a loop that creates
a new sublist for each index across the three lists, and returns the zipped
list of lists.

You can then use this UDF in combination with explode() to achieve the
desired output:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.*;

// assuming you have a SparkSession called "spark" and a Dataset<Row> called "df"
spark.udf().register("zipRows", new ZipRows(),
    DataTypes.createArrayType(DataTypes.createArrayType(DataTypes.StringType)));

df.withColumn("zipped", callUDF("zipRows",
        struct(col("col1"), col("col2"), col("col3"))))
    .selectExpr("explode(zipped) as zipped")
    .selectExpr("zipped[0] as col1", "zipped[1] as col2", "zipped[2] as col3")
    .show();
This code first registers the UDF under the name "zipRows", then adds a new
column called "zipped" by applying it via callUDF() to a struct of the
"col1", "col2", and "col3" columns. It then uses explode() to turn each
zipped sublist into its own row, and finally selects the three sub-elements
of each sublist as separate columns using selectExpr(). The output should
be the desired DataFrame.
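
As a possible alternative (my suggestion, untested): since Spark 2.4 there
is a built-in arrays_zip function that avoids the UDF entirely. A minimal
sketch:

// sketch assuming Spark 2.4+; arrays_zip names the struct fields
// after the input columns, so z.col1, z.col2, z.col3 are available
df.selectExpr("explode(arrays_zip(col1, col2, col3)) as z")
    .selectExpr("z.col1 as col1", "z.col2 as col2", "z.col3 as col3")
    .show();

Since all three arrays have the same length here, this should produce the
same result as the UDF approach.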



Regards,
Navneet Kr


On Thu, 16 Feb 2023 at 00:07, Enrico Minack  wrote:
>
> You have to take each row and zip the lists; each element of the result 
> becomes one new row.
>
> So write a method that turns
>   Row(List("A","B","null"), List("C","D","null"), List("E","null","null"))
> into
>   List(List("A","C","E"), List("B","D","null"), List("null","null","null"))
> and use flatmap with that method.
>
> In Scala, this would read:
>
> df.flatMap { row => (row.getSeq[String](0), row.getSeq[String](1), 
> row.getSeq[String](2)).zipped.toIterable }.show()
>
> Enrico
>
>
> Am 14.02.23 um 22:54 schrieb sam smith:
>
> Hello guys,
>
> I have the following dataframe:
>
> col1               col2               col3
> ["A","B","null"]   ["C","D","null"]   ["E","null","null"]
>
> I want to explode it to the following dataframe:
>
> col1     col2     col3
> "A"      "C"      "E"
> "B"      "D"      "null"
> "null"   "null"   "null"
>
> How can I do that (preferably in Java) using the explode() method? Note that 
> something like the following won't yield the correct output:
>
> for (String colName : dataset.columns())
>     dataset = dataset.withColumn(colName, explode(dataset.col(colName)));
>
>
>




Spark job taking 10s to allocate executors and memory before submitting job

2017-09-27 Thread navneet sharma
Hi,

I am running a Spark job that takes 18s in total: 8s for the actual
processing (business) logic and 10s for allocating executors and memory.
How can I reduce this initial overhead?

Any ideas on how to reduce the time before the Spark job reaches the
submitted state?

thanks,

Navneet Sharma