Unsubscribe

2023-05-01 Thread peng


Unsubscribe

2023-05-01 Thread sandeep vura
-- 
Sandeep V


Re: Change column values using several when conditions

2023-05-01 Thread Bjørn Jørgensen
You can check whether the value exists by calling distinct() on the column
before you loop over the dataset.
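Here is a minimal plain-Java sketch of that idea. It assumes two things: the mapping has already been parsed into a Map from original value to new value, and the column's distinct values are few enough to collect to the driver. In Spark that collection could be done with dataset.select(colName).distinct().collectAsList(), reading each Row with getString(0) for a string column. The class and method names are hypothetical. The code itself is plain Java, so it runs without Spark. Note that distinct() runs an extra Spark job with a shuffle, so this pays off mainly when many mapping entries are absent from the column.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: in Spark, distinctValues would be collected from
// dataset.select(colName).distinct().collectAsList() (assumed small).
public class DistinctFilter {

    // Keep only the mapping entries whose original value occurs in the column,
    // so the withColumn() loop runs once per present value instead of once per entry.
    static Map<String, String> retainPresent(Map<String, String> mapping, Set<String> distinctValues) {
        Map<String, String> present = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            if (distinctValues.contains(e.getKey())) {
                present.put(e.getKey(), e.getValue());
            }
        }
        return present;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("FR", "France");
        mapping.put("DE", "Germany");
        mapping.put("IT", "Italy");
        // Hypothetical distinct values of the column ("DE" is absent)
        Set<String> distinct = Set.of("FR", "IT", "ES");
        System.out.println(retainPresent(mapping, distinct)); // {FR=France, IT=Italy}
    }
}
```

With the filtered map, the loop would iterate over its entries instead of the full colMappingValues array. Entries whose value never occurs in the column then cause no withColumn() call at all.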

On Mon, 1 May 2023 at 10:38, marc nicole wrote:

> [...]
> I want it to be more efficient and avoid unnecessary iterations: when the
> column doesn't contain a value from the list, the corresponding
> withColumn() call should be skipped.
> How can I do that more efficiently using Spark in Java?

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


Change column values using several when conditions

2023-05-01 Thread marc nicole
Hello

I want to change the values of a column in a dataset according to a mapping
list that maps the column's original values to new values. Each element of
the list (colMappingValues) is a string containing an original value and its
new value, separated by ";".
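The "original;new" entry format can be parsed once, up front, into an ordered map. This is a minimal plain-Java sketch. The class name MappingParser and the choice to skip entries without a ";" are hypothetical. It uses the same split(";", 2) call as the loop. With a limit of 2, the string is split only at the first ";", so a new value that itself contains ";" stays intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingParser {

    // Parse entries of the form "original;new" into an ordered map.
    // split(";", 2) splits only at the first ';', so later ';' stay in the new value.
    static Map<String, String> parse(String[] colMappingValues) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String entry : colMappingValues) {
            String[] parts = entry.split(";", 2);
            if (parts.length < 2) {
                // Hypothetical policy: skip malformed entries that have no ';'
                continue;
            }
            mapping.put(parts[0], parts[1]);
        }
        return mapping;
    }

    public static void main(String[] args) {
        String[] colMappingValues = {"FR;France", "DE;Germany", "X;a;b", "broken"};
        System.out.println(parse(colMappingValues)); // {FR=France, DE=Germany, X=a;b}
    }
}
```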

For a given column (colName in the example below), I use the following loop
to change its values as described:

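import static org.apache.spark.sql.functions.when;

for (int i = 0; i < colMappingValues.length; i++) {
    // colMappingValues holds every distinct value of the column together with
    // its target value, each entry formatted as "original;new"
    String[] allValuesChanges = colMappingValues[i].toString().split(";", 2);

    // Replace matching values; keep all other values unchanged
    dataset = dataset.withColumn(colName,
        when(dataset.col(colName).equalTo(allValuesChanges[0]), allValuesChanges[1])
            .otherwise(dataset.col(colName)));
}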

This works, but I want it to be more efficient and avoid unnecessary
iterations: when the column doesn't contain a value from the list, the
corresponding withColumn() call should be skipped.
How can I do that more efficiently using Spark in Java?
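Another option, besides skipping absent values, is to build one column expression and call withColumn() only once. The Spark API documentation for Dataset.withColumn notes that each call adds a projection internally, so calling it in a loop can produce large plans. In Java this would mean starting from functions.when(...), chaining Column.when(...) for the remaining entries, and ending with .otherwise(dataset.col(colName)).

The sketch below is plain Java (no Spark) and only models the per-value semantics. The class and method names are hypothetical. It shows a difference worth checking before switching. Successive withColumn() passes cascade: with "a" mapped to "b" and "b" mapped to "c", an "a" ends up as "c". A single when/otherwise chain instead returns the first matching branch, so "a" becomes "b".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RemapSemantics {

    // Models the original loop: one pass over the column per mapping entry,
    // so a value rewritten by an earlier entry can be rewritten again later.
    static List<String> applySequentially(List<String> column, Map<String, String> mapping) {
        List<String> current = new ArrayList<>(column);
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            List<String> next = new ArrayList<>(current.size());
            for (String v : current) {
                next.add(e.getKey().equals(v) ? e.getValue() : v);
            }
            current = next;
        }
        return current;
    }

    // Models a single when(...).when(...).otherwise(col) expression:
    // each value is looked up once and kept unchanged when it has no mapping.
    static List<String> applyOnce(List<String> column, Map<String, String> mapping) {
        List<String> out = new ArrayList<>(column.size());
        for (String v : column) {
            out.add(mapping.getOrDefault(v, v));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("a", "b");
        mapping.put("b", "c");
        List<String> column = List.of("a", "b", "z");
        System.out.println(applySequentially(column, mapping)); // [c, c, z]
        System.out.println(applyOnce(column, mapping));         // [b, c, z]
    }
}
```

If the mapping never uses a new value as another entry's original value, both approaches give the same result. Otherwise, decide which behavior is intended before replacing the loop.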