Dear All,

I am trying to propagate the last valid (i.e. non-null) observation to the null
values in a dataset.

Below is my partial solution:

import org.apache.spark.sql.*;
import org.apache.spark.sql.expressions.*;
import static org.apache.spark.sql.functions.*;

Dataset<Row> tmp800 = tmp700.select("uuid", "eventTime", "Washer_rinseCycles");

// Order each device's events by time within its uuid partition
WindowSpec wspec = Window.partitionBy(tmp800.col("uuid"))
        .orderBy(tmp800.col("eventTime"));
// Value of Washer_rinseCycles on the immediately preceding row
Column c1 = lag(tmp800.col("Washer_rinseCycles"), 1).over(wspec);
// Replace a null with the previous row's value
Dataset<Row> tmp900 = tmp800.withColumn("Washer_rinseCyclesFilled",
        when(tmp800.col("Washer_rinseCycles").isNull(), c1)
                .otherwise(tmp800.col("Washer_rinseCycles")));
However, it does not solve the entire problem, because lag(col, 1) returns the
value from the row immediately before the current one even when that value is
itself NULL (see the table below).

Does Spark have a method for DataFrames similar to Pandas' "backfill"?

Is it possible to do this with the Spark API? If so, how?
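One idea I have been wondering about (untested, assuming Spark 2.1+, where
Window.unboundedPreceding()/Window.currentRow() and last(Column, boolean) are
available) is the last() aggregate with ignoreNulls = true over a frame that
runs from the start of the partition to the current row, which should behave
like a forward fill. A minimal sketch, with fillWindow and filled as
placeholder names:

// Frame covering all rows from the start of the uuid partition
// up to and including the current row
WindowSpec fillWindow = Window.partitionBy(tmp800.col("uuid"))
        .orderBy(tmp800.col("eventTime"))
        .rowsBetween(Window.unboundedPreceding(), Window.currentRow());

// last(..., true) skips nulls, so this picks up the most recent
// non-null Washer_rinseCycles at or before the current row
Dataset<Row> filled = tmp800.withColumn("Washer_rinseCyclesFilled",
        last(tmp800.col("Washer_rinseCycles"), true).over(fillWindow));

Unlike lag(col, 1), this keeps looking back past consecutive nulls, so a run of
missing values would all receive the last valid observation. Is this the
recommended approach?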

Many thanks in advance.
Best Regards,
Carlo

[Inline image: example table]
