How to propagate a Non-Empty Value in a Spark Dataset

2017-06-28 Thread carloallocca
Dear All, 

I am trying to propagate the last valid observation (i.e. the last non-null
value) forward onto the null values in a Dataset.

Below is my partial solution:

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.*;

Dataset<Row> tmp800 = tmp700.select("uuid", "eventTime", "Washer_rinseCycles");

// One window per device (uuid), ordered by event time.
WindowSpec wspec = Window.partitionBy(tmp800.col("uuid"))
        .orderBy(tmp800.col("uuid"), tmp800.col("eventTime"));

// Value of Washer_rinseCycles on the previous row.
Column c1 = lag(tmp800.col("Washer_rinseCycles"), 1).over(wspec);

// Replace each NULL with the previous row's value.
Dataset<Row> tmp900 = tmp800.withColumn("Washer_rinseCyclesFilled",
        when(tmp800.col("Washer_rinseCycles").isNull(), c1)
            .otherwise(tmp800.col("Washer_rinseCycles")));
However, this does not solve the entire problem: lag(col, 1) returns the value
of the row immediately before the current one even when that value is itself
NULL, so a run of consecutive NULLs is only partially filled. For example, for
a uuid whose values ordered by eventTime are 5, NULL, NULL, 7, the first NULL
is filled with 5, but the second NULL receives the preceding row's value,
which is again NULL (see the table below).


Is there a Spark equivalent of Pandas' "backfill" method for DataFrames?

Is it possible to do this with the Spark API? If so, how?
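
For reference, one commonly suggested approach is to replace lag with
last(col, ignoreNulls = true) over a running window, since last with
ignoreNulls skips over intervening NULLs. A minimal sketch, reusing tmp800 and
the column names from the code above (Window.unboundedPreceding() and
Window.currentRow() assume Spark 2.1+):

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.last;

// Running window: from the start of each uuid partition up to the current row.
WindowSpec fillSpec = Window.partitionBy(tmp800.col("uuid"))
        .orderBy(tmp800.col("eventTime"))
        .rowsBetween(Window.unboundedPreceding(), Window.currentRow());

// last(..., ignoreNulls = true) yields the most recent non-null value in the
// window; non-null rows keep their own value because the current row is
// included in the window, so no when/otherwise is needed.
Column lastNonNull = last(tmp800.col("Washer_rinseCycles"), true).over(fillSpec);

Dataset<Row> filled = tmp800.withColumn("Washer_rinseCyclesFilled", lastNonNull);

(Strictly speaking this is a forward fill, i.e. Pandas' ffill/pad; Pandas'
"backfill" direction would instead use first(col, true) over
rowsBetween(Window.currentRow(), Window.unboundedFollowing()).)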

Many thanks in advance.
Best Regards,


(Attached table: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n28802/example.png>)






