Hi

Sorry for this scala/spark newbie question. I am creating an RDD which
represents a large time series this way:
val data = sc.textFile("somefile.csv")

case class Event(
    time:       Double,
    x:          Double,
    vztot:      Double
)

val events = data.filter(s => !s.startsWith("GMT")).map{s =>
    val r = s.split(";")
...
    Event(time, x, vztot )
}
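
For reference, the elided parsing step might look something like this (a sketch only; it assumes the first three semicolon-separated fields are time, x, and vztot, which may not match the actual column layout of somefile.csv):

```scala
// Assumption: columns are "time;x;vztot;..." and the header line starts with "GMT".
val events = data.filter(s => !s.startsWith("GMT")).map { s =>
    val r = s.split(";")
    // toDouble will throw on malformed rows; real code may want to guard this.
    Event(r(0).toDouble, r(1).toDouble, r(2).toDouble)
}
```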

I would like to process these RDDs in order to reduce them by some
filtering (a moving average). I noticed that sliding could help, but I have
not been able to use it so far. Here is what I tried:

import org.apache.spark.mllib.rdd.RDDFunctions._

val eventsfiltered = events.sliding(3).map { case Array(e0, e1, e2) =>
    Event(e0.time, (e0.x + e1.x + e2.x) / 3.0, (e0.vztot + e1.vztot + e2.vztot) / 3.0)
}
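
Note that sliding(3) in mllib's RDDFunctions returns an RDD[Array[Event]] of *overlapping* windows, so the result is nearly as long as the input. If the actual goal is to downsample by a factor of 3, one alternative (a sketch, assuming order within each group does not matter beyond picking the earliest time) is to group disjoint triples by index:

```scala
// Downsample: average each disjoint group of 3 consecutive events.
val eventsdownsampled = events
  .zipWithIndex()
  .map { case (e, i) => (i / 3, e) }   // key each event by its group of 3
  .groupByKey()
  .map { case (_, es) =>
    val n = es.size
    Event(es.minBy(_.time).time,       // earliest timestamp in the group
          es.map(_.x).sum / n,
          es.map(_.vztot).sum / n)
  }
```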

Thanks for your help


-- 
PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
