Hello Experts,

I have a requirement to maintain a list of IDs for every customer across
all time, and I should be able to provide a distinct count of those IDs on
demand. All the examples I have seen so far maintain the counts directly.
My concern is that, in that case, I will not be able to identify cumulative
distinct values. Also, would the framework tolerate maintaining such a
large state?

Here is what I am attempting:

import org.apache.spark.rdd.RDD

val updateUniqueids: (Seq[String], Option[RDD[String]]) => Option[RDD[String]] =
  (values: Seq[String], state: Option[RDD[String]]) => {
    val currentLst = values.distinct
    // Stuck here: getOrElse needs an empty RDD[String] as a default,
    // but there is no SparkContext available inside this function to create one.
    val previousLst: RDD[String] = state.getOrElse(???)
    // Also stuck: union needs both sides to be RDDs, but currentLst is a Seq.
    Some(previousLst.union(???).distinct)
  }

Another challenge is concatenating an RDD[String] with a Seq[String]
without access to the SparkContext, since this function has to adhere to

updateFunc: (Seq[V], Option[S]) => Option[S]
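One alternative I have been sketching keeps the state as a plain Set[String]
rather than an RDD; it fits that signature without needing the SparkContext,
though I am unsure whether it will scale (the names here are just my sketch):

```scala
// Sketch: state is a Set[String] of all IDs seen so far for a key.
// Matches updateFunc: (Seq[V], Option[S]) => Option[S] with S = Set[String].
val updateUniqueIds: (Seq[String], Option[Set[String]]) => Option[Set[String]] =
  (values: Seq[String], state: Option[Set[String]]) => {
    val previous = state.getOrElse(Set.empty[String])
    // Union the new batch into the running distinct set; Set handles dedup.
    Some(previous ++ values)
  }

// The distinct count per customer would then just be the set size, e.g.:
// idsPerCustomer.updateStateByKey(updateUniqueIds).mapValues(_.size)
```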

I'm also trying to figure out whether I can use the
(iterator: Iterator[(K, Seq[V], Option[S])]) variant instead, but I haven't
worked it out yet.
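My current reading of that iterator form is roughly the following (again with
Set[String] state and my own names; I have not verified this against a real
stream):

```scala
// Sketch of the iterator-based variant: each element is
// (key, new values in this batch, previous state), and we return
// an iterator of (key, new state) pairs.
def updateFunc(
    it: Iterator[(String, Seq[String], Option[Set[String]])]
): Iterator[(String, Set[String])] =
  it.map { case (customer, values, state) =>
    // Merge this batch's IDs into the accumulated distinct set for the key.
    (customer, state.getOrElse(Set.empty[String]) ++ values)
  }
```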

Appreciate any suggestions in this regard.

regards

Sunita

P.S:
I am aware of mapWithState, but I am not on the latest Spark version as of now.
