Hi All,

I have an RDD of case class objects.

scala> case class Entity(
     |     value: String,
     |     identifier: String
     | )
defined class Entity

scala> Entity("hello", "id1")
res25: Entity = Entity(hello,id1)

During a map operation, I'd like to return a new RDD that contains all of
the data of the original RDD with the addition of new data that was looked
up based on the identifiers provided.

The lookup table in Cassandra looks something like...

 id  | type
-----+--------
 id1 | action
 id2 | view

The end result would be an RDD of EntityExtended

case class EntityExtended(
    value: String,
    identifier: String,
    // `type` is a reserved word in Scala, so it needs backticks as a field name
    `type`: String
)

I believe that it would make sense to use a broadcast variable. However,
I'm not sure what the best way would be to incorporate it during a map
operation.

rdd.map(MyObject.extendEntity)

object MyObject {
   def extendEntity(entity: Entity): EntityExtended = {
       val id = entity.identifier

       // lookup identifier in broadcast variable?
   }
}
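
To make the question concrete, here is a minimal sketch of what I have in mind. It assumes the id -> type table is small enough to be read from Cassandra and collected on the driver as a plain Map[String, String] (that read is not shown). The extendAll helper, the curried lookup parameter and the "unknown" default for missing ids are all hypothetical, not something I have working.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

case class Entity(value: String, identifier: String)
// `type` is a reserved word in Scala, so it is backtick-quoted here.
case class EntityExtended(value: String, identifier: String, `type`: String)

object MyObject {
  // Pure lookup step. "unknown" is a hypothetical default for ids
  // that are missing from the lookup table.
  def extendEntity(lookup: Map[String, String])(entity: Entity): EntityExtended =
    EntityExtended(
      entity.value,
      entity.identifier,
      lookup.getOrElse(entity.identifier, "unknown")
    )

  // Assumes idToType was already loaded from Cassandra onto the driver.
  // The map is broadcast once and read through .value inside the closure.
  def extendAll(
      sc: SparkContext,
      rdd: RDD[Entity],
      idToType: Map[String, String]
  ): RDD[EntityExtended] = {
    val lookup: Broadcast[Map[String, String]] = sc.broadcast(idToType)
    rdd.map(entity => extendEntity(lookup.value)(entity))
  }
}
```

The call site would then be something like MyObject.extendAll(sc, rdd, idToType) instead of rdd.map(MyObject.extendEntity). I don't know whether passing the broadcast variable around like this is the recommended approach, which is really what I'm asking.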

Thanks, Mike.
