You cannot nest RDD transformations in Scala Spark. The issue is that when the
outer operation is distributed to the cluster, a task would have to kick off a
new job (the inner query), but the inner job no longer has access to the
SparkContext, which lives only on the driver.
The way around this is to either do a join on the two RDDs, or to collect one
of them to the driver and ship it to the workers as a broadcast variable.
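A minimal sketch of the join approach (the RDD names and data here are made
up, and sc is assumed to be a live SparkContext, as in the spark-shell):

    import org.apache.spark.rdd.RDD

    // Hypothetical pair RDDs keyed by id.
    val left: RDD[(String, String)] = sc.parallelize(Seq(("u1", "a"), ("u2", "b")))
    val right: RDD[(String, Int)]   = sc.parallelize(Seq(("u1", 1), ("u3", 3)))

    // A nested lookup such as left.map { case (k, _) => right.lookup(k) }
    // fails on the cluster, because `right` cannot be used inside a task.
    // Joining by key expresses the same lookup as a single distributed job:
    val joined: RDD[(String, (String, Int))] = left.join(right)
    // joined contains ("u1", ("a", 1))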
Ok cool. So in that case the only way I could think of doing this would be
calling the toArray method on those RDDs, which returns an Array[String], and
storing them as broadcast variables. I read about broadcast variables, but
it's still fuzzy. I assume that since broadcast variables
are read-only and shipped to every worker once, I can safely read them inside
a transformation on another RDD.
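Something like this sketch, assuming the small RDD fits in driver memory (the
RDD names and data are made up):

    // Hypothetical RDDs; the small one must fit on the driver.
    val smallRdd = sc.parallelize(Seq("apple", "banana"))
    val bigRdd   = sc.parallelize(Seq("apple", "cherry", "banana"))

    // collect() (toArray in older Spark versions) pulls the data to the
    // driver; sc.broadcast ships one read-only copy to each worker.
    val smallArr: Array[String] = smallRdd.collect()
    val smallBc  = sc.broadcast(smallArr)

    // Inside the transformation, read the broadcast value instead of
    // referencing the other RDD:
    val matched = bigRdd.filter(line => smallBc.value.contains(line))
    matched.collect().foreach(println)  // prints apple, banana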