You cannot nest RDD transformations or actions in Scala Spark. RDDs, and the SparkContext behind them, exist only on the driver. When the outer operation is shipped to the cluster, the code running inside it on the executors cannot kick off a new job (the inner query), because it no longer has access to the context that job needs. There are two ways around this: either join the two RDDs, or keep a serializable lookup structure (a plain collection such as a Set or Map, not an RDD) in memory and have it sent to the nodes during execution. You can do the latter efficiently with a broadcast variable, which caches one read-only copy on each executor instead of shipping a copy with every task.
I apologize for not providing examples; I'm on my phone :)

-----Original Message-----
From: kpeng1 [kpe...@gmail.com]
Sent: Tuesday, October 28, 2014 06:34 PM Eastern Standard Time
To: u...@spark.incubator.apache.org
Subject: Is it possible to call a transform + action inside an action?

I am currently writing an application that uses Spark Streaming. What I am trying to do is read in a few files (using the SparkContext textFile method) and then process those files inside an action that I apply to a streaming RDD. Here is the main code:

    def main(args: Array[String]) {
      val sparkConf = new SparkConf().setAppName("EmailIngestion")
      val ssc = new StreamingContext(sparkConf, Seconds(1))
      val sc = new SparkContext(sparkConf)
      val badWords = sc.textFile("/filters/badwords.txt")
      val urlBlacklist = sc.textFile("/filters/source_url_blacklist.txt")
      val domainBlacklist = sc.textFile("/filters/domain_blacklist.txt")
      val emailBlacklist = sc.textFile("/filters/blacklist.txt")
      val lines = FlumeUtils.createStream(ssc, "localhost", 4545, StorageLevel.MEMORY_ONLY_SER_2)
      lines.foreachRDD(rdd => rdd.foreachPartition(json => Processor.ProcessRecord(json, badWords, urlBlacklist, domainBlacklist, emailBlacklist)))
      ssc.start()
      ssc.awaitTermination()
    }

Here is the code for processing the files, found inside the ProcessRecord method:

    val emailBlacklistCnt = emailBlacklist.filter(black => black.contains(email)).count

It looks like this throws an exception. Is it possible to do this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-call-a-transform-action-inside-an-action-tp17568.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
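A minimal sketch of the broadcast-variable approach applied to the code in the question. Several pieces are assumptions for illustration: the Filters case class and its emailBlacklistCount method are hypothetical, the filter files are assumed small enough to collect() onto the driver, and ssc.socketTextStream stands in for the Flume source with each line assumed to be one email address. It also reuses ssc.sparkContext rather than creating a second SparkContext, since only one should be active per JVM.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical serializable lookup structure (plain Sets, not RDDs).
// Case classes are Serializable, so Spark can ship this to executors.
case class Filters(
    badWords: Set[String],
    urlBlacklist: Set[String],
    domainBlacklist: Set[String],
    emailBlacklist: Set[String]) {

  // Same test as emailBlacklist.filter(black => black.contains(email)).count,
  // but evaluated against a local collection instead of an RDD.
  def emailBlacklistCount(email: String): Int =
    emailBlacklist.count(black => black.contains(email))
}

object EmailIngestion {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("EmailIngestion")
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    // Reuse the SparkContext the StreamingContext already owns.
    val sc = ssc.sparkContext

    // Runs on the driver: read each file and collect it into a local Set.
    def load(path: String): Set[String] = sc.textFile(path).collect().toSet

    val filters = Filters(
      load("/filters/badwords.txt"),
      load("/filters/source_url_blacklist.txt"),
      load("/filters/domain_blacklist.txt"),
      load("/filters/blacklist.txt"))

    // One read-only copy is cached per executor instead of per task.
    val filtersBc: Broadcast[Filters] = sc.broadcast(filters)

    // Stand-in source (assumption); the question uses FlumeUtils.createStream.
    val lines = ssc.socketTextStream("localhost", 4545)

    lines.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Runs on an executor: read the broadcast value, never touch an RDD here.
        val f = filtersBc.value
        records.foreach { email =>
          val emailBlacklistCnt = f.emailBlacklistCount(email)
          if (emailBlacklistCnt > 0) println(s"blacklisted: $email")
        }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The collect-then-broadcast pattern only suits lookup data that fits in driver and executor memory; for large reference data, a join between RDDs (for a DStream, inside transform) is the other option.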