On Sun, Mar 22, 2015 at 8:43 AM, deenar.toraskar <deenar.toras...@db.com> wrote:

> 1) If there are no sliding window calls in this streaming context, will
> there be just one file written per interval?
As many files as there are partitions will be written in each interval.

> 2) If there is a sliding window call in the same context, such as
>
>   val hashTags = stream.flatMap(json =>
>     DataObjectFactory.createStatus(json).getText.split(" ").filter(_.startsWith("#")))
>
>   val topCounts60 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(600))
>     .map { case (topic, count) => (count, topic) }
>     .transform(_.sortByKey(false))
>
> will some files get written multiple times (as long as the interval is
> in the batch)?

I don't think it's right to say the same files will be written multiple times, but yes, it is my understanding that the same data will be written multiple times, since a given datum lies in many windows.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
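For what it's worth, here is a minimal sketch of how the overlap arises. It is not from the original thread; it assumes a batch interval of 60 seconds and a hypothetical output prefix, and it spells out the explicit three-argument form of reduceByKeyAndWindow (window duration plus slide duration) so the overlap factor is visible:

```scala
// Sketch only: assumes spark-streaming on the classpath and that
// `hashTags` was built as in the quoted example. With a 600s window
// sliding every 60s, each record falls into 600 / 60 = 10 successive
// windows, so its contribution is recomputed (and re-written) 10 times.
val topCounts60 = hashTags.map((_, 1))
  .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(600), Seconds(60))
  .map { case (topic, count) => (count, topic) }
  .transform(_.sortByKey(false))

// saveAsTextFiles writes a fresh output directory per slide interval,
// with one part-file per partition inside it. The files themselves are
// new each time; it is the overlapping window contents that repeat
// across successive directories.
// topCounts60.saveAsTextFiles("topCounts60")  // hypothetical prefix
```

So the number of part-files per interval is the partition count of the DStream's RDDs at save time, which can be controlled with repartition or coalesce before saving.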