On Sun, Mar 22, 2015 at 8:43 AM, deenar.toraskar <deenar.toras...@db.com> wrote:
> 1) If there are no sliding-window calls in this streaming context, will
> there be just one file written per interval?

As many files as there are partitions will be written in each interval.
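For a back-of-the-envelope illustration (mine, not from the thread): `DStream.saveAsTextFiles` writes one `part-NNNNN` file per RDD partition into each interval's output directory, so a 4-partition stream yields 4 files per interval. The `partFileNames` helper below is hypothetical, just mimicking the naming:

```scala
// Hedged sketch: saveAsTextFiles emits one "part-NNNNN" file per RDD
// partition, per batch interval. `partFileNames` is a made-up helper
// that mimics the file names Spark would produce for one interval.
object FilesPerInterval {
  def partFileNames(numPartitions: Int): Seq[String] =
    (0 until numPartitions).map(i => f"part-$i%05d")

  def main(args: Array[String]): Unit =
    // e.g. a stream with 4 partitions -> 4 files in each interval's directory
    println(partFileNames(4).mkString(" "))
}
```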

> 2) if there is a sliding window call in the same context, such as
>
>     val hashTags = stream.flatMap(json =>
> DataObjectFactory.createStatus(json).getText.split("
> ").filter(_.startsWith("#")))
>
>     val topCounts60 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _,
> Seconds(600))
>                      .map{case (topic, count) => (count, topic)}
>                      .transform(_.sortByKey(false))
>
> will some files get written multiple times (as long as the interval falls
> within the window)?

I don't think it's right to say the files themselves are written many
times, but yes, my understanding is that the data will be written many
times, since each datum lies in many windows.
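As a quick sanity check on that (my numbers, not from the thread): with a hypothetical 10-second batch interval and the 600-second window above, each batch's data lies in 600 / 10 = 60 successive windows, so it gets re-aggregated, and re-written if each window's result is saved, 60 times:

```scala
// Sketch of the overlap arithmetic, assuming the window duration is a
// multiple of the batch interval (Spark Streaming requires this).
object WindowOverlap {
  // Number of successive windows a single batch's data falls into.
  def windowsContainingBatch(windowSecs: Int, batchSecs: Int): Int = {
    require(windowSecs % batchSecs == 0,
      "window must be a multiple of the batch interval")
    windowSecs / batchSecs
  }

  def main(args: Array[String]): Unit =
    // 600s window over hypothetical 10s batches -> each datum is in 60 windows
    println(windowsContainingBatch(600, 10))
}
```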

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org