Hi everyone, I'm getting started with Spark Streaming and would like to understand a few things about how data arrives.
I understand that Spark Streaming uses micro-batches: data is received by receivers running on the workers and stored as blocks that become an RDD. At each batch interval, the driver gets a pointer to the micro-batch's RDD and can process the data with map and reduce operations.

1. Before the driver job runs, can I already work on the data? Can I do something with each object as it arrives on a worker?

2. A data stream is normally an ordered sequence, but with micro-batches I receive many objects at the same time. How can I determine the order of the objects inside a batch? Can I extract an arrival timestamp or an ordered ID for each object?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
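To make question 2 concrete, here is a minimal sketch of the kind of per-record metadata I'm hoping a receiver could attach. This is plain Python, not the Spark API, and `tag_records` is a name I made up for illustration:

```python
import time

def tag_records(records, start_seq=0):
    """Hypothetical receiver-side helper: attach a monotonically
    increasing sequence id and an arrival timestamp to each record,
    so ordering can be recovered inside a micro-batch later."""
    tagged = []
    for i, rec in enumerate(records, start=start_seq):
        tagged.append({"seq": i, "ts": time.time(), "value": rec})
    return tagged

# Records arriving in one micro-batch could then be re-ordered by
# their sequence id:
batch = tag_records(["a", "b", "c"])
ordered = [r["value"] for r in sorted(batch, key=lambda r: r["seq"])]
```

Is there an existing hook in Spark Streaming (e.g. in a custom receiver) where this kind of tagging could happen?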