Hi everyone,

I'm getting started with Spark Streaming and would like to know a few things
about how data arrives.

I know that Spark Streaming uses micro-batches: the data is received by the
workers and stored as RDDs. At defined intervals, the master receives a pointer
to the micro-batch RDD and can use it to process the data with mappers and
reducers.

1. Before the master is called, can I work on the data? Can I do something
with each object as it arrives on the workers?

2. A data stream is normally an ordered sequence of data. But when it arrives
in micro-batches, I receive many objects at the same time. How can I determine
the order of the objects inside a batch? Can I extract a timestamp or an
ordered arrival ID for each object?
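
To make question 2 concrete, here is a rough plain-Python sketch of the
behaviour I have in mind. It is not Spark code and uses no Spark API; the names
Tagged, tag_on_arrival, micro_batches and in_arrival_order are hypothetical,
and fixed-size batches stand in for Spark's time-based intervals. The idea is
only that each object gets a sequence ID and a timestamp when it arrives, so
the order inside a batch can be recovered later.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Tagged:
    """An object plus the (hypothetical) metadata attached on arrival."""
    seq_id: int        # position in the original stream
    arrival_ts: float  # time at which the object was received
    value: Any


def tag_on_arrival(stream: Iterable[Any],
                   clock: Callable[[], float] = time.time) -> Iterator[Tagged]:
    """Attach a sequence ID and an arrival timestamp to each object."""
    for seq_id, value in enumerate(stream):
        yield Tagged(seq_id, clock(), value)


def micro_batches(tagged: Iterable[Tagged],
                  batch_size: int) -> Iterator[List[Tagged]]:
    """Group objects into fixed-size batches (assumption: size, not time)."""
    it = iter(tagged)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


def in_arrival_order(batch: Iterable[Tagged]) -> List[Tagged]:
    """Recover the arrival order inside one batch from the sequence IDs."""
    return sorted(batch, key=lambda t: t.seq_id)
```

What I would like to know is whether Spark Streaming gives me something
equivalent to seq_id or arrival_ts for the objects in a micro-batch, or whether
I have to attach them myself.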

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/The-coming-data-on-Spark-Streaming-tp27720.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
