Hi all, 

I would like to use Spark Streaming to handle the problem below: 

I have 2 input streams: one carrying the data itself (n-dimensional vectors)
and one carrying questions to be answered against the infrastructure
(explained below). 

I need to "break" the input first in 4 execution nodes, and produce a stream
from each (4 streams) containing only m vectors that share common attributes
(what is "common" comes from comparing incoming vectors with vectors already
inside the "selected" group). If the selected group of m vectors changes by
the addition of an new vector, I need to forward the selected group to an
"aggregation node" that joins 2 execution nodes' selected groups (we have 2
such aggregation nodes). Following that, if an aggregation node has a change
in its joined selected group of vectors, it forwards its joined selected
vectors to an other aggregate node that contains the overall aggregation of
the 2 aggregation nodes (JoinAll). 

Now, when a "question" arrives, a mapToPair and then a sort transformation
need to be applied to JoinAll, and the result printed (one result per
question). 
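
Something like the following is what I picture for the question path. This is
a hypothetical fragment, not working code: it assumes the jssc and joinAll
built in the sketch after the numbered list below, a second socket stream
carrying the questions, and the vector's first coordinate as a stand-in sort
key (extra imports needed: JavaSparkContext, JavaPairRDD, java.util.*).

    JavaDStream<String> questions = jssc.socketTextStream("localhost", 9998);

    // Per batch: if a question arrived, re-key JoinAll's vectors (mapToPair)
    // and sort them; otherwise emit an empty result.
    JavaPairDStream<Double, double[]> answers = joinAll.transformWithToPair(
        questions,
        (joinRdd, questionRdd, time) -> {
          if (questionRdd.isEmpty()) {
            JavaSparkContext jsc =
                JavaSparkContext.fromSparkContext(joinRdd.context());
            return JavaPairRDD.fromJavaRDD(
                jsc.<Tuple2<Double, double[]>>emptyRDD());
          }
          return joinRdd
              .flatMapToPair(kv -> {
                List<Tuple2<Double, double[]>> keyed = new ArrayList<>();
                for (double[] v : kv._2()) {
                  keyed.add(new Tuple2<>(v[0], v));  // stand-in sort key
                }
                return keyed.iterator();  // on Spark 1.x, return keyed itself
              })
              .sortByKey();
        });
    answers.print();  // prints the sorted result whenever a batch carried a question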

Can anyone help me with this endeavour? 

I think (and correct me if I am mistaken) the architecture described needs
to: 

1. Partition input stream (1) into 4 partitions via DStream eachPartition =
inputVectors.repartition(4). 
2. Filter each partition, e.g. DStream eachPartitionFiltered =
eachPartition.filter(func) -> How can I use data already persisted on the
node for that (I mean that partition's already "selected group")? See the
stateful sketch after this list. 
3. Group 2 out of 4 partitions into another DStream when needed and store it
on another node. [???] (Also covered in the sketch below.) 
4. When a "question" arrives, use the JoinAll DStream to answer the
question. [???] (As sketched above, after the question paragraph.) 
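
If it helps judge the approach, here is a minimal end-to-end sketch of steps
1-3 the way I currently picture them, using the Java API with updateStateByKey
to keep each node's "selected group" across batches. Everything here is an
assumption of mine: bucketOf() and belongsToSelectedGroup() are placeholders
for the partitioning key and the "common attributes" test, and the socket
source, port, and checkpoint path are made up.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.Optional;  // Guava's Optional on Spark 1.x
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    import scala.Tuple2;

    public class VectorTopologySketch {

      public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("VectorTopology");
        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(1));
        jssc.checkpoint("/tmp/vector-topology");  // stateful ops need a checkpoint dir

        // Vectors arrive as comma-separated doubles, one vector per line.
        JavaDStream<double[]> inputVectors = jssc
            .socketTextStream("localhost", 9999)
            .map(line -> Arrays.stream(line.split(","))
                               .mapToDouble(Double::parseDouble).toArray());

        // Step 1: spread the load over 4 partitions (the 4 "execution nodes").
        // Step 2: key each vector by its node id, then let updateStateByKey
        // keep the per-node "selected group" across batches; the update
        // function sees the batch's new vectors plus the persisted group.
        JavaPairDStream<Integer, List<double[]>> selectedGroups = inputVectors
            .repartition(4)
            .mapToPair(v -> new Tuple2<>(bucketOf(v), v))
            .updateStateByKey(
                (List<double[]> newVectors, Optional<List<double[]>> state) -> {
                  List<double[]> group = state.or(new ArrayList<>());
                  for (double[] v : newVectors) {
                    if (belongsToSelectedGroup(v, group)) {
                      group.add(v);
                    }
                  }
                  return Optional.of(group);  // becomes next batch's state
                });

        // Step 3: fold pairs of nodes into the 2 aggregation nodes, then
        // everything into the single overall node (JoinAll).
        JavaPairDStream<Integer, List<double[]>> aggregated =
            regroup(selectedGroups, 2);  // 4 execution nodes -> 2 aggregators
        JavaPairDStream<Integer, List<double[]>> joinAll =
            regroup(aggregated, 1);      // 2 aggregators -> JoinAll

        joinAll.print();
        jssc.start();
        jssc.awaitTermination();
      }

      // Re-key groups modulo the target node count and merge groups that
      // land on the same node.
      private static JavaPairDStream<Integer, List<double[]>> regroup(
          JavaPairDStream<Integer, List<double[]>> groups, int nodes) {
        return groups
            .mapToPair(kv -> new Tuple2<>(kv._1() % nodes, kv._2()))
            .reduceByKey((a, b) -> {
              List<double[]> merged = new ArrayList<>(a);
              merged.addAll(b);
              return merged;
            });
      }

      // Placeholder: which of the 4 execution nodes a vector belongs to.
      private static int bucketOf(double[] v) {
        return Math.floorMod(Arrays.hashCode(v), 4);
      }

      // Placeholder: does v share "common attributes" with the current group?
      private static boolean belongsToSelectedGroup(double[] v,
                                                    List<double[]> group) {
        return true;  // replace with the domain-specific comparison
      }
    }

What this sketch does not do is forward only on change: it recomputes and
forwards the groups every batch. I imagine keeping the previous group in the
state and emitting downstream only when it differs, but that is exactly the
part I would like advice on.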



