Spark Streaming Topology
Hi all,

I would like to use Spark Streaming for the problem below. I have 2 InputStreams: one for one type of input (n-dimensional vectors) and one for questions to the infrastructure (explained below).

I need to first "break" the input across 4 execution nodes and produce a stream from each (4 streams), each containing only the m vectors that share common attributes (what counts as "common" comes from comparing incoming vectors with the vectors already inside the "selected" group). If the selected group of m vectors changes through the addition of a new vector, I need to forward the selected group to an "aggregation node" that joins the selected groups of 2 execution nodes (we have 2 such aggregation nodes). Following that, if an aggregation node sees a change in its joined selected group of vectors, it forwards its joined selected vectors to another aggregation node that holds the overall aggregation of the 2 aggregation nodes (JoinAll). Finally, when a "question" arrives, a mapToPair and then a sort transformation need to be applied to JoinAll and the result printed (one result per question).

Can anyone help me with this endeavour? I think (correct me if I am mistaken) the architecture described needs to:

1. Partition Input Stream (1) through DStream eachPartition = inputVectors.repartition(4)
2. Filter eachPartition, like DStream eachPartitionFiltered = eachPartition.filter(func) -> How can I use data already persisted to the node for that? [I mean the already "selected group" of that specific partition]
3. Group 2 out of 4 partitions into another DStream if needed and store it inside another node. [???]
4. If a "question" arrives, use the JoinAll DStream to answer the question. [???]

Thank you in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Topology-tp24686.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
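For step 2, one way to use per-partition state is to key each vector by its partition and keep the "selected group" in Spark's managed state (updateStateByKey, or mapWithState in Spark 1.6+), rather than trying to read node-local persisted data from inside filter(). Below is a minimal, hedged sketch of just the admission rule, written as plain Java so the logic is clear; the class name, the toy "same first attribute" criterion, and the capacity m are all placeholders for whatever the real similarity test is. Inside Spark, this update function would be the body passed to updateStateByKey, and its boolean result is the "group changed, forward downstream" signal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-partition "selected group" update step.
// In Spark Streaming this logic would live inside updateStateByKey /
// mapWithState; shown here as a plain function to isolate the rule.
public class SelectedGroup {

    // Placeholder criterion: a vector "shares common attributes" with the
    // group if its first attribute matches the group's. The real comparison
    // is whatever similarity test the application defines.
    static boolean sharesAttributes(int[] v, List<int[]> group) {
        if (group.isEmpty()) return true;   // an empty group admits anything
        return v[0] == group.get(0)[0];
    }

    // Tries to admit `incoming` into `group` (capacity m). Returns true if
    // the group changed -- the signal to forward it to the aggregation node.
    static boolean update(List<int[]> group, int[] incoming, int m) {
        if (group.size() < m && sharesAttributes(incoming, group)) {
            group.add(incoming);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<int[]> group = new ArrayList<>();
        update(group, new int[]{1, 5}, 2);              // admitted: group empty
        update(group, new int[]{1, 7}, 2);              // admitted: shares attribute 1
        boolean rejected = update(group, new int[]{2, 9}, 2); // rejected: full + differs
        System.out.println(group.size() + " changed=" + !rejected);
    }
}
```

With this shape, step 3 (joining two nodes' groups) becomes a join/union of two such state streams keyed by aggregation-node id, and step 4 queries the JoinAll state when a question arrives on the second InputStream.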
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org