I would like to know whether the task below is feasible. I am thinking of
it as a combined MapReduce and Spark effort.

I need to run a distributed sliding-window comparison for digital data
matching on top of Hadoop. The data (a Hive table) will be partitioned and
distributed across the data nodes. Multiple instances of the window
comparison tool would then run on the individual partitions, each local to
its data node.

The window comparison tool will use a sliding window: all the rows within a
window interval will be compared to each other on several columns, and a
score will be generated.
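To make the intended comparison concrete, here is a minimal single-machine
sketch in Python. Everything in it is an assumption for illustration: the
names `score_pair` and `sliding_window_scores`, the window being counted in
rows, one score per pair of rows, and the score being the fraction of
compared columns on which two rows agree. The real tool may score
differently.

```python
def score_pair(a, b, columns):
    """Assumed metric: fraction of the given columns on which two rows agree."""
    matches = sum(1 for c in columns if a[c] == b[c])
    return matches / len(columns)


def sliding_window_scores(rows, window_size, columns):
    """Score every pair of rows that falls inside some window of
    `window_size` consecutive rows. Each pair is scored exactly once."""
    results = []
    for i in range(len(rows)):
        # Rows i .. i + window_size - 1 share a window with row i.
        for j in range(i + 1, min(i + window_size, len(rows))):
            results.append((i, j, score_pair(rows[i], rows[j], columns)))
    return results


# Hypothetical sample rows, not taken from any real table.
rows = [
    {"name": "ann", "city": "oslo"},
    {"name": "ann", "city": "rome"},
    {"name": "bob", "city": "rome"},
]

narrow = sliding_window_scores(rows, 2, ["name", "city"])
wide = sliding_window_scores(rows, 3, ["name", "city"])
print(narrow)  # [(0, 1, 0.5), (1, 2, 0.5)]
print(wide)    # [(0, 1, 0.5), (0, 2, 0.0), (1, 2, 0.5)]
```

With a window of two rows only neighbours are compared; widening it to
three rows also compares the first and last row.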

I am more familiar with MapReduce, and I think we can do everything up to
the partitioning step in it. For the distributed window comparison I am
thinking of using Spark. I know Spark Streaming has sliding-window
functionality. Can I use that to accomplish the task above?
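As a rough sketch of the per-partition step, the tool could be written as a
function over an iterator of rows, which is the shape of function that
Spark's `RDD.mapPartitions` accepts. This is only one possible design under
assumptions: `compare_partition` is a hypothetical name, the window is
counted in rows with `window_size >= 1`, and the score is the same assumed
fraction-of-matching-columns metric. The example below runs in plain Python
and simulates two partitions with lists, without Spark itself.

```python
from collections import deque


def compare_partition(rows, window_size, columns):
    """Yield (earlier_row, later_row, score) for every pair of rows that are
    fewer than `window_size` positions apart within one partition."""
    # Keep only the previous window_size - 1 rows in memory.
    recent = deque(maxlen=window_size - 1)
    for row in rows:
        for earlier in recent:
            matches = sum(1 for c in columns if earlier[c] == row[c])
            yield (earlier, row, matches / len(columns))
        recent.append(row)


# Two simulated partitions (hypothetical data). In PySpark the same function
# could be applied with something like:
#   rdd.mapPartitions(lambda it: compare_partition(it, 2, ["name"]))
partitions = [
    [{"id": 1, "name": "ann"}, {"id": 2, "name": "ann"}],
    [{"id": 3, "name": "ann"}],
]

results = [list(compare_partition(iter(p), 2, ["name"])) for p in partitions]
for part in results:
    print(part)
```

Note that in this sketch rows 2 and 3 are never compared, because they sit
in different partitions. If windows must span partition boundaries, the
partitioning would need to overlap or be keyed so that related rows land
together.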

Any suggestions are appreciated.



