I need to know whether the task below is feasible. I am thinking of it as a combined MapReduce and Spark effort.
I need to run a distributed sliding-window comparison for digital data matching on top of Hadoop. The data (a Hive table) will be partitioned and distributed across the data nodes. Multiple instances of the window comparison tool would then run on the individual partitions, each local to its data node. The tool slides a window over the rows; within each window interval, all rows are compared to each other on several columns and a score is generated.

I am more familiar with MapReduce, and I think we can do everything up to and including the partitioning in it. For the distributed window comparison I am thinking of using Spark. I know Spark Streaming has sliding-window functionality. Can I use that to accomplish the above task?

Any suggestions are appreciated.
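To make the per-partition step concrete, here is a minimal sketch of the comparison I have in mind, in plain Python. Everything in it is an assumption for illustration: the names `column_match_score` and `sliding_window_scores`, the `id` key column, rows as dictionaries, a window measured in rows rather than time, and the scoring rule (fraction of compared columns that agree). The real tool would use its own scoring.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def column_match_score(a: Row, b: Row, columns: List[str]) -> float:
    """Hypothetical score: fraction of the chosen columns on which two rows agree."""
    if not columns:
        return 0.0
    matches = sum(1 for c in columns if a.get(c) == b.get(c))
    return matches / len(columns)


def sliding_window_scores(
    rows: Iterable[Row],
    window_size: int,
    columns: List[str],
    key: str = "id",  # assumed name of a column that identifies a row
) -> Iterator[Tuple[object, object, float]]:
    """Compare each row with the previous (window_size - 1) rows of one partition.

    Every pair of rows that falls inside some window of `window_size`
    consecutive rows is scored exactly once.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Holds the rows that share a window with the next incoming row.
    window: deque = deque(maxlen=window_size - 1)
    for row in rows:
        for earlier in window:
            yield (earlier[key], row[key], column_match_score(earlier, row, columns))
        window.append(row)


if __name__ == "__main__":
    partition = [
        {"id": 1, "name": "a", "city": "x"},
        {"id": 2, "name": "a", "city": "y"},
        {"id": 3, "name": "b", "city": "y"},
    ]
    for pair in sliding_window_scores(partition, window_size=2, columns=["name", "city"]):
        print(pair)
```

The function takes an iterator of rows and returns an iterator of results, which is the shape Spark's `mapPartitions` expects, so one instance could run per partition. Rows that sit on either side of a partition boundary are never compared in this sketch; whether that matters depends on how the table is partitioned and sorted.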