Hi LCassa,

Try:
Map to pair, then reduce by key. The Spark documentation is a pretty good reference for this, and there are plenty of word-count examples on the internet.

Warm regards,
TimB

From: Cassa L [mailto:lcas...@gmail.com]
Sent: Thursday, 19 November 2015 11:27 AM
To: user
Subject: how to group timestamp data and filter on it

Hi,
I have a data stream (JavaDStream) in the following format:

timestamp=second1, map(key1=value1, key2=value2)
timestamp=second2, map(key1=value3, key2=value4)
timestamp=second2, map(key1=value1, key2=value5)

I want to group the data by 'timestamp' first and then filter each RDD for key1=value1, key1=value3, etc. Each row above represents a POJO in the RDD, like:

public class Data {
    long timestamp;
    Map<String, String> map;
}

How do I do this in Spark? I am trying to figure out whether I need to use map, flatMap, etc.

Thanks,
LCassa
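To make the "map to pair, then group/filter" idea concrete, here is a minimal plain-Java sketch of the same logic over a `List<Data>`. The class and method names here are hypothetical, and `java.util.stream` stands in for the Spark transformations: `groupByTimestamp` plays the role of `mapToPair(d -> new Tuple2<>(d.timestamp, d))` followed by `groupByKey()`, and `filterByKey1` plays the role of `filter()` on the stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class GroupByTimestamp {

    // The POJO from the original question.
    public static class Data {
        final long timestamp;
        final Map<String, String> map;

        Data(long timestamp, Map<String, String> map) {
            this.timestamp = timestamp;
            this.map = map;
        }
    }

    // Group records by timestamp: the plain-Java analogue of
    // mapToPair(d -> new Tuple2<>(d.timestamp, d)).groupByKey() in Spark.
    public static Map<Long, List<Data>> groupByTimestamp(List<Data> records) {
        return records.stream()
                .collect(Collectors.groupingBy(d -> d.timestamp));
    }

    // Keep only records whose "key1" entry is one of the wanted values:
    // the analogue of filter() on the (grouped) DStream/RDD.
    public static List<Data> filterByKey1(List<Data> records, Set<String> wanted) {
        return records.stream()
                .filter(d -> wanted.contains(d.map.get("key1")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The three sample rows from the question.
        List<Data> records = Arrays.asList(
                new Data(1L, Map.of("key1", "value1", "key2", "value2")),
                new Data(2L, Map.of("key1", "value3", "key2", "value4")),
                new Data(2L, Map.of("key1", "value1", "key2", "value5")));

        Map<Long, List<Data>> grouped = groupByTimestamp(records);
        System.out.println(grouped.get(2L).size());  // two records share timestamp second2

        List<Data> matched = filterByKey1(grouped.get(2L), Set.of("value1"));
        System.out.println(matched.size());          // one of them has key1=value1
    }
}
```

In actual Spark Streaming you would apply the same two steps directly on the `JavaDStream<Data>`: `mapToPair` to key each record by its timestamp, then `groupByKey()` (or `reduceByKey` if you can combine values), then `filter` on the resulting pairs.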