Hi LCassa,
Try:
Map to pair, then reduce by key.
The Spark documentation is a pretty good reference for this, and there are plenty
of word-count examples on the internet.
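Roughly, the shape of it is below. This is an untested plain-Java sketch (no Spark dependency) just to show the logic; the class and method names here are made up for illustration. In Spark you'd express the same thing as mapToPair(d -> new Tuple2<>(d.timestamp, d)) followed by groupByKey(), with a filter() on key1. Note that filtering before grouping gives the same result and avoids shuffling rows you are going to throw away anyway.

```java
import java.util.*;
import java.util.stream.*;

public class GroupFilterSketch {
    // Mirrors the POJO from the question.
    static class Data {
        final long timestamp;
        final Map<String, String> map;
        Data(long timestamp, Map<String, String> map) {
            this.timestamp = timestamp;
            this.map = map;
        }
    }

    // Keep only rows whose key1 value is in 'wanted', then group by timestamp.
    // Spark equivalent (order swapped but same result):
    //   stream.mapToPair(d -> new Tuple2<>(d.timestamp, d))
    //         .groupByKey()  ... plus a filter() on d.map.get("key1")
    static Map<Long, List<Data>> groupAndFilter(List<Data> rows, Set<String> wanted) {
        return rows.stream()
            .filter(d -> wanted.contains(d.map.get("key1")))    // filter on key1
            .collect(Collectors.groupingBy(d -> d.timestamp));  // group by timestamp
    }

    public static void main(String[] args) {
        // Sample rows matching the question's data.
        List<Data> rows = List.of(
            new Data(1L, Map.of("key1", "value1", "key2", "value2")),
            new Data(2L, Map.of("key1", "value3", "key2", "value4")),
            new Data(2L, Map.of("key1", "value1", "key2", "value5")));

        Map<Long, List<Data>> result = groupAndFilter(rows, Set.of("value1", "value3"));
        System.out.println(result.get(2L).size()); // both timestamp=second2 rows match on key1
    }
}
```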
Warm regards,
TimB
From: Cassa L [mailto:lcas...@gmail.com]
Sent: Thursday, 19 November 2015 11:27 AM
To: user
Subject: how to group timestamp data and filter on it
Hi,
I have a data stream (JavaDStream) in the following format:
timestamp=second1, map(key1=value1, key2=value2)
timestamp=second2, map(key1=value3, key2=value4)
timestamp=second2, map(key1=value1, key2=value5)
I want to group the data by 'timestamp' first and then filter each RDD for
key1=value1 or key1=value3, etc.
Each row above is represented by a POJO in the RDD, like:
public class Data {
    long timestamp;
    Map<String, String> map;
}
How do I do this in Spark? I am trying to figure out whether I need to use map or
flatMap, etc.
Thanks,
LCassa