Hi LCassa,
Try:
Map to pair, then reduce by key.
The Spark documentation is a pretty good reference for this, and there are plenty
of word-count examples on the internet.
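Roughly, the shape of it is below. This is an untested plain-Java sketch (no Spark dependency) just to show the logic; the class and method names here are made up for illustration. In Spark you'd express the same thing as mapToPair(d -> new Tuple2<>(d.timestamp, d)) followed by groupByKey(), with a filter() on key1. Note that filtering before grouping gives the same result and avoids shuffling rows you are going to throw away anyway.

```java
import java.util.*;
import java.util.stream.*;

public class GroupFilterSketch {
    // Mirrors the POJO from the question.
    static class Data {
        final long timestamp;
        final Map<String, String> map;
        Data(long timestamp, Map<String, String> map) {
            this.timestamp = timestamp;
            this.map = map;
        }
    }

    // Keep only rows whose key1 value is in 'wanted', then group by timestamp.
    // Spark equivalent (order swapped but same result):
    //   stream.mapToPair(d -> new Tuple2<>(d.timestamp, d))
    //         .groupByKey()  ... plus a filter() on d.map.get("key1")
    static Map<Long, List<Data>> groupAndFilter(List<Data> rows, Set<String> wanted) {
        return rows.stream()
            .filter(d -> wanted.contains(d.map.get("key1")))    // filter on key1
            .collect(Collectors.groupingBy(d -> d.timestamp));  // group by timestamp
    }

    public static void main(String[] args) {
        // Sample rows matching the question's data.
        List<Data> rows = List.of(
            new Data(1L, Map.of("key1", "value1", "key2", "value2")),
            new Data(2L, Map.of("key1", "value3", "key2", "value4")),
            new Data(2L, Map.of("key1", "value1", "key2", "value5")));

        Map<Long, List<Data>> result = groupAndFilter(rows, Set.of("value1", "value3"));
        System.out.println(result.get(2L).size()); // both timestamp=second2 rows match on key1
    }
}
```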
Warm regards,
TimB
From: Cassa L [mailto:lcas...@gmail.com]
Sent: Thursday, 19 November 2015 11:27 AM
To: user
Subject: how to group timestamp data and filter on it
Hi,
I have a data stream (JavaDStream) in the following format:
timestamp=second1, map(key1=value1, key2=value2)
timestamp=second2, map(key1=value3, key2=value4)
timestamp=second2, map(key1=value1, key2=value5)
I want to group the data by 'timestamp' first and then filter each RDD for
key1=value1 or key1=value3, etc.
Each row above is represented by a POJO in the RDD, like:
public class Data {
    long timestamp;
    Map<String, String> map;
}
How do I do this in Spark? I am trying to figure out whether I need to use map or
flatMap, etc.
Thanks,
LCassa