Hi Cheng,
Sorry again.
In this method, I see that the values for
a - positions.iterator
b - positions.iterator
always remain the same. I tried changing b to positions.iterator.next, but it
throws an error: value filter is not a member of (Double, Double)
Is there something I
Lakshmi, this is orthogonal to your question, but in case it's useful.
It sounds like you're trying to determine the home location of a user, or
something similar.
If that's the problem statement, the data pattern may suggest a far more
computationally efficient approach. For example, first map
Hi Cheng,
Thanks a lot. That solved my problem.
Thanks again for the quick response and solution.
Hi,
I am a new Spark user. Please let me know how to handle the following scenario:
I have a data set with the following fields:
1. DeviceId
2. latitude
3. longitude
4. IP address
5. Datetime
6. Mobile application name
With the above data, I would like to perform the following steps:
1. Collect all
It is possible if you use a Cartesian product to produce all possible
pairs of points for each IP address, followed by two stages of map-reduce:
- first, reduce by pair of points to find the count of each pair, and
- second, reduce by IP address to find the pair with the
maximum count for each IP address.
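A minimal sketch of that two-stage idea, written against plain Scala collections so it runs without a cluster (the same shape maps onto RDD cartesian/groupBy/reduce operations in Spark); the record layout and names here are hypothetical:

```scala
object PairCount {
  type Point = (Double, Double) // (latitude, longitude)

  // For each IP: Cartesian product of its points, count each pair,
  // then keep the pair with the maximum count.
  // Assumes each IP has at least two distinct points.
  def topPairPerIp(records: Seq[(String, Point)]): Map[String, ((Point, Point), Int)] =
    records
      .groupBy(_._1) // group points by IP address
      .map { case (ip, rows) =>
        val points = rows.map(_._2)
        // Stage 1: all ordered pairs of distinct points, counted per pair.
        val pairCounts = (for {
          a <- points
          b <- points
          if a != b
        } yield (a, b))
          .groupBy(identity)
          .map { case (pair, occurrences) => (pair, occurrences.size) }
        // Stage 2: the pair with the maximum count for this IP.
        ip -> pairCounts.maxBy(_._2)
      }
}
```

With an RDD you would express stage 1 as a per-key Cartesian product followed by a count per (IP, pair) key, and stage 2 as a reduce by IP keeping the maximum.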
Oleg
On 4 June 2014
When you group by IP address in step 1, you get something like this:
(ip1, ((lat1,lon1), (lat2,lon2)))
(ip2, ((lat3,lon3), (lat4,lon4)))
How many lat/lon locations do you expect for each IP address? The average and
maximum are interesting.
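One quick way to get those numbers before committing to a Cartesian product: count the locations per IP and take the average and maximum. A plain-Scala sketch (record layout hypothetical):

```scala
object IpStats {
  type Point = (Double, Double) // (latitude, longitude)

  // Per-IP location counts, then (average, maximum) across all IPs.
  def locationStats(records: Seq[(String, Point)]): (Double, Int) = {
    val counts = records.groupBy(_._1).values.map(_.size)
    (counts.sum.toDouble / counts.size, counts.max)
  }
}
```

If the maximum is in the thousands, the per-IP Cartesian product grows quadratically in that count, which is the concern raised below.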
Andrew
On Wed, Jun 4, 2014 at 5:29 AM, Oleg Proudnikov
Hi Oleg/Andrew,
Thanks much for the prompt response.
We expect thousands of lat/lon pairs for each IP address. And that is my
concern with the Cartesian product approach.
Currently, for a small sample of this data (5000 rows), I am grouping by IP
address and then computing the distance between