Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Sorry Again. In this method, i see that the values for a - positions.iterator b - positions.iterator always remain the same. I tried to do a b - positions.iterator.next, it throws an error: value filter is not a member of (Double, Double) Is there something I

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread Christopher Nguyen
Lakshmi, this is orthogonal to your question, but in case it's useful. It sounds like you're trying to determine the home location of a user, or something similar. If that's the problem statement, the data pattern may suggest a far more computationally efficient approach. For example, first map

Re: Can this be done in map-reduce technique (in parallel)

2014-06-05 Thread lmk
Hi Cheng, Thanks a lot. That solved my problem. Thanks again for the quick response and solution. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-handled-in-map-reduce-using-RDDs-tp6905p7047.html Sent from the Apache Spark User List mailing

Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi, I am a new spark user. Pls let me know how to handle the following scenario: I have a data set with the following fields: 1. DeviceId 2. latitude 3. longitude 4. ip address 5. Datetime 6. Mobile application name With the above data, I would like to perform the following steps: 1. Collect all

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Oleg Proudnikov
It is possible if you use a cartesian product to produce all possible pairs for each IP address and 2 stages of map-reduce: - first by pairs of points to find the total of each pair and - second by IP address to find the pair for each IP address with the maximum count. Oleg On 4 June 2014

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread Andrew Ash
When you group by IP address in step 1 to this: (ip1,(lat1,lon1),(lat2,lon2)) (ip2,(lat3,lon3),(lat4,lat5)) How many lat/lon locations do you expect for each IP address? avg and max are interesting. Andrew On Wed, Jun 4, 2014 at 5:29 AM, Oleg Proudnikov

Re: Can this be done in map-reduce technique (in parallel)

2014-06-04 Thread lmk
Hi Oleg/Andrew, Thanks much for the prompt response. We expect thousands of lat/lon pairs for each IP address. And that is my concern with the Cartesian product approach. Currently for a small sample of this data (5000 rows) I am grouping by IP address and then computing the distance between