Re: specifying fields for join()

2014-06-13 Thread SK
I used groupBy to create the keys for both RDDs. Then I did the join.

I do think it would be useful if a future version of Spark allowed us to
specify the fields on which to join, even when the keys are different.
Scalding allows this feature.
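
To make the request concrete, here is a rough sketch of what a join that
accepts the joining fields could look like. It is plain Python, not Spark or
Scalding API: join_on, its parameters, and the sample records are all
hypothetical names invented for this illustration.

```python
from collections import defaultdict


def join_on(left, right, left_fields, right_fields):
    """Inner-join two lists of tuples on the given field positions.

    left_fields and right_fields are tuples of indexes of equal length;
    two records match when their selected fields are equal.
    """
    if len(left_fields) != len(right_fields):
        raise ValueError("field lists must have the same length")
    # Index the right side by its join fields for constant-time lookup.
    index = defaultdict(list)
    for rec in right:
        index[tuple(rec[i] for i in right_fields)].append(rec)
    out = []
    for rec in left:
        key = tuple(rec[i] for i in left_fields)
        for match in index.get(key, []):
            out.append((rec, match))
    return out


# Hypothetical records: (ID, ACTION, TIMESTAMP, LOCATION) and (ID, TIMESTAMP).
events = [("u1", "login", 100, "NYC"), ("u1", "click", 105, "BOS")]
wanted = [("u1", 105)]
pairs = join_on(events, wanted, left_fields=(0, 2), right_fields=(0, 1))
print(pairs)
```

The caller names the field positions on each side, so neither input has to be
reshaped into (key, value) pairs first.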



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/specifying-fields-for-join-tp7528p7591.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: specifying fields for join()

2014-06-13 Thread Mayur Rustagi
You can map each RDD so that the join columns become the key, then join. Is
that what you did?
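
A minimal sketch of that approach, written in plain Python so it runs without
a Spark installation. The sample records and the key_by and join helpers are
assumptions for illustration; they only mimic the inner-join behaviour of a
join on key-value pairs. In Spark itself the same shape would be a map on each
RDD that produces ((ID, TIMESTAMP), value) pairs, followed by join.

```python
from collections import defaultdict

# Hypothetical sample data in the shapes described in the original question.
rdd1 = [
    ("u1", "login", 100, "NYC"),
    ("u1", "click", 105, "NYC"),
    ("u2", "login", 200, "SFO"),
]
rdd2 = [("u1", 105), ("u2", 200)]


def key_by(records, key_fn, value_fn):
    """Turn each record into a (key, value) pair."""
    return [(key_fn(r), value_fn(r)) for r in records]


def join(left, right):
    """Inner join of two lists of (key, value) pairs on equal keys."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]


# Key both sides by (ID, TIMESTAMP); keep LOCATION as the left value.
left = key_by(rdd1, lambda r: (r[0], r[2]), lambda r: r[3])
# The right side has no extra fields, so use None as a placeholder value.
right = key_by(rdd2, lambda r: (r[0], r[1]), lambda r: None)

# Each joined item is ((ID, TIMESTAMP), (LOCATION, None)); drop the placeholder.
locations = [(k, loc) for k, (loc, _) in join(left, right)]
print(locations)
```

Only records whose (ID, TIMESTAMP) pair appears on both sides survive, which
is the lookup the original question asks for.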


Mayur Rustagi



On Thu, Jun 12, 2014 at 9:24 PM, SK wrote:

> This issue is resolved.


Re: specifying fields for join()

2014-06-12 Thread SK
This issue is resolved.





specifying fields for join()

2014-06-12 Thread SK
Hi,

I want to join two RDDs on specific fields.

The first RDD (rdd1) is a set of tuples of the form: (ID, ACTION, TIMESTAMP,
LOCATION).
The second RDD (rdd2) is a set of tuples of the form: (ID, TIMESTAMP).

rdd2 is a subset of rdd1. ID is a string. I want to join the two so that I
can get the location corresponding to the timestamp values in rdd2. The join
has to be on the (ID, TIMESTAMP) fields.
I tried rdd1.join(rdd2), but got a compilation error.

It appears that in Spark, the join function does not take the joining fields
as arguments and joins only on the keys.
What is the right way to do the above join?

Thanks for your help.


