Re: specifying fields for join()
I used groupBy to create the keys for both RDDs, then did the join. I think, though, it would be useful if in the future Spark could let us specify the fields on which to join, even when the keys are different. Scalding supports this.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/specifying-fields-for-join-tp7528p7591.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: specifying fields for join()
You can use those columns to create keys, then join. Is that what you did?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, Jun 12, 2014 at 9:24 PM, SK wrote:
> This issue is resolved.
Re: specifying fields for join()
This issue is resolved.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/specifying-fields-for-join-tp7528p7544.html
specifying fields for join()
Hi,

I want to join two RDDs on specific fields. The first RDD is a set of tuples of the form (ID, ACTION, TIMESTAMP, LOCATION); the second RDD is a set of tuples of the form (ID, TIMESTAMP). rdd2 is a subset of rdd1, and ID is a string.

I want to join the two so that I can get the location corresponding to the timestamp values in rdd2. The join has to be on the (ID, TIMESTAMP) fields. I tried rdd1.join(rdd2), but got a compilation error. It appears that in Spark, the join function does not take the joining fields as arguments and joins only on the keys. What is the right way to do this join?

Thanks for your help.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/specifying-fields-for-join-tp7528.html
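[Editor's note] Spark's join() operates on the key of a (key, value) pair RDD, so the usual fix is to re-key both RDDs by the composite (ID, TIMESTAMP) before joining — in Spark that shape is `rdd1.map(...).join(rdd2.map(...))`. Below is a minimal sketch of that re-keying logic in plain Python with hypothetical sample data (no Spark dependency), not the thread author's actual code:

```python
# Hypothetical sample data mirroring the tuples described above.
rdd1 = [("u1", "click", 100, "NYC"),   # (ID, ACTION, TIMESTAMP, LOCATION)
        ("u2", "view", 200, "SF")]
rdd2 = [("u1", 100)]                   # (ID, TIMESTAMP) — a subset of rdd1

# Re-key rdd1 by the composite (ID, TIMESTAMP), keeping the payload as value.
# In Spark this would be: rdd1.map(lambda t: ((t[0], t[2]), (t[1], t[3])))
keyed1 = {(id_, ts): (action, loc) for (id_, action, ts, loc) in rdd1}

# rdd2 already consists of the join keys themselves.
keyed2 = {(id_, ts) for (id_, ts) in rdd2}

# Inner join on the composite key, extracting the location for each match.
joined = [(id_, ts, keyed1[(id_, ts)][1])
          for (id_, ts) in keyed2 if (id_, ts) in keyed1]
print(joined)  # [('u1', 100, 'NYC')]
```

With pair RDDs the same pattern gives each element of the result as ((ID, TIMESTAMP), (left_value, right_value)), which a final map can flatten into whatever tuple shape is needed.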