reduceByKey should work, but you need to define the ordering (which duplicate
wins) by using some sort of index.
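A rough sketch of what I mean, as a plain-Python model rather than real Spark code so that it runs without a cluster. The "earlier RDD wins" rule, the names keep_lowest_index and union_keep_first, and the sample data are all assumptions made up for illustration; adapt the rule to whichever duplicate you actually want to keep.

```python
def keep_lowest_index(a, b):
    """Merge function in the style reduceByKey expects.

    Each value is an (index, text) tuple. min() compares the index first
    and falls back to the text on ties, so the result does not depend on
    the order in which values are combined.
    """
    return min(a, b)


def union_keep_first(*pair_lists):
    """Model of: tag with index -> union -> reduceByKey -> drop index."""
    # Tag every (doc_id, text) pair with the position of its source list.
    # (Assumption: a lower index means higher priority.)
    tagged = [
        (doc_id, (index, text))
        for index, pairs in enumerate(pair_lists)
        for doc_id, text in pairs
    ]
    # Stand-in for reduceByKey: fold all values that share a key.
    merged = {}
    for doc_id, value in tagged:
        if doc_id in merged:
            merged[doc_id] = keep_lowest_index(merged[doc_id], value)
        else:
            merged[doc_id] = value
    # Drop the index again (the mapValues step in Spark).
    return {doc_id: text for doc_id, (_, text) in merged.items()}


# Hypothetical sample data, not taken from the original question.
rdd1 = [("id1", "text from rdd1"), ("id2", "other text")]
rdd2 = [("id1", "text from rdd2"), ("id3", "more text")]
result = union_keep_first(rdd1, rdd2)

# In PySpark the same steps would be roughly:
#   tagged = [rdd.mapValues(lambda text, i=i: (i, text))
#             for i, rdd in enumerate(rdds)]
#   sc.union(tagged).reduceByKey(keep_lowest_index).mapValues(lambda v: v[1])
```

The merge function has to be commutative and associative, because reduceByKey gives no guarantee about the order in which values for a key are combined; that is why the index travels with each value instead of relying on the order of the union.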
On Fri, Feb 13, 2015 at 12:38 PM, Wang, Ningjun (LNG-NPV)
ningjun.w...@lexisnexis.com wrote:
I have multiple RDD[(String, String)] that store (docId, docText) pairs,
e.g.
rdd1: (“id1”, “Long text
is appreciated because I am new to Spark.
Ningjun
From: Boromir Widas [mailto:vcsub...@gmail.com]
Sent: Friday, February 13, 2015 1:28 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to union RDD and remove duplicated keys
reduceByKey should work, but you need to define the ordering