Re: How to union RDD and remove duplicated keys

2015-02-13 Thread Boromir Widas
reduceByKey should work, but you need to define the ordering by using some sort of index.

On Fri, Feb 13, 2015 at 12:38 PM, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com wrote:
I have multiple RDD[(String, String)] that store (docId, docText) pairs, e.g. rdd1: (“id1”, “Long text
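Below is a minimal sketch of the "reduceByKey plus an index" suggestion. It assumes the goal is to keep the text from the first input that contains a given docId; the thread does not say which copy should win, so that rule is an assumption. To keep the sketch runnable without a Spark cluster, plain Scala `Seq`s stand in for `RDD[(String, String)]`, and `groupBy` followed by a pairwise `reduce` plays the role of `reduceByKey`. The names `UnionDedupSketch` and `unionKeepFirst` are hypothetical.

```scala
// Hypothetical local stand-in for RDD[(String, String)]: plain Scala Seqs,
// so the logic can run without a Spark cluster.
object UnionDedupSketch {
  type Doc = (String, String) // (docId, docText)

  /** Union several (docId, docText) collections, keeping one text per docId.
    * Assumption: when a docId appears in several inputs, the copy from the
    * input with the lowest index (the first one passed in) wins.
    */
  def unionKeepFirst(inputs: Seq[Seq[Doc]]): Map[String, String] = {
    // Tag every pair with the index of the collection it came from, so the
    // reduce step has an ordering to decide which duplicate to keep.
    val tagged: Seq[(String, (Int, String))] =
      inputs.zipWithIndex.flatMap { case (docs, idx) =>
        docs.map { case (id, text) => (id, (idx, text)) }
      }

    // Group by docId and reduce pairwise, keeping the entry with the lower
    // index; this is the local analogue of reduceByKey on the unioned data.
    tagged
      .groupBy(_._1)
      .map { case (id, entries) =>
        val (_, text) = entries.map(_._2).reduce { (a, b) =>
          if (a._1 <= b._1) a else b
        }
        id -> text
      }
  }
}
```

On real RDDs the same shape would be roughly `rdds.zipWithIndex.map { case (rdd, i) => rdd.mapValues(t => (i, t)) }.reduce(_ union _).reduceByKey((a, b) => if (a._1 <= b._1) a else b).mapValues(_._2)`, using the standard `mapValues`, `union` and `reduceByKey` operations; treat it as an untested outline. Because the reduce function just picks the smaller index, the result does not depend on the order in which Spark combines values within a key.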

RE: How to union RDD and remove duplicated keys

2015-02-13 Thread Wang, Ningjun (LNG-NPV)
is appreciated because I am new to Spark.
Ningjun

From: Boromir Widas [mailto:vcsub...@gmail.com]
Sent: Friday, February 13, 2015 1:28 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to union RDD and remove duplicated keys

reduceByKey should work, but you need to define the ordering

Re: How to union RDD and remove duplicated keys

2015-02-13 Thread Boromir Widas
because I am new to Spark.
Ningjun

From: Boromir Widas [mailto:vcsub...@gmail.com]
Sent: Friday, February 13, 2015 1:28 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to union RDD and remove duplicated keys

reduceByKey should work, but you need