
How to union RDD and remove duplicated keys

2015-02-13 Thread Wang, Ningjun (LNG-NPV)

I have multiple RDD[(String, String)] that store (docId, docText) pairs, e.g.

rdd1: (id1, Long text 1), (id2, Long text 2), (id3, Long text 3)
rdd2: (id1, Long text 1 A), (id2, Long text 2 A)
rdd3: (id1, Long text 1 B)

Then, I want to merge all RDDs. If there are duplicated docIds, later

Re: How to union RDD and remove duplicated keys

2015-02-13 Thread Boromir Widas

reduceByKey should work, but you need to define the ordering by using some sort of index.

On Fri, Feb 13, 2015 at 12:38 PM, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com wrote:
I have multiple RDD[(String, String)] that store (docId, docText) pairs, e.g. rdd1: (“id1”, “Long text
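The reduceByKey suggestion can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: it assumes that when a docId is duplicated, the text from the RDD listed later should win (the original question is cut off at that point), and it uses the position of each RDD in the input sequence as the index that defines the ordering. The names `UnionLatestWins` and `mergeLatest` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only needed on Spark versions before 1.3
import org.apache.spark.rdd.RDD

object UnionLatestWins {

  // Hypothetical helper (not from the thread): merges the RDDs so that, for a
  // duplicated docId, the text from the RDD appearing later in `rdds` is kept.
  // ASSUMPTION: "later RDD wins" is our reading of the truncated question.
  def mergeLatest(sc: SparkContext,
                  rdds: Seq[RDD[(String, String)]]): RDD[(String, String)] = {
    // Tag every docText with the position of the RDD it came from.
    val tagged = rdds.zipWithIndex.map { case (rdd, idx) =>
      rdd.mapValues(text => (idx, text))
    }
    sc.union(tagged)
      // For each docId keep the entry with the highest source index.
      // If one RDD repeats a docId, which of its entries survives is arbitrary.
      .reduceByKey((a, b) => if (a._1 >= b._1) a else b)
      // Drop the index again, leaving (docId, docText).
      .mapValues(_._2)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("union-latest-wins").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq(
        ("id1", "Long text 1"), ("id2", "Long text 2"), ("id3", "Long text 3")))
      val rdd2 = sc.parallelize(Seq(("id1", "Long text 1 A"), ("id2", "Long text 2 A")))
      val rdd3 = sc.parallelize(Seq(("id1", "Long text 1 B")))

      // Sort on the driver only to make the printed order stable.
      mergeLatest(sc, Seq(rdd1, rdd2, rdd3)).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With the sample data from the question and the "later RDD wins" assumption, id1 should keep the text from rdd3, id2 the text from rdd2, and id3 the text from rdd1. If the intended rule is the opposite, flipping the comparison in reduceByKey is enough.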

RE: How to union RDD and remove duplicated keys

2015-02-13 Thread Wang, Ningjun (LNG-NPV)

is appreciated because I am new to Spark.

Ningjun

From: Boromir Widas [mailto:vcsub...@gmail.com]
Sent: Friday, February 13, 2015 1:28 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to union RDD and remove duplicated keys

reduceByKey should work, but you need to define the ordering

Re: How to union RDD and remove duplicated keys

2015-02-13 Thread Boromir Widas

because I am new to Spark.

Ningjun

From: Boromir Widas [mailto:vcsub...@gmail.com]
Sent: Friday, February 13, 2015 1:28 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to union RDD and remove duplicated keys

reduceByKey should work, but you need
