RE: How can I merge multiple DataFrames and remove duplicated keys

2015-04-30 Thread ayan guha
…it using DataFrame? Can you give an example code snippet? Thanks, Ningjun

From: ayan guha [mailto:guha.a...@gmail.com]
Sent: Wednesday, April 29, 2015 5:54 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How can I merge multiple DataFrames and remove duplicated keys

RE: How can I merge multiple DataFrames and remove duplicated keys

2015-04-30 Thread Wang, Ningjun (LNG-NPV)
…a DataFrame to RDD and then invoke reduceByKey. Ningjun

From: ayan guha [mailto:guha.a...@gmail.com]
Sent: Thursday, April 30, 2015 3:41 AM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: RE: How can I merge multiple DataFrames and remove duplicated keys

1. Do a group by and get…
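As a rough illustration of the RDD route mentioned above: the merge function handed to `reduceByKey` only has to decide, for two records with the same id, which one has the later timestamp. The sketch below is plain Python, not Spark. `keep_latest` and `reduce_by_key` are hypothetical helper names, the `(id, value, timeStamp)` tuple layout is assumed from the question, and `reduce_by_key` merely mimics what `RDD.reduceByKey` does so the logic can be run locally. In PySpark the same merge function could be used roughly as `df.rdd.map(lambda r: (r[0], r)).reduceByKey(keep_latest).values()`.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

# Assumed row layout, taken from the question: (id, value, timeStamp)
Record = Tuple[str, str, int]
K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def keep_latest(a: Record, b: Record) -> Record:
    """Merge function for reduceByKey: keep the record with the later timestamp."""
    # On a tie the first argument wins; in Spark the argument order is not guaranteed.
    return a if a[2] >= b[2] else b


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Hypothetical local stand-in for RDD.reduceByKey: fold values sharing a key with func."""
    result: Dict[K, V] = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Hypothetical rows from two DataFrames that overlap on id "a".
df1: List[Record] = [("a", "old", 1), ("b", "x", 5)]
df2: List[Record] = [("a", "new", 3), ("c", "y", 2)]

# Union the rows, key each one by its id, then reduce per key.
unioned = df1 + df2
latest = reduce_by_key(((r[0], r) for r in unioned), keep_latest)
print(sorted(latest.values()))  # prints [('a', 'new', 3), ('b', 'x', 5), ('c', 'y', 2)]
```

Because `reduceByKey` combines values in no fixed order across partitions, which record survives when two rows share both the id and the latest timestamp may vary between runs.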

How can I merge multiple DataFrames and remove duplicated keys

2015-04-29 Thread Wang, Ningjun (LNG-NPV)
I have multiple DataFrame objects, each stored in a Parquet file. Each DataFrame contains just three columns (id, value, timeStamp). I need to union all the DataFrame objects together, but for each duplicated id keep only the record with the latest timestamp. How can I do that? I can do this for RDDs…

Re: How can I merge multiple DataFrames and remove duplicated keys

2015-04-29 Thread ayan guha
It's no different; you would use a group by and an aggregate function to do so.

On 30 Apr 2015 02:15, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
> I have multiple DataFrame objects each stored in a parquet file. The DataFrame just contains 3 columns (id, value, timeStamp). I need to…
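One way to read the group-by suggestion: aggregate the maximum `timeStamp` per `id`, then join that result back to the unioned rows on `id` and `timeStamp` to recover `value`. In Spark SQL this would look roughly like `df.groupBy("id").agg(max("timeStamp").alias("timeStamp"))` followed by a join back to `df` on those two columns. The sketch below is a plain-Python stand-in for those two steps, not Spark code; `latest_per_id` is a hypothetical helper, and the data and `(id, value, timeStamp)` tuple layout are assumptions based on the question.

```python
from typing import Dict, List, Tuple

# Assumed row layout, taken from the question: (id, value, timeStamp)
Row = Tuple[str, str, int]


def latest_per_id(rows: List[Row]) -> List[Row]:
    # Step 1 ("group by id, aggregate max(timeStamp)"): latest timestamp per id.
    max_ts: Dict[str, int] = {}
    for rid, _value, ts in rows:
        if rid not in max_ts or ts > max_ts[rid]:
            max_ts[rid] = ts
    # Step 2 ("join back on id and timeStamp"): keep rows whose timestamp is the max for their id.
    return [r for r in rows if r[2] == max_ts[r[0]]]


# Hypothetical contents of two Parquet-backed DataFrames, already unioned.
unioned: List[Row] = [
    ("a", "old", 1), ("b", "x", 5),  # from the first source
    ("a", "new", 3), ("c", "y", 2),  # from the second source
]
print(sorted(latest_per_id(unioned)))  # prints [('a', 'new', 3), ('b', 'x', 5), ('c', 'y', 2)]
```

Unlike the pairwise `reduceByKey` merge, this join-back approach keeps every row that ties on the maximum timestamp for an id, so exact-duplicate timestamps would still need a tie-breaking rule.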

RE: How can I merge multiple DataFrames and remove duplicated keys

2015-04-29 Thread Wang, Ningjun (LNG-NPV)
…@spark.apache.org
Subject: Re: How can I merge multiple DataFrames and remove duplicated keys

It's no different; you would use a group by and an aggregate function to do so.

On 30 Apr 2015 02:15, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
> I have multiple DataFrame…