Seems to me the obvious candidate is loading both master and delta, doing a
join or cogroup, then writing out the new master.
Through some clever sharding and key management you might achieve some
efficiency gains, but I’d say start here if your numbers are in the hundreds of
thousands… it should run in under a minute with the right resources…
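To make the idea concrete, here is a minimal plain-Python sketch of the join-and-merge pattern (in PySpark this would be roughly `master_rdd.cogroup(delta_rdd)` followed by a `mapValues` that merges the maps). The record layout `(user_id, profiles dict)` and the rule that delta values win on key conflicts are assumptions for illustration, not something stated in the thread.

```python
def merge_profiles(master_profiles, delta_profiles):
    """Merge two profiles maps; entries from the daily delta override the master."""
    merged = dict(master_profiles)
    merged.update(delta_profiles)
    return merged

def merge_master_with_delta(master, delta):
    """Emulate a cogroup on user_id: every master record is kept, and records
    that also appear in the delta get their profiles map merged."""
    delta_by_id = dict(delta)  # delta is small (~100s), so an in-memory index is fine here
    new_master = []
    for user_id, profiles in master:
        if user_id in delta_by_id:
            new_master.append((user_id, merge_profiles(profiles, delta_by_id[user_id])))
        else:
            new_master.append((user_id, profiles))
    return new_master
```

For example:

```python
master = [("u1", {"plan": "free"}), ("u2", {"plan": "pro"})]
delta  = [("u1", {"plan": "pro", "lang": "en"})]
merge_master_with_delta(master, delta)
# → [("u1", {"plan": "pro", "lang": "en"}), ("u2", {"plan": "pro"})]
```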
-adrian
From: TEST ONE
Date: Tuesday, September 29, 2015 at 3:00 AM
To: "user@spark.apache.org"
Subject: Merging two avro RDD/DataFrames
I have a daily update of modified users (~100s) output as Avro from ETL. I’d
need to find and merge them with the corresponding existing members in a
master Avro file (~100,000s). The merge operation involves merging a
‘profiles’ Map<String,String> between the matching records.
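For one matching pair of records, the map merge might look like the following (a hypothetical illustration in Python dicts; the rule that the daily update wins on key conflicts is an assumption):

```python
master_profiles = {"city": "SF", "plan": "free"}   # from the master file
update_profiles = {"plan": "pro", "lang": "en"}    # from the daily delta
merged = {**master_profiles, **update_profiles}    # update keys override master keys
# merged == {"city": "SF", "plan": "pro", "lang": "en"}
```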
What would be the recommended pattern to handle record merging with Spark?
Thanks,
kc