RE: Best way to process lookup ETL with Dataframes

2017-01-04 Thread Sesterhenn, Mike
ght? Thanks, -Mike From: Nicholas Hakobian [mailto:nicholas.hakob...@rallyhealth.com] Sent: Friday, December 30, 2016 5:50 PM To: Sesterhenn, Mike Cc: ayan guha; user@spark.apache.org Subject: Re: Best way to process lookup ETL with Dataframes Yep, sequential joins is what I have done in the p

Re: Best way to process lookup ETL with Dataframes

2016-12-30 Thread Sesterhenn, Mike
data will result. Any other thoughts? From: Nicholas Hakobian <nicholas.hakob...@rallyhealth.com> Sent: Friday, December 30, 2016 2:12:40 PM To: Sesterhenn, Mike Cc: ayan guha; user@spark.apache.org Subject: Re: Best way to process lookup ETL with Data

Re: Best way to process lookup ETL with Dataframes

2016-12-30 Thread Sesterhenn, Mike
need is to join after the first join fails. From: ayan guha <guha.a...@gmail.com> Sent: Thursday, December 29, 2016 11:06 PM To: Sesterhenn, Mike Cc: user@spark.apache.org Subject: Re: Best way to process lookup ETL with Dataframes How about this -

Best way to process lookup ETL with Dataframes

2016-12-29 Thread Sesterhenn, Mike
Hi all, I'm writing an ETL process with Spark 1.5, and I was wondering the best way to do something. A lot of the fields I am processing require an algorithm similar to this: Join input dataframe to a lookup table. if (that lookup fails (the joined fields are null)) { Lookup into some
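For readers skimming the thread, here is a minimal sketch of the lookup-with-fallback pattern the question describes, written in plain Python over lists of dicts rather than the Spark API. The original post is cut off after "Lookup into some", so the second (fallback) lookup table is an assumption, and the names `left_join`, `lookup_with_fallback`, `primary`, and `fallback` are all hypothetical. In Spark the same idea would presumably be expressed as sequential left outer joins, as suggested later in the thread, followed by picking the first non-null value.

```python
def left_join(rows, lookup, key, out_field):
    """Left outer join against a dict-based lookup table.

    Every input row is kept; out_field is None when the key has no match,
    which mirrors the "joined fields are null" case from the question.
    """
    return [dict(row, **{out_field: lookup.get(row[key])}) for row in rows]


def lookup_with_fallback(rows, primary, fallback, key, out_field):
    """Join to the primary lookup, then to a fallback, and keep the first hit.

    The fallback table is an assumption: the original post is truncated
    before it says what the second lookup actually is.
    """
    joined = left_join(rows, primary, key, "_primary")
    joined = left_join(joined, fallback, key, "_fallback")
    result = []
    for row in joined:
        first = row.pop("_primary")
        second = row.pop("_fallback")
        # Prefer the primary value; use the fallback only if the first lookup missed.
        row[out_field] = first if first is not None else second
        result.append(row)
    return result


if __name__ == "__main__":
    rows = [{"id": 1, "code": "A"}, {"id": 2, "code": "B"}, {"id": 3, "code": "C"}]
    primary = {"A": "alpha"}
    fallback = {"A": "ignored", "B": "beta"}
    for row in lookup_with_fallback(rows, primary, fallback, "code", "name"):
        print(row)
```

Both joins run for every row and the choice is made afterwards, so no row is dropped and no per-row branching is needed. A row that misses in both tables simply keeps a null in the output field.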