Thanks, but is nvl() available in Spark 1.5? I can't find it in org.apache.spark.sql.functions (http://spark.apache.org/docs/1.5.0/api/scala/index.html#org.apache.spark.sql.functions$).
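[For readers of the archive: nvl(a, b) is an Oracle/Hive SQL function that returns b when a is NULL and a otherwise; coalesce generalizes it to any number of arguments by returning the first non-NULL one. A minimal pure-Python sketch of those semantics, using None for SQL NULL (this is not Spark code; the function names just mirror the SQL ones):]

```python
def nvl(value, default):
    """Return default when value is None (SQL NULL), otherwise value."""
    return default if value is None else value

def coalesce(*values):
    """Return the first non-None value, or None if every value is None."""
    for v in values:
        if v is not None:
            return v
    return None

print(nvl(None, "fallback"))       # fallback
print(coalesce(None, None, "x"))   # x
```

[Note that nvl(a, b) and coalesce(a, b) are interchangeable for two arguments, which is why the nested nvl in the query below can also be written as a single coalesce over three arguments.]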
Reading about the Oracle nvl function, it seems it is similar to the na functions. Not sure it will help though, because what I need is to join after the first join fails.

________________________________
From: ayan guha <guha.a...@gmail.com>
Sent: Thursday, December 29, 2016 11:06 PM
To: Sesterhenn, Mike
Cc: user@spark.apache.org
Subject: Re: Best way to process lookup ETL with Dataframes

How about this -

    select a.*, nvl(b.col, nvl(c.col, 'some default'))
    from driving_table a
    left outer join lookup1 b on a.id = b.id
    left outer join lookup2 c on a.id = c.id

?

On Fri, Dec 30, 2016 at 9:55 AM, Sesterhenn, Mike <msesterh...@cars.com> wrote:

Hi all,

I'm writing an ETL process with Spark 1.5, and I was wondering about the best way to do something. A lot of the fields I am processing require an algorithm similar to this:

    Join input dataframe to a lookup table.
    if (that lookup fails (the joined fields are null)) {
        Lookup into some other table to join some other fields.
    }

With Dataframes, it seems the only way to do this is something like:

    Join input dataframe to a lookup table.
    if (that lookup fails (the joined fields are null)) {
        *SPLIT the dataframe into two DFs via DataFrame.filter()
        (one group with successful lookups, the other with failures).*

        For the failed lookups: {
            Lookup into some other table to grab some other fields.
        }

        *MERGE the dataframe splits back together via DataFrame.unionAll().*
    }

I'm seeing some really large execution plans, as you might imagine, in the Spark UI, and the processing time seems way out of proportion to the size of the dataset (~250 GB in 9 hours).

Is this the best approach to implement an algorithm like this? Note also that some fields I am implementing require multiple staged split/merge steps due to cascading lookup joins.

Thanks,
Michael Sesterhenn
msesterh...@cars.com

--
Best Regards,
Ayan Guha