It does support it, at least in 2.0.2, which is what I am running. Here is one example:
val parsedLines = stream_of_logs
  .map(line => p.parseRecord_viaCSVParser(line))
  .join(appsCateg, $"Application" === $"name", "left_outer")
  .drop("id")
  .na.fill(0, Seq("numeric_field1", "numeric_field2"))
  .na.fill("", Seq("text_field1", "text_field2", "text_field3"))

Notice that you have to differentiate the fields that are meant to be filled with an int from those that require a different value, an empty string in my case.

> On 11 Apr 2017, at 03:18, Sumona Routh <sumos...@gmail.com> wrote:
>
> Hi there,
> I have two dataframes that each have some columns which are of list type
> (array<int>, generated by the collect_list function actually).
>
> I need to outer join these two dfs; however, by the nature of an outer join I am
> sometimes left with null values. Normally I would use df.na.fill(...),
> however it appears the fill function doesn't support this data type.
>
> Can anyone recommend an alternative? I have also been playing around with
> coalesce in a SQL expression, but I'm not having any luck there either.
>
> Obviously, I can do a null check on the fields downstream, however it is not
> in the spirit of Scala to pass around nulls, so I wanted to see if I was
> missing another approach first.
>
> Thanks,
> Sumona
>
> I am using Spark 2.0.2.

Didac Gil de la Iglesia
PhD in Computer Science
didacg...@gmail.com
Spain: +34 696 285 544
Sweden: +46 (0)730229737
Skype: didac.gil.de.la.iglesia
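For the array<int> columns in the original question, which na.fill rejects, coalesce with an empty-array literal is one possible workaround. A minimal sketch, assuming a hypothetical "tags" column produced by collect_list (the column name and sample data are illustrative, not from the thread):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, coalesce, col}

object FillArrayNulls {
  // Replace nulls in a hypothetical array<int> column "tags" with an
  // empty array. na.fill does not support array columns, but coalesce
  // with an empty-array literal cast to the right element type does.
  def fillTags(df: DataFrame): DataFrame =
    df.withColumn("tags", coalesce(col("tags"), array().cast("array<int>")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fill-array-nulls")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the result of an outer join: row "b" has a null array.
    val joined = Seq(("a", Option(Seq(1, 2))), ("b", Option.empty[Seq[Int]]))
      .toDF("key", "tags")

    fillTags(joined).show(false)
    spark.stop()
  }
}
```

The same expression can be written in SQL as coalesce(tags, array()); the explicit cast just pins the element type so the branches of coalesce agree.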