It does support it, at least in 2.0.2, which is what I am running. Here is one example:
val parsedLines = stream_of_logs
  .map(line => p.parseRecord_viaCSVParser(line))
  .join(appsCateg, $"Application" === $"name", "left_outer")
  .drop("id")
  .na.fill(0, Seq("numeric_field1", "numeric_field2"))
  .na.fill("", Seq("text_field1", "text_field2", "text_field3"))

Notice that you have to differentiate the fields that are meant to be filled with an int from those that require a different value, an empty string in my case.

> On 11 Apr 2017, at 03:18, Sumona Routh <sumos...@gmail.com> wrote:
>
> Hi there,
> I have two dataframes that each have some columns which are of list type
> (array<int>, generated by the collect_list function actually).
>
> I need to outer join these two dfs; however, by the nature of an outer join I am
> sometimes left with null values. Normally I would use df.na.fill(...),
> however it appears the fill function doesn't support this data type.
>
> Can anyone recommend an alternative? I have also been playing around with
> coalesce in a SQL expression, but I'm not having any luck there either.
>
> Obviously, I can do a null check on the fields downstream, however it is not
> in the spirit of Scala to pass around nulls, so I wanted to see if I was
> missing another approach first.
>
> Thanks,
> Sumona
>
> I am using Spark 2.0.2.

Didac Gil de la Iglesia
PhD in Computer Science
didacg...@gmail.com
Spain: +34 696 285 544
Sweden: +46 (0)730229737
Skype: didac.gil.de.la.iglesia
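For the array<int> columns in the original question, which na.fill rejects, coalesce with an empty-array literal is one possible workaround. A minimal sketch, assuming a hypothetical "tags" column produced by collect_list (the column name and sample data are illustrative, not from the thread):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, coalesce, col}

object FillArrayNulls {
  // Replace nulls in a hypothetical array<int> column "tags" with an
  // empty array. na.fill does not support array columns, but coalesce
  // with an empty-array literal cast to the right element type does.
  def fillTags(df: DataFrame): DataFrame =
    df.withColumn("tags", coalesce(col("tags"), array().cast("array<int>")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fill-array-nulls")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the result of an outer join: row "b" has a null array.
    val joined = Seq(("a", Option(Seq(1, 2))), ("b", Option.empty[Seq[Int]]))
      .toDF("key", "tags")

    fillTags(joined).show(false)
    spark.stop()
  }
}
```

The same expression can be written in SQL as coalesce(tags, array()); the explicit cast just pins the element type so the branches of coalesce agree.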