Re: Delete NA in a dataframe

Peter Rudenko Tue, 04 Aug 2015 06:04:17 -0700

Hi Clark,

the problem is that in this dataset null values are represented by an NA marker. Spark-csv doesn't have a configurable null value marker (I made a PR for it some time ago: https://github.com/databricks/spark-csv/pull/76).


So one option for you is to do post filtering, something like this:

val rv = allyears2k.filter("COLUMN != 'NA'")
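
If several columns can contain the marker, the same filter can be built for all of them at once. The following is only a sketch: `notMarkerPredicate` is a hypothetical helper (not part of Spark or spark-csv), and it assumes the marker is read as the plain string NA. It only builds the expression string that is then passed to `filter`; inside that expression, backticks quote a column name and single quotes delimit a string literal.

```scala
// Hypothetical helper (not part of Spark or spark-csv): builds a SQL predicate
// that holds only when none of the listed columns equals the marker string.
def notMarkerPredicate(columns: Seq[String], marker: String = "NA"): String =
  columns
    .map(c => s"`$c` != '$marker'") // backticks: column name, single quotes: string literal
    .mkString(" AND ")

// Intended use in the Spark shell, assuming `allyears2k` is the DataFrame from the question:
//   val rv = allyears2k.filter(notMarkerPredicate(allyears2k.columns))
```

Note that this drops every row that has the marker in any of the listed columns, so it may be preferable to pass only the columns you actually need instead of `allyears2k.columns`.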

Thanks,
Peter Rudenko
On 2015-08-04 15:03, clark djilo kuissu wrote:

Hello,

I am trying to manage NA values in this dataset. I import my dataset with the com.databricks.spark.csv package.


When I do this: allyears2k.na.drop() I get no result.

Can you help me, please?

Regards,

------------------- The dataset -------------------------

dataset: https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv

-------------------   The code -------------------------

// Prepare environment
import sys.process._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val allyears2k = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/home/clark/allyears2k.csv")

allyears2k.registerTempTable("allyears2k")

val rv = allyears2k.na.drop()
